More than just statistics

Jenny Tayler, Department of School Education, Ryde District

The approach taken to the data section of the Chance and Data strand of the incoming 9-10 syllabus requires more than the time-honoured calculation of means, medians, modes, ranges and standard deviations, and some brief reference to their relevance and applications. The intention is that students gain a greater understanding of the power of data handling and be able to give shape and structure to a mass of data in order to extract meaning from it. Greater emphasis is given to students collecting their own data, having first determined specific question(s) they wish to address, and to an awareness of the degree to which the process of collection can influence the data itself. The design and effectiveness of data collection strategies such as questionnaires is an essential part of any statistical investigation carried out in this section of the course.

Having collected the data in an appropriate manner, students are encouraged to present it using techniques which best demonstrate the significant features of the data set. As indicated in the 'considerations' preceding the Chance and Data strand, some of these techniques are new to both students and teachers, and will need to be studied by the latter before teaching to the former. New ideas include stem and leaf plots and the shape and spread of distributions, incorporating box and whisker plots.

The stem and leaf plot (commonly called the stemplot)

A simple example of a stemplot can be derived from the following figures, obtained from the magnitudes of 10 earthquakes, measured on the Richter scale.

8.2 8.3 7.0 7.5 7.9 7.8 8.2 5.6 7.1 6.5

The 'stem' of the plot is formed by the units digits, which are placed vertically to the left of a vertical line. The 'leaves' are formed by placing the tenths digits in ascending order on the right of the line. (See Figure 1.)
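As a rough illustration of the construction just described, the following Python sketch builds a text stemplot from the ten earthquake magnitudes. The function name `stemplot` and the `stem | leaves` layout are choices made for this sketch only, and it assumes every value has exactly one decimal place.

```python
from collections import defaultdict

# The ten earthquake magnitudes quoted in the article.
quakes = [8.2, 8.3, 7.0, 7.5, 7.9, 7.8, 8.2, 5.6, 7.1, 6.5]

def stemplot(values):
    """Return the rows of a stemplot: units digit as stem, tenths digit as leaf.

    Assumes each value has exactly one decimal place (an assumption of this sketch).
    """
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 8.2 -> 82, sidestepping float error
        stem, leaf = divmod(tenths, 10)   # 82 -> stem 8, leaf 2
        leaves[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(d) for d in sorted(leaves[stem]))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

for row in stemplot(quakes):
    print(row)
# 5 | 6
# 6 | 5
# 7 | 0 1 5 8 9
# 8 | 2 2 3
```

Because the leaves are kept in ascending order, the median and mode can be read off the printed rows directly.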
Rotation of the stemplot through a quarter turn gives a display which resembles the familiar histogram, but has the advantage of including all the original data points in the diagram. (See Figure 2.)
Figure 2. Comparison of a stemplot with a frequency histogram

Features of the stem and leaf plot
Two sets of data arising from two comparable population groups may readily be compared by means of a back-to-back stemplot. This example compares the heights of 30 Year 8 students with those of 30 Year 9 students. (See Figure 3.)
Using this type of display, the shapes of the two distributions can be readily compared, as can central values such as mode and median. Cumulative measures, for example the percentage of students taller than 160 cm, can also be easily compared.

Split stem plots

It is possible to divide the stem in the case of a distribution covering a very wide range. Consider the following data set which relates to deaths from major explosions from 1910 to 1956 (from the 1980 World Almanac).
These figures range widely, covering tens, hundreds and thousands. A single stemplot, where the stem is measured in hundreds, and rotated to simulate the associated histogram, looks like this:
Figure 4. Single stemplot for a data set

This display gives little valuable information about the data set. Also, a degree of detail from the original data (the units figure) has been lost. The option of splitting the data into three, using a separate stem for the tens, hundreds and thousands, produces this display: (See Figure 5.)
Figure 5. A split stemplot

From the split stemplot it can be seen that the data are approximately normally distributed in the tens, that measures above 199 are unusual, and that there were two major disasters, accounting for 2700 deaths between them. There is still a loss of detail in the hundreds and thousands.

Shape and spread of a distribution

The way(s) in which a data set differs from the classic bell curve of the normal distribution can be seen clearly from a visual representation of the data and can give valuable information concerning the phenomenon under consideration. In particular, the concept of a skewed distribution raises interesting questions about the factor(s) influencing the skew.

I have always had difficulty with the definition of skewed data, in that it has always seemed to me to be defined back-to-front. However, once you realize that the direction of the skew relates to the tail of the data, and not to the area where the data are concentrated, all becomes clear. A distribution which is negatively skewed (or skewed to the left) has the main body of the data to the right and the tail to the left. In terms of examination performance, for example, a very solid, high-achieving group with a few poorer results would give rise to a negatively-skewed distribution. A weak group with an occasional high performer would produce a positively-skewed distribution. (See Figure 6.)
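One way to check the direction of a skew numerically is the moment coefficient of skewness, which is negative when the tail lies to the left and positive when it lies to the right. This measure is not mentioned in the article or, as far as the article indicates, in the syllabus; it is used here only as a sketch. The two sets of examination marks below are hypothetical, invented purely to illustrate the two cases described above.

```python
from statistics import mean, pstdev

def skewness(data):
    """Moment coefficient of skewness: the mean of the cubed standardised deviations.

    Negative means a tail to the left; positive means a tail to the right.
    """
    m = mean(data)
    sd = pstdev(data)   # population standard deviation
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

# HYPOTHETICAL marks, invented for illustration only.
strong_group = [45, 58, 72, 78, 80, 82, 84, 85, 88, 90]   # solid group, a few poor results
weak_group = [10, 12, 15, 16, 18, 20, 22, 28, 42, 55]     # weak group, a few high performers

print(skewness(strong_group) < 0)   # True: negatively skewed, tail to the left
print(skewness(weak_group) > 0)     # True: positively skewed, tail to the right
```

The sign is the useful part for classroom discussion; the size of the coefficient is less important at this level.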
Figure 6. Comparison of examination performances

Apart from the general shape of a data set, information regarding its spread is extremely valuable in forming opinions from the data. The measures of range, variance and standard deviation, with which we are already familiar, are looked at in a different light in the Chance and Data strand. Students are expected to be able to calculate the standard deviation for a small number of discrete data points, but the emphasis is not on calculation but on interpretation. What do these measures tell you about the data? What kind of judgements can you make about the data, based on these measures?

There are some questions that students commonly
ask about these measures of spread. One of them relates to the fact
that there are two buttons for standard deviation on the calculator,
or Note, however, that the syllabus does not require
this distinction to be taught, and that the use of the

Another commonly asked question concerns variance. Why do we define it if we don't use it? And why don't we use it anyway? Variance as a measure of spread is used extensively by statisticians in a process known as ANOVA (ANalysis Of VAriance), in which the variability of a complex process is analysed in terms of the variance of its component parts. Such a process is beyond the scope of the 9-10 course but the measure is defined because it has many applications in further statistical analysis. Standard deviation is used at this level in preference to variance because its units are the same as those of the original data points, and can be readily used in making judgements about the spread of the distribution.

The interquartile range (IQR) and the five number summary

The interquartile range describes the location of the central 50 per cent of the data, from the 25th percentile to the 75th. The 25th percentile is defined as the lower quartile (LQ) and the 75th as the upper quartile (UQ). The IQR is the difference between UQ and LQ. Its precise calculation is as follows:

(a) Find the rank of the median, (n + 1)/2.
(b) Ignore any fraction in the median rank; call this new rank M.
(c) Find the rank of the quartiles, (M + 1)/2; call this Q.
(d) Count Q data points from each end of the distribution, taking the average if between two points.
(e) Find the difference between the two values found in (d).

Using the earthquake example referred to above:

(a) median rank (10 + 1)/2 = 5.5
(b) M = 5
(c) Q = (5 + 1)/2 = 3
(d) LQ = 7.0 (3rd from bottom), UQ = 8.2 (3rd from top)
(e) IQR = 8.2 - 7.0 = 1.2.

For a large data set, these measures can be approximated from a cumulative frequency polygon or ogive. Note that for a normal distribution the ratio IQR : sd is approximately 4 : 3.

Data sets may also be described by means of a five-number summary.
This is written as n(A, B, C, D, E), where:

n = number of data points
A = lower extreme of data
B = lower quartile
C = median
D = upper quartile
E = upper extreme.

The data concerning the heights of Year 9 students (see Figure 3) can thus be described by the summary 30(136, 154, 160, 168, 179).

The box and whisker plot

This summary can be further illustrated by the use of a boxplot (box and whisker plot) which gives a clear and concise view of the spread of the data. (See Figure 7.)
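The quartile-counting steps (a) to (e) and the five-number summary n(A, B, C, D, E) can be sketched in Python as follows. The function names `value_at_rank` and `five_number_summary` are illustrative only, and the sketch assumes a non-empty list of numbers. It is checked against the earthquake magnitudes, because the raw Year 9 heights are not reproduced in the text.

```python
def value_at_rank(sorted_values, rank):
    """Value at a 1-based rank; a rank ending in .5 averages its two neighbours."""
    lower = int(rank)
    if rank == lower:
        return sorted_values[lower - 1]
    return (sorted_values[lower - 1] + sorted_values[lower]) / 2

def five_number_summary(data):
    """Return (n, A, B, C, D, E) using the article's counting rule for quartiles."""
    values = sorted(data)
    n = len(values)
    median_rank = (n + 1) / 2                   # step (a)
    m = int(median_rank)                        # step (b): drop any fraction
    q = (m + 1) / 2                             # step (c)
    lq = value_at_rank(values, q)               # step (d): count up from the bottom
    uq = value_at_rank(values[::-1], q)         # step (d): count down from the top
    median = value_at_rank(values, median_rank)
    return n, values[0], lq, median, uq, values[-1]

quakes = [8.2, 8.3, 7.0, 7.5, 7.9, 7.8, 8.2, 5.6, 7.1, 6.5]
n, a, b, c, d, e = five_number_summary(quakes)
print(b, d)             # 7.0 8.2  (LQ and UQ, as in the worked example)
print(round(d - b, 1))  # 1.2      (IQR, step (e))
```

Other quartile conventions exist, and statistical software may give slightly different values for small data sets; this sketch follows only the rule set out in the steps above.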
Figure 7. A boxplot

The labels are explanatory only, and would not appear in a regular boxplot. The scale is an integral part of the display, indicating the range of values in the data set. The box and whiskers need to be accurately located with respect to the scale. Note that the height of the box is not a significant issue.

Robust and non-robust measures in data analysis
Vast quantities of information are presented daily to the public in the form of statistical summaries and graphs from which conclusions are drawn and on the basis of which decisions are made by powerful people. It is essential for our students to comprehend and be critical of the assumptions made in these processes, and to be alert to the simplification and, on occasions, inaccuracies inherent in many of the visual representations shown in the media. The emphasis in the Chance and Data strand is to provide students with this critical edge.