Chapter 2: Descriptive Statistics

Russian Campaign

Charles Minard's information graphic on Napoleon's Russian Campaign (1812-1813) : Several variables are shown in two dimensions, including the army's location and direction, the declining size of the army, and the low temperatures during the retreat.

What does Tchaikovsky have to do with this? (He wrote the Festival Overture "The Year 1812" to commemorate Russia's 1812 defense against Napoleon's advancing Grande Armée during the devastating French invasion of Russia.)

Shape of a Distribution

Test scores in a small MBA class: 70, 72, 76, 80, 84, 84, 88, 90, 90, 94, 96, 98, 100, 100, 100.
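To get a first look at the shape of these 15 scores, one option is to tally them into classes. The sketch below assumes equal classes of width 10 starting at 70 (70-79, 80-89, and so on); the class width and the names `WIDTH` and `class_counts` are illustrative choices, not something fixed by the data.

```python
from collections import Counter

# Test scores from the small MBA class above.
scores = [70, 72, 76, 80, 84, 84, 88, 90, 90, 94, 96, 98, 100, 100, 100]

WIDTH = 10  # assumed class width; other widths give a different picture


def class_counts(values, width=WIDTH):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


counts = class_counts(scores)

# Crude text histogram: one star per score in the class.
for lower, count in counts.items():
    print(f"{lower:>3}-{lower + WIDTH - 1:<3} {'*' * count}")
```

With these classes the counts are 3, 4, 5 and 3, so the scores pile up toward the high end of the scale.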

Actual GMAT scores of entering DeGroote MBA classes (2003 to 2005). Let's look at the 2003 scores.

IQ Scores of 500 individuals.

Remark: See the solution for Problem 2.8 on page 37 where the class lengths are different. (Annual savings of companies benefiting from ISO 9000 registration)

Symmetric, positively skewed and negatively skewed distributions.
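A quick, informal check for skewness is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. The sketch below applies that rule of thumb to the MBA test scores listed earlier and to a small made-up set of salaries (the salary figures are hypothetical, for illustration only). It is only a rough guide and no substitute for looking at the histogram.

```python
import statistics


def skew_direction(values):
    """Rule-of-thumb skewness check: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "roughly symmetric"


# MBA test scores from earlier in these notes.
scores = [70, 72, 76, 80, 84, 84, 88, 90, 90, 94, 96, 98, 100, 100, 100]

# Hypothetical salaries in thousands of dollars (not real data).
salaries = [100, 102, 105, 110, 118, 130, 160, 250]

print("scores:  ", skew_direction(scores))
print("salaries:", skew_direction(salaries))
```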

Here's the distribution of the salaries of McMaster employees making above $100,000 in 2010. Is the distribution symmetric, positively-skewed or negatively-skewed?

This information is public and the most recent data are available on the Ontario Government web site.

Central Tendency

A median separating the opposing lanes of traffic.

Median house price : House prices are usually reported by using the mean and the median prices.

Real Estate Data (again) : Look at the "Bedrooms" variable and find the mean, mode and median (and a dot plot)
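The real estate data set itself is not reproduced in these notes, so the sketch below uses a short, made-up list of bedroom counts (purely hypothetical) to show how the mean, median and mode can be computed with Python's standard `statistics` module, along with a simple text dot plot.

```python
import statistics
from collections import Counter

# Hypothetical "Bedrooms" values, for illustration only.
bedrooms = [2, 3, 3, 3, 4, 4, 5, 3, 2, 4]

mean = statistics.mean(bedrooms)      # arithmetic average
median = statistics.median(bedrooms)  # middle value of the sorted data
mode = statistics.mode(bedrooms)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")

# Text dot plot: one dot per house with that number of bedrooms.
counts = Counter(bedrooms)
for value in range(min(bedrooms), max(bedrooms) + 1):
    print(f"{value} | {'.' * counts[value]}")
```

For a discrete variable such as the number of bedrooms, the median and mode are often easier to interpret than a fractional mean.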

Tolerance Intervals

Coffee Temperature (again) : The process is in statistical control, but is it "capable" (meets specifications)?
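One common way to judge capability is to compare a three-standard-deviation tolerance interval, mean plus or minus 3s, with the specification limits. For roughly mound-shaped (normal-like) data, about 99.73% of measurements fall within three standard deviations of the mean. The sketch below assumes that approach; the temperatures and the specification limits are hypothetical values chosen for illustration, not the course data.

```python
import statistics


def tolerance_interval(values, k=3):
    """Return (mean - k*s, mean + k*s) using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return mean - k * s, mean + k * s


def is_capable(values, lower_spec, upper_spec, k=3):
    """True if the k-sigma tolerance interval lies inside the spec limits."""
    low, high = tolerance_interval(values, k)
    return lower_spec <= low and high <= upper_spec


# Hypothetical coffee temperatures in degrees Fahrenheit (not real data).
temps = [154, 156, 158, 159, 160, 160, 161, 162, 164, 166]

low, high = tolerance_interval(temps)
print(f"3-sigma tolerance interval: [{low:.2f}, {high:.2f}]")

# Hypothetical specification limits.
print("capable for specs 153 to 167:", is_capable(temps, 153, 167))
print("capable for specs 145 to 175:", is_capable(temps, 145, 175))
```

A process can be in statistical control (stable mean and variation) and still fail this check if its natural spread is wider than the specifications allow.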

More Measures of Variation

Percentiles

Percentiles reported in GMAT Score Report. In her last attempt (07-24-2008), this person's total score was at the 83rd percentile.

Here's the GMAT score breakdown in terms of percentiles (between January 2007 and December 2009). And here is a graphical version of the breakdowns.

The pth percentile of a group of n measurements is a value such that (approximately) p% of the measurements fall at or below the value and (approximately) (100-p)% of the measurements fall at or above the value.

An easy method for locating the position of the pth percentile of n measurements: With this method, any percentile value is either (i) equal to a particular observation or (ii) the average of two neighbouring observations.

For the example discussed in class with n = 8 and p = 25, the position is 2.5 (the average of the 2nd and 3rd ordered observations). When n = 7 and p = 25, the position is 2.
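The position rule itself is not reproduced in these notes. One rule that agrees with both examples above is: compute i = np/100; if i is not a whole number, round it up to get the position; if i is a whole number, use the average of the observations in positions i and i + 1 (position i + 0.5). The sketch below assumes that rule, with p a whole number between 1 and 99; the function names are illustrative.

```python
def percentile_position(n, p):
    """1-based position of the pth percentile among n ordered values.

    Assumed rule: i = n*p/100; round up if i is not a whole number,
    otherwise use the midpoint between positions i and i + 1.
    """
    q, r = divmod(n * p, 100)  # integer arithmetic avoids rounding issues
    if r == 0:
        return q + 0.5
    return q + 1


def percentile(values, p):
    """Value of the pth percentile under the same assumed rule."""
    data = sorted(values)
    q, r = divmod(len(data) * p, 100)
    if r == 0:
        return (data[q - 1] + data[q]) / 2  # average of positions q and q + 1
    return data[q]  # position q + 1 is index q


print(percentile_position(8, 25))  # n = 8, p = 25
print(percentile_position(7, 25))  # n = 7, p = 25

# MBA test scores from earlier in these notes (n = 15).
scores = [70, 72, 76, 80, 84, 84, 88, 90, 90, 94, 96, 98, 100, 100, 100]
print([percentile(scores, p) for p in (25, 50, 75)])
```

Either way, the result is always an observation or the average of two neighbouring observations, as described above.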

Box-and-whisker plot

General structure of a box-and-whisker plot.

Note: MegaStat calculates the quartiles, etc., somewhat differently. For small data sets (as in the case of n = 7 discussed in class), the results found by MegaStat will look a bit different from the results found using the textbook method above.
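As a rough illustration of what goes into a box-and-whisker plot, the sketch below computes the quartiles, the interquartile range (IQR), and fences for the MBA test scores listed earlier. It assumes the round-up percentile rule (i = np/100) and the common convention of inner fences at 1.5 IQR and outer fences at 3 IQR beyond the quartiles; software packages may use different quartile rules, so their numbers can differ slightly.

```python
def percentile(values, p):
    """pth percentile using the assumed rule i = n*p/100 (round up, or
    average two neighbours when i is a whole number)."""
    data = sorted(values)
    q, r = divmod(len(data) * p, 100)
    if r == 0:
        return (data[q - 1] + data[q]) / 2
    return data[q]


def box_plot_summary(values):
    """Quartiles, IQR, fences, whisker ends and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = (percentile(data, p) for p in (25, 50, 75))
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        # Whiskers reach the most extreme values still inside the inner fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < inner[0] or x > inner[1]],
    }


scores = [70, 72, 76, 80, 84, 84, 88, 90, 90, 94, 96, 98, 100, 100, 100]
summary = box_plot_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For these scores the median sits closer to the third quartile than to the first, and the lower whisker is longer than the upper one, which is the box-plot signature of negative skewness.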

IQ Scores (again) with a box plot and extremes.

Here again is the distribution of the salaries of McMaster employees making above $100,000 in 2010. Do you notice the positive skewness in the box-and-whisker plot?