SPC - The Basic Statistics Behind SPC

Posted by Graham Cripps on Thu, Jul 09, 2015 @ 03:04 PM

This is the second article in our SPC Blog series, aiming to provide a background to the statistics behind Statistical Process Control (SPC) for variable data.

Variable data is derived from anything that can be measured and includes length, diameter, hardness, distance, volume, mass and gloss levels to name but a few.

Basic statistics are used to convert large amounts of data into a more meaningful form. For me, this is about making pictures from numbers.

SPC requires data to be described using three terms:

  • Location - where the data sits along a continuous scale of measurement
  • Spread - the difference between the smallest and largest measurements taken
  • Distribution - the way the data is arranged around a central value

Location - Measures Of Central Tendency

Central tendency describes the location of a set of data. The three descriptors are mean, mode and median.

  • MEAN is the arithmetic mean or average
  • MODE describes the most frequently occurring value
  • MEDIAN is the middle value when all the data is arranged in ascending order

The following two slides illustrate the calculations for all three of these measures (Mean, Mode and Median)

SPC_B2_P1       SPC_B2_P2
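
The same three calculations can be sketched in a few lines of Python using only the standard library. The measurement values here are made up for illustration and are not the figures shown on the slides.

```python
import statistics

# Hypothetical measurements (illustrative values, not taken from the slides)
measurements = [10.2, 10.4, 10.1, 10.4, 10.3, 10.4, 10.0]

mean_value = statistics.mean(measurements)      # sum of the values / number of values
mode_value = statistics.mode(measurements)      # most frequently occurring value
median_value = statistics.median(measurements)  # middle value once the data is sorted

print("Mean:  ", round(mean_value, 3))
print("Mode:  ", mode_value)
print("Median:", median_value)
```

For this data the mode (10.4) sits above both the median (10.3) and the mean (about 10.257), which is a reminder that the three measures only coincide when the data is symmetrical.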

 

Which of these measures is most useful depends on the data set you are using. In this series we will use the mean value, calculated for sub-groups of 5 samples. Working with sub-group means lets us take advantage of the central limit theorem: the means of sub-groups tend towards a normal distribution, even when the individual measurements are not normally distributed.

This gives us the advantage of being able to review the data as an approximately normally distributed set of data (more about this in the next article).
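
A rough way to see the central limit theorem at work is to simulate it. The sketch below is an illustration only: it assumes individual readings drawn from a flat (uniform) distribution, which is my choice for the demonstration and not something taken from a real process. It then compares the individual readings with the means of sub-groups of 5.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

SUBGROUP_SIZE = 5
NUM_SUBGROUPS = 2000

# Individual readings from a uniform (flat, clearly non-normal) distribution
individuals = [random.uniform(0.0, 1.0)
               for _ in range(SUBGROUP_SIZE * NUM_SUBGROUPS)]

# Split the readings into consecutive sub-groups of 5 and take each mean
subgroup_means = [
    statistics.mean(individuals[i:i + SUBGROUP_SIZE])
    for i in range(0, len(individuals), SUBGROUP_SIZE)
]

def share_within_one_sd(values):
    """Fraction of values lying within one standard deviation of their mean."""
    centre = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - centre) <= sd) / len(values)

sd_individuals = statistics.pstdev(individuals)
sd_means = statistics.pstdev(subgroup_means)

print("Std dev of individual readings:", round(sd_individuals, 3))
print("Std dev of sub-group means:    ", round(sd_means, 3))
print("Share within 1 std dev (individuals):",
      round(share_within_one_sd(individuals), 3))
print("Share within 1 std dev (sub-group means):",
      round(share_within_one_sd(subgroup_means), 3))
```

For a normal distribution roughly 68% of values fall within one standard deviation of the mean. The sub-group means should land close to that figure, while the flat individual readings land nearer 58%. The means are also less spread out than the individual readings, by a factor of roughly the square root of the sub-group size.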

Spread

This is simply the difference between the largest and smallest data points or measurements.

Distribution - Measure Of Dispersion

Measures of dispersion define the spread of data and the overall shape of the data. 

If we consider a simple histogram then we can see that there is more than one measure needed to describe the data set.

 

SPC_B2_P3

This diagram shows a typical histogram for a linear set of data. This type of graph is very useful for visualising small or large sets of data points, in terms of the distribution, and can be produced using Microsoft Excel.

However, for us to analyse this data further, we would need to overlay a distribution curve for this data.

SPC_B2_P4

 

This diagram illustrates three sets of data, all centred on the same value but differing in spread and shape (the spread is the difference between the highest data value and the lowest data value).

You will also notice that, although the last two data sets share the same location and the same spread, their shapes are different.

So we use three descriptors to describe the data:

  • location
  • spread
  • shape of the data
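
To show why all three descriptors are needed, here is a minimal Python sketch using two made-up data sets (not the ones in the diagram) that share the same mean and the same range. The helper name `describe` is hypothetical, and the sketch uses the standard deviation as the measure of shape.

```python
import statistics

def describe(data):
    """Return (location, spread, shape) as (mean, range, standard deviation)."""
    location = statistics.mean(data)
    spread = max(data) - min(data)
    shape = statistics.pstdev(data)  # population standard deviation
    return location, spread, shape

# Two illustrative data sets with the same mean and the same range
clustered = [1, 5, 5, 5, 9]  # most values sit at the centre
dispersed = [1, 1, 5, 9, 9]  # most values sit at the extremes

for name, data in (("clustered", clustered), ("dispersed", dispersed)):
    location, spread, shape = describe(data)
    print(f"{name}: mean={location}, range={spread}, std dev={shape:.2f}")
```

Both sets have a mean of 5 and a range of 8, yet the standard deviation is about 2.53 for the first and about 3.58 for the second. Location and spread alone cannot tell them apart.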

Summary

Location is defined by the mean, mode or median value (the diagram above shows the mean for a normal distribution).

Spread is defined by the range (R) value for the data (the absolute difference between the highest and lowest data points).

Shape of the data is described by the standard deviation, which is the square root of the variance (the average of the squared differences from the mean). The standard deviation is commonly referred to as the sigma (σ) value.
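
As a worked example of these definitions, the sketch below calculates each value step by step in Python for a hypothetical sub-group of 5 measurements. Dividing by the number of values gives the population variance, matching the "average of the squared differences" wording above. A sample-based estimate would divide by one less than the number of values.

```python
import math

# Hypothetical sub-group of 5 measurements (illustrative values only)
data = [4, 6, 5, 7, 3]

n = len(data)
mean = sum(data) / n                              # location
data_range = max(data) - min(data)                # spread (R)
squared_diffs = [(x - mean) ** 2 for x in data]   # squared differences from the mean
variance = sum(squared_diffs) / n                 # average of the squared differences
sigma = math.sqrt(variance)                       # standard deviation (sigma)

print("Mean:", mean)
print("Range (R):", data_range)
print("Variance:", variance)
print("Sigma:", round(sigma, 3))
```

Here the mean is 5, the range is 4, the variance is 2 and sigma is about 1.414.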
