Statistical process control (SPC) is a robust framework for separating signal from noise. It is worth learning because it is easy, and it gives you a superhuman ability to put your effort in where it’s valuable. It has made me, personally, more productive than I could have dreamt of being before it. It helps me be a better manager for my team, by not sending them on unnecessary wild goose chases, and by pointing out areas for genuine improvement.
A part of statistical process control is also a particular outlook on life: a resignation to the fact that all real-world events are governed to a large extent by noise, randomness, and luck, and that our ability to influence individual outcomes is limited.
A very common mistake people make is comparing a measurement to the last measurement of the same thing.
Another common mistake is to compare measurements against an average of measurements of a similar kind.
Again, you and I know better. A statistic known as “average” is intentionally designed to fall in the middle of the range. Roughly half of your measurements will be above average, and the other half below it. Again, this report contains no information at all.
Specification limits, in cases like these, are wishful thinking. You cannot make something better just by wishing for it. Again, we are interpreting noise; we classify the noise as bad because it exceeded a number someone pulled out of their ass.
But there’s a second kind of specification limit! The typical example is when you are machining a component that should fit into an assembly of some sort. You need the component to be of the right dimensions so it actually fits. This is a valid usage of specification limits, although it also has some problems, so I would avoid it. Specifically, just because something is in spec doesn’t mean it’s good. The specification limits are usually fairly wide, and components that are just inside them can easily cause maintenance problems down the line anyway, even if they technically fit at the time of assembly. And two components at opposite sides of their specification limits might not fit well together even at assembly, despite both being in spec individually! Also, by being binary true-or-false tests, specification limits don’t help you improve your processes. You’re either fine or you’re in trouble, there’s no middle ground.
Then there’s a third kind of specification limit. These can more reasonably be called natural laws. For example, if you arrive late to the train you will miss your meeting. There’s a definitive threshold at which you must be at the train station, or the train will leave without you. This is the fully acceptable use of a specification limit. If you break this type of specification limit, there will be a bad consequence that is out of your control. Note that if you have a contract with a customer that specifies that unless you solve their bugs in three days, they can invoke a penalty clause that forces you to pay them millions of dollars, then the example we opened with (bugs must be solved in less than three days) becomes a natural law, rather than an arbitrary specification limit.
You might recognise other names for the wishful thinking type of specification limit. Budgets, goals, and targets fall into the same category. If someone says, “We are going to increase revenue by 10 % this year!”, the obvious question is, “What happens if we don’t?” Usually nothing. It’s just wishful thinking. Similarly, you might go, “I will not spend more than $650 this year on car repairs” – but what if you have to? Wishful thinking.
Wishful thinking never improves anything, though it has potential to make things worse.
This noise is called common cause variation in statistical process control. It’s variation contributed by causes common to all data points. In other words, if all data points are affected by the same noise, there’s no statistical way to tell them apart. If measurements are determined solely by noise – by common cause variation – we can say that all measurements are the same, in the sense that two dice are the same, even if one happened to land on a 2 and the other on a 4.
You cannot judge the process by a single outcome. Only a long run of outcomes tells you anything about the process. Consequently, once you know what the process is like, any single outcome from that process adds nothing to your knowledge about the process.
Now we know what common cause variation is. We measure it with process behaviour charts, also known as control charts.
There are many types of process behaviour charts, but the one you can almost always rely on to work is the individuals chart, or XmR chart.
Here’s how you make it for a time series, like Alice’s weekly call numbers. Start by plotting the values in a run chart, as before.
Now, compute the absolute differences between consecutive data points.
The upper row contains Alice’s call numbers. The lower row has the differences between adjacent numbers. These differences are sometimes called moving ranges – hence the name XmR chart, which is based on the X values themselves and their moving ranges. The difference between the first two numbers (86 and 96) is 10, the difference between the second two (96 and 65) is 31, and so on.
Here comes the magic. Compute the lower and upper natural process behaviour limits (also known as lower and upper control limits) as
lower limit = average of the values − 2.66 × average moving range
upper limit = average of the values + 2.66 × average moving range
These natural process limits indicate the range of sales call numbers we can expect from Alice, assuming she doesn’t change anything fundamental about how she works. The process limits indicate the amount of week-to-week variation in her work process. They are a measure of the amount of common cause variation, the amount of noise.
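If you want to see the whole XmR computation in one place, here is a minimal sketch in Python. The first three values (86, 96, 65) are the ones mentioned in the text; the rest of the series is made up for the example, and the 2.66 constant is the one discussed later on.

```python
# Hypothetical weekly call counts for Alice. Only the first three values
# (86, 96, 65) come from the article; the rest are made up for illustration.
calls = [86, 96, 65, 92, 78, 103, 88, 71, 95, 84]

# Moving ranges: absolute differences between consecutive data points.
moving_ranges = [abs(b - a) for a, b in zip(calls, calls[1:])]

mean_x = sum(calls) / len(calls)
mean_mr = sum(moving_ranges) / len(moving_ranges)

# Natural process limits: mean ± 2.66 × (average moving range).
lower_limit = mean_x - 2.66 * mean_mr
upper_limit = mean_x + 2.66 * mean_mr

print(f"mean = {mean_x:.1f}, average moving range = {mean_mr:.1f}")
print(f"natural process limits: {lower_limit:.1f} to {upper_limit:.1f}")
```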
In other words, you should not be surprised if Alice makes 35 calls one week, because that’s within the process limits. It’s just like rolling snake eyes when throwing dice. Rare, sure, but it does happen. It’s fully within the range of outcomes you should expect of the process. Similarly, some weeks she might get lucky and make 128 calls – nothing extraordinary has changed that week; Alice just got lucky within her regular process.
Alice’s call numbers are an easy case to deal with: they make up a stable process. We can predict her future performance based on the past. The name stable process is a bit confusing because a stable process is one where the outcome is completely random. A stable process is one where we cannot predict any individual value, but we know the range into which almost all values will fall.
In traditional SPC literature, stable processes are known as in (statistical) control or controlled. What is meant by this is that we have already controlled for all the significant external factors, and there is nothing left we can control to determine the outcome. We just have to let the process run on its own and produce what it is tuned for.
When you have a stable process such as this, you don’t have to re-compute the process limits each week. One of the defining features of a stable process is that any given week, statistically, looks like any other week. Because of this, you can just extend the process limits you have already computed indefinitely into the future.
The process limits are the system trying to communicate to you what it is actually capable of. You have to accept these limits because anything else is delusion. You can either listen to the voice of the process and get wiser, or ignore it and look like a fool.
Importantly, the system couldn’t care less about what you wish it was capable of. The voice of the process will only ever tell you what it is capable of, no more, no less. This is a deeper point than I have time to expand on here. When in doubt, find out what happens in practice and listen to it. Don’t get blinded by wishful thinking.
Let’s say you’re Alice’s manager and you’re unhappy with these limits. Instead of rejecting them, you accept them. But you also want to improve on Alice’s performance. What do you do?
There are two ways we can improve a stable system:
• improve the mean, or
• reduce the variation.
Your instinct will probably be to improve the mean. After all, that’s the most direct indicator of overall performance level.
I would urge you to focus on reducing variation first. There are two reasons for this that stand out. First of all, variation has a direct cost in and of itself that people underestimate. It makes planning harder, obviously. If other people depend on your work (in Alice’s case, that might be a fulfillment department), then bursty progress makes it harder for them to keep up too. The further down the chain we go, the worse the variation gets. In the end, delivery might be late for some customers because the shipping department has capacity planned for a roughly even load, and what they got was very bursty. Second, when variation is high, it is difficult to recognise improvements to the mean. With tight variation, it is easier to see even small shifts in performance.
Any improvement to a stable system requires following statistical trends, asking experts for advice, trying things out, and verifying underlying shifts in process behaviour.
To be abundantly clear: you cannot improve a stable system by tampering with individual outcomes. The only thing tampering accomplishes is destabilising the system. A stable system must be improved as a whole.
Improvements to a stable system must be driven by theory. You cannot just react to events – you must build a thorough understanding of what goes into the system and adjust things on as fundamental a level as possible. You need to approach your process as a scientist would a foreign life form: with curiosity and a burning desire to learn everything about it.
As you can guess, improving a stable system is difficult. Most people don’t bother. But there are massive rewards to be had here, even though it takes a little intellectual work.
Let’s say Alice’s manager takes our advice and thinks long and hard about how to improve the sales process. Maybe they find out that the salespeople aren’t really learning from their own experiences – they get stuck in their ruts. So the manager institutes a knowledge sharing programme, where the salespeople get together once a week and share what they’ve learned with each other, pass on tips about prospects, and so on. Alice’s numbers might instead turn out to be
So far, we have only looked at stable systems. What characterises stable systems is that their outcome is determined by noise, or common cause variation. This means we can’t know exactly how a particular measurement will work out, but we can be reasonably certain that it will fall within a predictable range.
This has illustrated one of the signal detection tests you can do with process behaviour charts: if any measurement is outside the natural process limits, it’s likely to belong to a category known as assignable causes of variation, i.e. there is a specific thing you can point at and say that “this made it happen”.
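To make that test concrete, here is a minimal sketch in Python. The data and the limits are assumed values purely for illustration, not Alice’s actual chart.

```python
# Flag any point outside the natural process limits. The limits here are
# hypothetical, as is the data; in practice you would use the limits
# computed from your own chart.
def points_outside_limits(values, lower, upper):
    """Return (index, value) pairs that fall outside the natural process limits."""
    return [(i, x) for i, x in enumerate(values) if x < lower or x > upper]

calls = [86, 96, 65, 92, 78, 143, 88]           # hypothetical data with one outlier
signals = points_outside_limits(calls, lower=30, upper=135)
print(signals)  # [(5, 143)] -> likely an assignable cause worth investigating
```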
Now we know how to
• recognise stable systems (variation is within natural process limits);
• predict future outcomes of stable systems (individual outcomes impossible to predict, but we can be reasonably confident of a range of possible outcomes);
• improve stable systems (it’s complicated and takes understanding); and
• recognise unstable systems (variation exceeds natural process limits, or has excessive runs that are higher or lower than the average).
Can we predict future outcomes of unstable systems?
No.
In an unstable system, anything can happen. Predictions are meaningless at best, and outright dangerous in some cases. There are no statistical regularities you can count on. Thus, the first order of business when you have an unstable system is to stabilise it. This is the only improvement you can make with an unstable system.
Once you’ve removed or controlled for the assignable causes of variation, you have a stable process. Now you can extrapolate performance into the future, or start working on systematic improvement of variation and baseline. But if you have variation from assignable causes, or baseline shifts, the first thing to do is get rid of those so you get a reproducible process. Only then can you start talking about systematic change in a sensible way.
There is also a large set of formal signal detection tests available for process behaviour charts. If you look it up, you will likely come across references to the Western Electric rules.
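As a flavour of what those rules look like, here is a sketch of one of them: the run test, where eight or more consecutive points on the same side of the central line count as a signal. The exact rule set and thresholds vary between sources, so treat this as an illustration rather than a definitive implementation.

```python
def run_on_one_side(values, centre, run_length=8):
    """Return True if run_length consecutive points all fall on the same side of the centre line."""
    above = below = 0
    for x in values:
        above = above + 1 if x > centre else 0
        below = below + 1 if x < centre else 0
        if above >= run_length or below >= run_length:
            return True
    return False

# Hypothetical example: a sustained shift above the centre line is a signal
# even though no single point crosses the natural process limits.
calls = [86, 96, 65, 92, 101, 99, 104, 97, 103, 98, 100, 105]
print(run_on_one_side(calls, centre=90))  # True: the last nine points are all above 90
```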
The absolute most common mistake people make is acting on noise as if it was a signal. Trying to fix a stable process by tampering with individual outcomes does not improve things, and long-term likely makes them worse. It destabilises the process, increases variation, and makes the outcome less predictable, not more.
The opposite mistake also sometimes happens: some people pretend their system is stable, even though they have obvious, unaccounted-for variation from assignable causes. If there is assignable cause variation, you don’t have a process to improve – you have multiple interleaved processes you need to disentangle first and handle separately.
When people learn about process behaviour charts, one of their instincts is often to transform the data before plotting it. This is a mistake. The process behaviour charts I’ve explained here are robust for most kinds of data you will find in real life. (There are exceptions, especially in the software industry, because software processes are notoriously unstable and multi-modal, to the point where process behaviour charts become difficult to interpret.) Transforming the data before plotting it is likely to hide signals.
Similarly, you should avoid thinking of your data as coming from a theoretical probability distribution. If you start with that assumption, you’re likely to miss important signals. After all, signals (in the SPC sense) are your process trying to tell you that it’s really multiple processes interleaved. If you’re assuming a single theoretical probability distribution, you’re assuming away the very thing you’re looking for!
The change in revenue in a company is usually fairly stable from quarter to quarter. There is nothing you can learn from what it is this quarter in particular, and improving the quarterly revenue increase cannot be done by incentivising individual quarters. It takes whole-system improvement.
The yearly amount of rain in a location (stable, but at very different levels and with very different levels of variability depending on where you are on the planet!)
The number of first-time criminals every month in a society is usually fairly stable. Effectively, we are tuning society to produce a certain number of criminals per month. We shouldn’t be surprised over fluctuations in this number. Improving involves deeper understanding and whole-system changes. Punishing individual criminals leads to no long-term improvement.
Early in the article, we said that a process cannot be judged by individual outcomes, and that once we know what the process is like, individual outcomes teach us nothing. This is of course not strictly true: there’s just a diminishing amount of information we get from further observations. Once we have 20–30 observations, yet another observation from the same process is incredibly unlikely to statistically deviate much from the observations you have already. This depends a little on what type of process we’re looking at, as well as what the time scales are, and whether there is any aggregation involved.
It might sound confusing that the outcomes of a process in control are determined by chance. As a reminder, in control means that we have extracted all of the powerful predictors from the process. What remains are a myriad of small factors affecting the outcome this way and that way. Since they are numerous and small, they all sort of cancel out and leave only a small residual effect, which is the noise we’re finding.
The above also leads to something known as the report card effect: if you try to aggregate too many physical processes into one summary metric, that metric will always be a stable process, meaning it loses its power as an indicator of when something goes wrong. You must look into processes in reasonable detail in order to have meaningful metrics. If you summarise too many things into one number, you average out all the useful signals into noise.
We measure the common-cause variation in a very particular way: we take the average of the consecutive differences between outcomes. In other words, we measure the point-to-point variation quite literally as the difference between consecutive points in our data. Not only is this an intuitive way to quantify it; it is also very robust against patterns in the data, such as cyclic data.
A common response when learning about the consecutive differences is to suggest another common measure of dispersion: the global standard deviation. The problem with the standard deviation is that it measures how spread out the entire data set is. By looking at the dispersion of the entire data set, we are looking at variation that includes assignable cause variation – we are overestimating the point-to-point variation.
We are interested in knowing the component of dispersion that comes from point-to-point variation of common causes. We capture this more accurately by looking specifically at differences between consecutive points, rather than the difference between each point and the global mean.
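A small sketch of the difference, with made-up data: an assumed baseline shift halfway through inflates the global standard deviation, while the average moving range mostly reflects the point-to-point noise. The 1.128 conversion factor is the one explained below.

```python
import random
import statistics

# Two interleaved processes: the second half has a higher baseline, as when
# an assignable cause shifts the mean. Both halves have true sigma = 5.
random.seed(1)
data = [random.gauss(50, 5) for _ in range(30)] + [random.gauss(70, 5) for _ in range(30)]

# The global standard deviation mixes the baseline shift into the "noise" estimate...
global_sd = statistics.stdev(data)

# ...while the average moving range mostly sees the point-to-point variation.
mean_mr = statistics.mean(abs(b - a) for a, b in zip(data, data[1:]))
sigma_hat = mean_mr / 1.128

print(f"global sd              = {global_sd:.1f}")  # inflated by the shift
print(f"moving-range estimate  = {sigma_hat:.1f}")  # close to the true 5
```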
This leads nicely into where the magic constant 2.66 comes from.
While we don’t want to measure dispersion as the standard deviation of the observed values, because the values we’ve observed may come from different interleaved processes, the general idea of the standard deviation is still useful. It’s useful because of Chebyshev’s inequality, which tells you the fraction of samples that it is theoretically possible to find outside of a multiple of the standard deviation – regardless of the underlying distribution.
Because of the way the standard deviation is constructed, Chebyshev’s inequality guarantees that no more than 11 % of samples fall outside of three standard deviations – in the absolute worst case. The closer the distribution is to normal, the fewer the observations that will fall outside of three standard deviations. When the distribution is normal, this will be just 0.3 %.
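In symbols, for any distribution with mean μ, standard deviation σ, and k > 1:

```latex
P\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^{2}},
\qquad\text{so for } k = 3:\quad
P\bigl(|X - \mu| \ge 3\sigma\bigr) \le \frac{1}{9} \approx 11\,\%.
```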
In the long history of SPC, most practitioners and statisticians have found three standard deviations to be a good balance between false positives and false negatives, for most kinds of data.
The problem is that we can’t measure the standard deviation of our observations directly, because that would involve assuming that all observations came from the same process, which is the question we’re trying to answer. We do have the mean of the absolute value of the consecutive differences, but that isn’t the standard deviation.
However, the mean value of the consecutive differences is roughly 1.128 times the standard deviation! This depends on the process, and as you can see from the following draws from theoretical processes, that’s not always the value it converges to. But it’s the case often enough that we use it anyway.
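If you want to convince yourself, a quick simulation sketch (assumed setup: independent standard normal draws) shows the average absolute consecutive difference landing near 1.128 when σ = 1.

```python
import random
import statistics

# Draw from a standard normal process and measure the mean moving range.
random.seed(2)
xs = [random.gauss(0, 1) for _ in range(100_000)]
mean_mr = statistics.mean(abs(b - a) for a, b in zip(xs, xs[1:]))
print(round(mean_mr, 3))  # ≈ 1.128 for a standard normal (sigma = 1)
```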
In particular, 1.128 overestimates the convergence point for heavy-tailed distributions. (You can see it begins already at the exponential distribution, which by definition converges at 1.000, and then for subexponential distributions like Lognormal(1,1) it just gets worse.) This means for heavy-tailed distributions, we will set our limits too close to the mean. However, with heavy-tailed distributions, we are also more prone to see values closer to the mean (this is indeed why we end up underestimating the variation), which, very informally, cancels out part of the problems of overestimating the convergence point.