What is the population stability index (PSI)?
PSI summarizes in a single number how much a population has shifted over time, or how much two samples of a population differ. It does this by bucketing the two distributions and comparing the percentage of items that fall into each bucket. The common interpretations of the PSI result are:
- PSI < 0.1: no significant population change
- 0.1 <= PSI < 0.2: moderate population change
- PSI >= 0.2: significant population change
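As a small illustration, the rule-of-thumb thresholds above could be wrapped in a helper like the one below. The function name `interpret_psi` is my own, not part of any library, and the cutoffs are simply the conventional ones listed above rather than hard rules.

```python
def interpret_psi(psi: float) -> str:
    """Map a PSI value to the rule-of-thumb labels listed above."""
    if psi < 0.1:
        return "no significant population change"
    if psi < 0.2:
        return "moderate population change"
    return "significant population change"


# Example: a PSI of 0.15 falls in the middle band.
print(interpret_psi(0.15))
```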
How is PSI used?
There are two different ways PSI can be used to make good decisions in a machine learning model building context:
Reactive Re-training Triggers
After an ML model is deployed to production, it keeps producing estimates as though the population still looked like the one it was trained on. As the population shifts over time, those estimates become less accurate and less relevant to the current population. Monitoring the PSI between the training-time population and the current population can serve as an automatic trigger to re-train the model once the PSI passes a chosen threshold (0.2, for example).
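A minimal sketch of such a trigger, assuming you already compute one PSI score per monitoring period against the training-time population. The function name `first_retrain_period` and the monthly scores are made up for illustration.

```python
from typing import Optional, Sequence


def first_retrain_period(
    psi_by_period: Sequence[float], threshold: float = 0.2
) -> Optional[int]:
    """Return the index of the first period whose PSI reaches the threshold.

    Returns None if the threshold is never reached.
    """
    for period, psi in enumerate(psi_by_period):
        if psi >= threshold:
            return period
    return None


# Hypothetical monthly PSI scores measured against the training population.
monthly_psi = [0.02, 0.05, 0.11, 0.16, 0.23, 0.31]
trigger = first_retrain_period(monthly_psi)
if trigger is not None:
    print(f"Re-train after period {trigger}")
```

In a real pipeline you might also require the threshold to be exceeded for several consecutive periods before re-training, so that a single noisy sample does not trigger a rebuild.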
Proactive Feature Selection
When choosing features to go into a model, certain features may have a lot of predictive power at the time of training, but if a feature is prone to rapid changes in distribution, it may not be a wise decision to include it in the model or it may prompt more frequent monitoring once deployed. PSI is an easy way to check the volatility of population changes for features by comparing populations for several previous time periods.
Here’s a quick example of walking through the steps of a PSI calculation for two (mostly) normal distributions.
As you can see above, the slightly left-skewed initial population (blue) has flattened out a bit to have more of a flat top of the bell curve in the new population (green). From a visual inspection, it looks as if the population is shifting, but I would like a quantitative way to measure how much the shift is rather than qualitatively guessing how much I should be concerned. PSI is a great way to come up with a single metric to measure this.
To calculate the PSI, we first divide the range of the initial population into 10 buckets (an arbitrary number I chose) and count the number of values that fall into each bucket for both the initial and new populations. Dividing those counts by the total number of values in each population gives the percent in each bucket. As expected, plotting the percents ends up looking like a discretized version of the original chart:
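The bucketing step described above could be sketched with NumPy roughly as follows. This is an illustration rather than the code from the repo: the function name `bucket_percents` is hypothetical, and clipping new values into the initial range (so that out-of-range points land in the end buckets instead of being dropped) is my own assumption.

```python
import numpy as np


def bucket_percents(initial, new, n_buckets=10):
    """Return equal-width bucket edges and the fraction of each population per bucket."""
    initial = np.asarray(initial, dtype=float)
    new = np.asarray(new, dtype=float)

    # Equal-width breakpoints spanning the range of the initial population.
    edges = np.linspace(initial.min(), initial.max(), n_buckets + 1)

    # Assumption: values of the new population outside the initial range
    # are clipped so they are counted in the first or last bucket.
    new_clipped = np.clip(new, edges[0], edges[-1])

    initial_counts, _ = np.histogram(initial, bins=edges)
    new_counts, _ = np.histogram(new_clipped, bins=edges)

    # Divide by each population's size to get the fraction in each bucket.
    return edges, initial_counts / initial.size, new_counts / new.size


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
initial_population = rng.normal(loc=0.0, scale=1.0, size=1000)
new_population = rng.normal(loc=0.2, scale=1.2, size=1000)
edges, initial_pct, new_pct = bucket_percents(initial_population, new_population)
```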
From here, we perform the actual PSI calculation for each bucket, (new percent - initial percent) * ln(new percent / initial percent), and then sum the results to get the overall PSI value for the two distributions.
| Breakpoint Value | Bucket | Initial Count | New Count | Initial Percent | New Percent | PSI |
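For reference, the per-bucket calculation uses the standard PSI formula, (new percent - initial percent) * ln(new percent / initial percent), summed over all buckets. Here is a minimal sketch under the assumption that the percents are already computed as fractions and that no bucket is empty. The function name `psi_from_percents` is my own.

```python
import numpy as np


def psi_from_percents(initial_pct, new_pct):
    """Return the per-bucket PSI contributions and their sum.

    Assumes both inputs are fractions per bucket with no zero entries.
    """
    initial_pct = np.asarray(initial_pct, dtype=float)
    new_pct = np.asarray(new_pct, dtype=float)

    # (new - initial) * ln(new / initial) for every bucket.
    per_bucket = (new_pct - initial_pct) * np.log(new_pct / initial_pct)
    return per_bucket, per_bucket.sum()


# Toy two-bucket example: 50/50 shifting to 40/60.
per_bucket, total = psi_from_percents([0.5, 0.5], [0.4, 0.6])
print(per_bucket, total)  # total is roughly 0.0405
```

Each term is non-negative, because the difference and the log ratio always share the same sign, so the total is zero only when the two sets of percents are identical.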
We get a final PSI value of 0.153, which indicates that there’s a chance our population is shifting, and we may want to monitor it going forward. Of course, this is just one way of calculating PSI, using 10 equal-width buckets. If we keep the 10 buckets but change our binning strategy to quantile bins, so that each bucket holds roughly the same share of the initial population, we end up with a different percent distribution and a lower overall estimate of 0.129.
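To make the difference between the two binning strategies concrete, here is a rough sketch of how quantile breakpoints might be derived from the initial population. The function name `quantile_edges` is hypothetical and the data is synthetic, so it will not reproduce the numbers above.

```python
import numpy as np


def quantile_edges(initial, n_buckets=10):
    """Breakpoints that put roughly equal shares of the initial population in each bucket."""
    initial = np.asarray(initial, dtype=float)
    # Percentiles at 0, 10, 20, ..., 100 when n_buckets is 10.
    return np.percentile(initial, np.linspace(0, 100, n_buckets + 1))


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
initial_population = rng.normal(size=1000)
edges = quantile_edges(initial_population)
counts, _ = np.histogram(initial_population, bins=edges)
print(counts)  # each bucket holds about a tenth of the initial population
```

With quantile bins the initial percents are all close to 1 / n_buckets by construction, so the PSI is driven entirely by how unevenly the new population spreads across those buckets. Heavily tied data can produce duplicate breakpoints, which this sketch does not handle.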
PSI seems to be a metric primarily used in the financial industry, but I think it can have a lot of useful applications in the wider ML community when used widely and consistently.
Check out my Python implementation of PSI and a corresponding Python notebook going through the example above at this repo.
Some assumptions the code makes:
- Numpy is the only non-standard library dependency
- Assumes continuous variables (categorical variables are handled differently in PSI)
- Replaces bins in the new population that have a count of 0 with 0.001 percent to avoid divide-by-zero errors without affecting the overall calculation too much
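The zero-replacement assumption in the last bullet might look something like the sketch below. This is not the repo's code: the function name `replace_zero_percents` is my own, and I am assuming the floor is applied to bucket percents stored as fractions, with 0.001 as the replacement value.

```python
import numpy as np


def replace_zero_percents(pct, floor=0.001):
    """Swap empty buckets for a small positive value so the log ratio stays finite."""
    pct = np.asarray(pct, dtype=float)
    return np.where(pct == 0, floor, pct)


# Hypothetical new-population percents with one empty bucket.
new_pct = replace_zero_percents([0.0, 0.3, 0.7])
print(new_pct)
```

Note that an empty bucket in the initial population would cause the same problem, since it puts a zero in the denominator of the log ratio, so you may want to apply the same floor there too.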