Why ELO Matters
Developed for chess by the physicist Arpad Elo and adopted by the US Chess Federation in 1960, the ELO rating system (named after its inventor, and so often written “Elo”) has quietly become the backbone of modern competitive ranking. If you’ve ever played a ranked video game like League of Legends or Counter-Strike, or if you’ve tracked the rapidly shifting landscape of AI models on the LMSYS Chatbot Arena leaderboard, you’ve interacted with ELO or a close relative of it.
But despite its ubiquity, very few people understand how it actually computes these numbers. How does it mathematically guarantee that a grandmaster beating a novice barely affects either player’s score? Let’s break it down interactively.
Win Probability
Here’s the core idea: the rating gap between two players determines how likely each one is to win. The ELO system uses a logistic curve to convert a difference in points into a probability.
Notice that at ΔR = 0 the probability is exactly 50/50. If you are rated 200 points higher than your opponent (ΔR = +200), your win probability is roughly 76%. The curve is asymptotic: no matter how large the rating gap, it never reaches exactly 0% or 100%.
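As a minimal sketch of that curve, the snippet below uses the conventional ELO constants: a base-10 logistic with a 400-point scale. The article does not state these constants, so treat them as an assumption; they do reproduce the roughly 76% figure at +200. The function name `expected_score` is illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) of player A against player B.

    Assumes the conventional ELO constants: base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Print the curve at a few rating gaps, from the higher-rated player's view.
for gap in (0, 100, 200, 400):
    probability = expected_score(1500 + gap, 1500)
    print(f"gap {gap:+4d}: {probability:.3f}")
```

Strictly speaking this value is an expected score rather than a pure win probability: in games that allow draws, a draw counts as half a win.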
Playing a Match
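When a match concludes, ratings are updated based on the actual outcome compared to the expected outcome: each player’s change is proportional to the difference between the two. If a favorite wins, they gain very few points, because the result was expected. If an underdog wins, they gain a large number of points, and the favorite loses a correspondingly large number.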
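The update described here can be sketched as "new rating = old rating + K × (actual score − expected score)". The code below is one hedged way to write it, assuming the conventional 400-point logistic curve, a single shared K of 32, and scores of 1 for a win, 0.5 for a draw, and 0 for a loss. The names `expected_score` and `update_ratings` are illustrative, not taken from any particular library.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (assumed 400-point, base-10 logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return (new_a, new_b). score_a is 1.0 for a win, 0.5 draw, 0.0 loss for A."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # With one shared K, whatever A gains B loses, so total points are conserved.
    return rating_a + delta, rating_b - delta


# A 1900-rated favorite against a 1500-rated underdog.
favorite_wins = update_ratings(1900, 1500, score_a=1.0)
underdog_wins = update_ratings(1900, 1500, score_a=0.0)
print("favorite wins:", [round(r, 1) for r in favorite_wins])
print("underdog wins:", [round(r, 1) for r in underdog_wins])
```

With a 400-point gap the favorite is expected to score about 0.91, so a win moves each rating by only about 3 points while an upset moves them by about 29.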
The K-Factor
The maximum amount a rating can change in a single match is controlled by the K-Factor. A low K-Factor means ratings are “sticky” and change slowly over time. A high K-Factor means ratings are highly volatile and react quickly to recent results.
Notice how the K=10 line barely budges, while the K=32 line reacts significantly to the underdog’s unexpected success. This is why new players (or new models on AI leaderboards) often start with a high K-factor to quickly place them near their true skill level, before lowering the K-factor to reduce volatility once they are established.
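To make the K=10 versus K=32 comparison concrete, here is a small sketch that replays the same run of upsets under both values. The scenario is hypothetical: a 1400-rated underdog beats a 1600-rated opponent five times in a row, and the opponent's rating is held fixed for simplicity. The 400-point logistic curve and the helper name `rating_path` are assumptions as well.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score against an opponent (assumed 400-point, base-10 logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def rating_path(k: float, start: float = 1400.0, opponent: float = 1600.0, wins: int = 5):
    """Underdog's rating after each of `wins` consecutive wins over a fixed opponent."""
    rating = start
    path = [rating]
    for _ in range(wins):
        # Each win adds K * (1 - expected score), which is always less than K.
        rating += k * (1.0 - expected_score(rating, opponent))
        path.append(rating)
    return path


slow = rating_path(k=10)
fast = rating_path(k=32)
print("K=10:", [round(r) for r in slow])
print("K=32:", [round(r) for r in fast])
```

Because a single result can never move a rating by more than K, the low-K path stays close to 1400 while the high-K path climbs roughly three times as fast.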
Self-Correcting League
ELO is self-correcting. If a player’s rating is too high, they will be expected to win constantly. When they inevitably lose, their rating will drop dramatically. Let’s see this in action by simulating a league where the initial ratings are completely scrambled.
Notice what happens in the simulation:
- The dashed grey lines represent each player’s hidden True Skill.
- Even though the starting ELOs are scrambled (e.g., the worst player starts with the highest rating), the system rapidly sorts everyone out.
- A player like Frank starts with 1800 ELO but a true skill of 1000. Because his rating is so high, the system expects him to win almost every game. When his true skill causes him to inevitably lose to lower-rated opponents (who are in fact stronger than he is), the ELO formula hands him massive point penalties, dragging his rating down to reality.
- Conversely, Alice starts at 1200 but has a true skill of 1800. She keeps scoring upsets against higher-rated players, and each one earns her a large gain, so her rating skyrockets.
This is why ELO is so effective over large sample sizes: individual games have randomness, but over dozens of matches, the math forces everyone toward an equilibrium that matches their actual win rate.
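A rough sketch of such a league follows. Only Alice's and Frank's numbers come from the description above; the other four players, the K of 32, the 3000 random pairings, and the fixed seed are all hypothetical choices. It also assumes that real outcomes follow the same 400-point logistic curve applied to hidden true skill, which is a modeling convenience rather than a fact about any real game.

```python
import random


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (assumed 400-point, base-10 logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hidden true skill versus scrambled starting ratings (same values, shuffled).
TRUE_SKILL = {"Alice": 1800, "Bob": 1600, "Carol": 1500,
              "Dave": 1400, "Eve": 1200, "Frank": 1000}
START = {"Alice": 1200, "Bob": 1000, "Carol": 1400,
         "Dave": 1600, "Eve": 1500, "Frank": 1800}


def simulate_league(games: int = 3000, k: float = 32.0, seed: int = 42):
    """Play random 1v1 games and return the final visible ratings."""
    rng = random.Random(seed)
    ratings = {name: float(value) for name, value in START.items()}
    names = list(ratings)
    for _ in range(games):
        a, b = rng.sample(names, 2)
        # The result is drawn from hidden true skill ...
        a_wins = rng.random() < expected_score(TRUE_SKILL[a], TRUE_SKILL[b])
        score_a = 1.0 if a_wins else 0.0
        # ... but the update only ever sees the current visible ratings.
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


final = simulate_league()
for name in sorted(final, key=final.get, reverse=True):
    print(f"{name:6s} true={TRUE_SKILL[name]} start={START[name]} final={final[name]:.0f}")
```

The final ratings will not land exactly on the true skills, since every game adds noise proportional to K, but the ordering should end up close to the true one.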
What About Team Games?
So far, we’ve only looked at 1v1 matchups like chess. But modern competitive games like League of Legends, Overwatch, or Counter-Strike are team-based. How does ELO work when there are 10 players in a match?
Most team games use an adaptation of ELO or a related system (such as Microsoft’s TrueSkill or a custom Matchmaking Rating, MMR). In the simplest adaptation, instead of a single 1v1 calculation, the system essentially:
- Calculates a composite “Team Rating” (usually an average or weighted average of the players’ individual ratings).
- Uses the team ratings to calculate the expected win probability for Team A vs Team B.
- Updates every player’s individual rating based on the outcome of the match, often applying the same point delta to everyone on the team regardless of individual performance.
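The three steps above can be sketched as follows. This is the simplest version of the idea: a plain average as the team rating and one shared point delta for everyone on a team. It is not how TrueSkill or any specific game's MMR works internally, and the function name `update_teams`, the K of 32, and the 400-point curve are assumptions.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of side A against side B (assumed 400-point logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_teams(team_a, team_b, a_won: bool, k: float = 32.0):
    """Return updated rating lists for both teams after one match."""
    # Step 1: composite team rating (here, a plain average).
    rating_a, rating_b = mean(team_a), mean(team_b)
    # Step 2: expected result for Team A from the two composites.
    expected_a = expected_score(rating_a, rating_b)
    # Step 3: one shared delta, applied regardless of individual performance.
    delta = k * ((1.0 if a_won else 0.0) - expected_a)
    return [r + delta for r in team_a], [r - delta for r in team_b]


team_a = [1500, 1550, 1450, 1600, 1400]
team_b = [1500, 1500, 1500, 1500, 1500]
new_a, new_b = update_teams(team_a, team_b, a_won=True)
print(new_a)
print(new_b)
```

Both example teams average 1500, so the match is a 50/50 toss-up and every player moves by exactly half of K.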
The Matchmaking Dilemma
This leads to a tricky matchmaking balancing act. A matchmaking system has two conflicting goals:
- Match speed: Getting players into a game quickly (which means accepting wider rating gaps).
- Match fairness: Ensuring the game is close to a 50/50 toss-up (which means waiting longer for players with closely matched ratings).
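One common way to trade these two goals against each other is to start with a tight rating window and widen it the longer a player waits. The sketch below is purely illustrative: the function names and every number in it (a 50-point starting window, 5 points of growth per second, a 400-point cap) are hypothetical and not taken from any real game.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (assumed 400-point, base-10 logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def max_allowed_gap(wait_seconds: float, base_gap: float = 50.0,
                    growth_per_second: float = 5.0, cap: float = 400.0) -> float:
    """Largest rating gap accepted after waiting this long (hypothetical values)."""
    return min(cap, base_gap + growth_per_second * wait_seconds)


def can_match(rating_a: float, rating_b: float, wait_seconds: float) -> bool:
    """True if the two ratings fall inside the current matchmaking window."""
    return abs(rating_a - rating_b) <= max_allowed_gap(wait_seconds)


# The longer the wait, the wider the window and the more lopsided the worst case.
for wait in (0, 10, 30, 60):
    gap = max_allowed_gap(wait)
    worst_case = expected_score(1500 + gap, 1500)
    print(f"wait {wait:2d}s: gap up to {gap:.0f}, favorite's expected score {worst_case:.2f}")
```

Speed is bought directly with fairness here: every extra second of waiting permits a match that is further from a 50/50 toss-up.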
If a game heavily prioritizes match speed, you might get a high-ELO player matched with low-ELO teammates against an average-ELO team. This creates a deeply frustrating experience for the high-ELO player.
Why? Because of how the update rule treats favorites. If the team’s combined rating makes it the statistical favorite, a win is the expected result and earns little, while a loss counts as an upset and costs far more. If the lower-rated teammates make mistakes that cost the game, the high-ELO player takes that large penalty, even though a win would have barely nudged their rating up.
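A quick worked example shows the asymmetry. The lineup is hypothetical: one 2400-rated player with four 1400-rated teammates (average 1600) against five 1450-rated opponents. It assumes the simple averaged-team update with a shared delta, a K of 32, and the 400-point logistic curve.

```python
from statistics import mean


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of side A against side B (assumed 400-point logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


K = 32.0
team = [2400, 1400, 1400, 1400, 1400]       # one star player, average 1600
opponents = [1450, 1450, 1450, 1450, 1450]  # evenly matched squad

# The star's team is the favorite purely because of the star's own rating.
expected = expected_score(mean(team), mean(opponents))
gain_if_win = K * (1.0 - expected)  # points each teammate gains for a win
loss_if_lose = K * expected         # points each teammate loses for a loss

print(f"expected score: {expected:.2f}")
print(f"win:  +{gain_if_win:.1f}")
print(f"loss: -{loss_if_lose:.1f}")
```

Under these assumptions the team is roughly a 70% favorite, so a win is worth about 9.5 points and a loss costs about 22.5. The gain and the loss always add up to K, which means the more heavily a team is favored, the more lopsided that split becomes.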
Because the ELO system punishes “upsets” so harshly, team-based games are highly incentivized to keep matchmaking windows extremely tight. A balanced, close-ELO match ensures that if you lose, the rating penalty is fair and proportional, preventing the exact kind of “ELO hell” scenarios that drive players away.
