College soccer

Albyn Jones ratings

M: DI, DII, DIII, NAIA

W: DI, DII, DIII, NAIA

About the ratings

Polls

NSCAA

Soccer Times

Soccer America

Soccer Buzz (W only)

collegesoccernews (M only)

d3kicks

Rating Percentage Index

Dissecting the RPI

The RPI

What is the RPI?


National teams

FIFA Rankings

Ron Kessler's

Elo Rankings

Albyn Jones


Rankings for other sports


College Football

Official BCS webpage

BCS Explained

BCS/wikipedia


Sagarin ratings

Dolphin ratings

Massey Ratings


Heal Points System

as used by the Maine Principals' Assn.

Iterative HEAL (RHEAL)


College Hockey

KRACH ratings

PairWiseRanking (PWR)

PWR explained


Chess (mostly Elo ratings)

US Chess Federation

Mark Glickman

FIDE (Int'l Chess Fed.)


Tennis

WTA Tour

ATP Rankings


Ranking /wikipedia

Women's soccer

Rankings and Ratings


polls

standings

adjusted results

predictive models

power ratings

arbitrary ratings

meta ratings

pairwise rankings

tournament seedings


RPI

Jones Ratings


Rankings and ratings are estimates of either team performance or strength. A ranking is simply an ordered list of teams. A rating is a number or descriptive term assigned to a specific team. A ranking can be derived from ratings but ratings can be more meaningful. In addition to indicating which team is supposedly better, a rating can also indicate how much better.

Performance vs. strength - A performance-based rating or ranking (sometimes called "merit-based" or "earned") is determined entirely by past results. Strength-based ratings (including predictive statistical models and power ratings) are based on other factors in addition to past performance, such as injuries, changes in personnel and fitness. Although past performance and strength are often related, they are not necessarily so. For a variety of reasons, a strong team may have done poorly in the past and, conversely, a weak team may have done well. A merit-based system is based on actual accomplishments, while a strength-based system is based on the potential for future accomplishments.

A dynamic rating system counts recent results more than earlier results. A static system counts early season and late season results the same. Many feel that merit-based or 'earned' systems should be static. Most rating systems intended to be predictive are dynamic.
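To make the static/dynamic distinction concrete, here is a minimal Python sketch (not the method of any system named on this page) that computes a winning percentage two ways: with every game weighted equally, as in a static system, and with a hypothetical exponential decay so that older games count less, as in a dynamic one. The `half_life` parameter and the decay scheme are illustrative assumptions.

```python
def weighted_win_pct(results, half_life=None):
    """Winning percentage from results listed oldest first.

    results: list of 1.0 (win), 0.5 (tie) or 0.0 (loss).
    half_life: None for a static system (all games count equally); otherwise a
    hypothetical decay, in games, so that a game `half_life` games older than
    the latest one counts half as much (a dynamic system).
    """
    n = len(results)
    if n == 0:
        raise ValueError("no games played")
    weights = []
    for i in range(n):
        age = n - 1 - i  # 0 for the most recent game
        weights.append(1.0 if half_life is None else 0.5 ** (age / half_life))
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


# A team that lost its first three games and won its last three.
season = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(weighted_win_pct(season))               # static: 0.5
print(weighted_win_pct(season, half_life=2))  # dynamic: well above 0.5
```

With equal weights, three losses followed by three wins is a .500 record; with the decay, the recent wins dominate and the weighted percentage comes out near .74.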


Polls use the votes of poll participants to arrive at rankings. Accuracy depends on the knowledge of the voters (who are usually either coaches or journalists). In theory, polls should be sensitive to short-term phenomena such as injuries but, in practice, they usually aren't. Polls are based on opinion and tend to be highly volatile. They almost invariably over-react to the most recent result.

The main reason most polls exist is their entertainment value. Polls are used to stimulate interest, either in the webpage or magazine that conducts the poll or in the sport. Some polls are little more than beauty contests that have little to do with qualifications for post-season play.


Standings are calculated directly from results of round-robin play. Standings work well for small groups of teams (leagues, conferences, etc.) where each team plays every other team and all teams play essentially the same schedule. The method works less well when different teams play different schedules. Standings are usually based on either accumulated points or winning percentage.

In soccer, round-robin accumulated points (AP) are calculated by awarding three points for a win, one point for a tie and no points for a loss. In some leagues, bonus points are added for goals scored.
Round-robin winning-percentage (WP) is calculated by dividing the number of wins by the number of games played, counting a tie as half a win.
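The AP and WP calculations above can be sketched in Python as follows. The `(team_a, team_b, goals_a, goals_b)` input format and the function name are hypothetical, and no goal bonus is applied.

```python
from collections import defaultdict


def standings(games):
    """Round-robin accumulated points (AP) and winning percentage (WP).

    games: list of (team_a, team_b, goals_a, goals_b) tuples (hypothetical format).
    AP = 3 per win + 1 per tie (no goal bonus).
    WP = (wins + 0.5 * ties) / games played.
    """
    record = defaultdict(lambda: [0, 0, 0])  # team -> [wins, ties, losses]
    for team_a, team_b, goals_a, goals_b in games:
        if goals_a > goals_b:
            record[team_a][0] += 1
            record[team_b][2] += 1
        elif goals_a < goals_b:
            record[team_b][0] += 1
            record[team_a][2] += 1
        else:
            record[team_a][1] += 1
            record[team_b][1] += 1

    table = {}
    for team, (wins, ties, losses) in record.items():
        played = wins + ties + losses
        table[team] = {"AP": 3 * wins + ties, "WP": (wins + 0.5 * ties) / played}
    return table


games = [("A", "B", 2, 0), ("A", "C", 1, 1), ("B", "C", 3, 1)]
for team, row in sorted(standings(games).items(), key=lambda kv: -kv[1]["AP"]):
    print(team, row["AP"], row["WP"])
# A 4 0.75
# B 3 0.5
# C 1 0.25
```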


Adjusted results account, in some way or another, for differences in strength of schedule. These systems (sometimes called 'merit-based' or 'earned') usually count early-season results the same as late-season results and are therefore called 'static'. The goal is not to identify the strongest teams but, rather, to identify the teams that have the best records over a given period of time.

Systems intended to be merit-based include:

Soccer:

the Rating Percentage Index (RPI) (used by the NCAA)

Longo ratings (used by the NAIA)

FIFA National Team Rankings

Other sports

Heal points system (Maine high schools)

FIDE ratings (chess)

WTA ratings (women's tennis)

ATP ratings (men's tennis)


Rating Percentage Index (RPI)
(used for various sports)

The RPI is calculated from the winning percentages of a team and its opponents. The NCAA has used the RPI in basketball since 1981 and in soccer since 1997. A team's basic RPI is 25% its own winning percentage, 50% its opponents' winning percentage, and 25% its opponents' opponents' winning percentage.


The formula for basic RPI is:

RPI = .25 x WP + .5 x OWP + .25 x OOWP

where

WP = winning percentage, with a tie counting as half a win.

OWP = opponents' winning percentage (the average of the opponents' WPs), not counting games against the team whose RPI is being calculated.

OOWP = opponents' opponents' winning percentage (the average of the opponents' OWPs), not counting games against the "opponent".
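A minimal Python sketch of the basic RPI as defined above. The `(team, opponent, result)` game format and the function names are hypothetical. It assumes each game counts once (so an opponent met twice counts twice in the averages) and that every opponent has at least one game against someone other than the team being rated; it does not attempt the NCAA's adjusted RPI.

```python
from collections import defaultdict


def index_games(games):
    """games: list of (team, opponent, result), result 1 (win), 0.5 (tie) or 0
    (loss) from `team`'s point of view. Returns team -> [(opponent, result)]."""
    by_team = defaultdict(list)
    for team, opponent, result in games:
        by_team[team].append((opponent, result))
        by_team[opponent].append((team, 1 - result))
    return by_team


def wp(by_team, team, exclude=None):
    """Winning percentage (tie = half a win), ignoring games against `exclude`."""
    results = [r for opp, r in by_team[team] if opp != exclude]
    return sum(results) / len(results)


def owp(by_team, team):
    """Average of the opponents' WPs, each ignoring games against `team`."""
    opponents = [opp for opp, _ in by_team[team]]
    return sum(wp(by_team, opp, exclude=team) for opp in opponents) / len(opponents)


def oowp(by_team, team):
    """Average of the opponents' OWPs (each excludes games against that opponent)."""
    opponents = [opp for opp, _ in by_team[team]]
    return sum(owp(by_team, opp) for opp in opponents) / len(opponents)


def rpi(by_team, team):
    """Basic RPI = .25 x WP + .5 x OWP + .25 x OOWP (no NCAA adjustments)."""
    return 0.25 * wp(by_team, team) + 0.5 * owp(by_team, team) + 0.25 * oowp(by_team, team)


# Hypothetical four-team round robin.
games = [("A", "B", 1), ("A", "C", 1), ("A", "D", 0.5),
         ("B", "C", 1), ("B", "D", 0), ("C", "D", 1)]
by_team = index_games(games)
for team in "ABCD":
    print(team, round(rpi(by_team, team), 4))
```

In this round robin every OWP happens to be .500, so team A's RPI is .25 x .8333 + .5 x .5 + .25 x .5 = .5833.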

The NCAA also calculates an adjusted RPI. The exact details are a well-guarded secret, and vary from sport to sport, but they involve awarding bonuses for "good wins" (i.e. wins against teams with high RPIs) and deductions for "bad losses". Deductions are also made for weak schedules (e.g. a team playing 50% of their games against teams with losing records).


RPI Strength of Schedule (SoS) is also calculated. It is two-thirds of a team's opponents' winning percentage plus one-third of its opponents' opponents' winning percentage.


The formula for RPI SoS is:

SoS = (2/3) x OWP + (1/3) x OOWP
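Because the SoS weights are just the RPI's opponent weights (.5 and .25) rescaled to sum to 1, the two measures are linked by simple algebra. A short derivation using only the basic RPI and SoS formulas (it assumes both use the same OWP and OOWP):

```latex
\begin{aligned}
\mathrm{RPI} - 0.25\,\mathrm{WP} &= 0.5\,\mathrm{OWP} + 0.25\,\mathrm{OOWP} \\
\frac{\mathrm{RPI} - 0.25\,\mathrm{WP}}{0.75} &= \tfrac{2}{3}\,\mathrm{OWP} + \tfrac{1}{3}\,\mathrm{OOWP} = \mathrm{SoS} \\
\mathrm{RPI} &= 0.25\,\mathrm{WP} + 0.75\,\mathrm{SoS}
\end{aligned}
```

In other words, three quarters of the basic RPI is the strength of schedule.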

Criticism of RPI: (under construction)

The RPI is supposedly merit-based, but it is mostly a measure of whom a team plays: 75% of the basic RPI comes from OWP and OOWP rather than from the team's own results. It has no basis in either statistical theory or common sense.


Predictive ratings are based on a statistical model of some kind (such as Bradley-Terry or Elo). Models vary considerably. Some consider home field advantage and/or opponent strength; others do not. Some consider score (usually score difference or score ratio); others do not. Some recalculate opponent strength every time the opponent plays; others do not. Some require a recursive calculation; others do not.

Power ratings are intended to be predictive and are similar to statistical models. The difference is that power ratings consider subjective factors.

Most predictive models and power rating systems count recent results more than earlier results, and are therefore dynamic.


Examples of predictive and/or power ratings include:

Albyn Jones ratings for college soccer

KRACH ratings for college hockey

Sagarin ratings for various sports

Massey ratings for various sports

Elo ratings for world football

Ron Kessler International Soccer ratings


Albyn Jones ratings

The Albyn Jones rating system uses a dynamic logistic model based on the Bradley-Terry method.


The basic formula (ignoring ties) is:

$$P_H = \frac{1}{1 + e^{-sf\,(R_H - R_A)}}$$

where

PH is the probability of the home team winning

RH and RA are the ratings of the home and away teams, respectively

sf is a scale factor chosen so that, with a difference of 100 points in the ratings, the higher-rated team has a 2/3 probability of winning (2 to 1 odds).

sf = (ln 2)/100 = 0.00693...
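A minimal Python sketch, assuming the basic formula has the logistic form P_H = 1 / (1 + exp(-sf (R_H - R_A))), which is the form that makes sf = (ln 2)/100 give 2 to 1 odds at a 100-point difference. Home field advantage and ties are left out, and the ratings in the example are made up.

```python
import math

SF = math.log(2) / 100  # scale factor: a 100-point edge gives 2 to 1 odds


def p_home_win(r_home, r_away, sf=SF):
    """Probability the home team wins (ties ignored), assuming the logistic
    form P_H = 1 / (1 + exp(-sf * (R_H - R_A)))."""
    return 1.0 / (1.0 + math.exp(-sf * (r_home - r_away)))


def rating_gap(p, sf=SF):
    """Inverse: rating difference implied by a win probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p)) / sf


print(round(p_home_win(1600, 1500), 4))  # 0.6667 (2 to 1 odds)
print(round(p_home_win(1700, 1500), 4))  # 0.8 (4 to 1 odds)
print(round(rating_gap(2 / 3)))          # 100
```

Because the odds are exp(sf x difference), every additional 100 points doubles the odds: 2 to 1 at 100 points, 4 to 1 at 200.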


An arbitrary or preferential rating or ranking is based on factors that have little to do with performance or strength. For example, the NCAA seeds its national tournaments by committee and considers factors that are arbitrary and preferential.


A meta rating (or ranking) is a combination (sometimes an average) of ratings or rankings derived by different methods. The Bowl Championship Series (BCS) in college football uses a meta ranking.
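A meta ranking in its simplest form, sketched in Python: average each team's position across several rankings and sort. This is a hypothetical illustration; the BCS formula combined specific polls and computer rankings in its own way and is not reproduced here.

```python
def meta_ranking(rankings):
    """Combine several rankings by average position (1 = best).

    rankings: list of ordered lists of team names; every list must contain the
    same teams. Ties in the average are broken alphabetically, an arbitrary
    choice for this sketch.
    """
    teams = set(rankings[0])
    if any(set(r) != teams for r in rankings):
        raise ValueError("all rankings must contain the same teams")
    avg = {t: sum(r.index(t) + 1 for r in rankings) / len(rankings) for t in teams}
    return sorted(teams, key=lambda t: (avg[t], t))


# Three hypothetical rankings of the same three teams.
polls = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
print(meta_ranking(polls))  # ['A', 'B', 'C']
```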


Pairwise rankings (under construction)


Tournament seedings / qualification for post-season play

The selection and seeding of post-season tournaments can be done in a wide variety of ways. A seeding is a ranking, and it can be based on almost any of the ranking methods described above.

An interesting example of a system created specifically to seed tournaments is the HEAL rating system used in various high school sports in Maine.

Many conferences and small leagues simply use standings, combined with tiebreakers such as head-to-head record and goal differential.

Seeding of the NAIA soccer tournaments relies heavily on adjusted results (Longo ratings), although other factors are considered in the case of teams with similar ratings.

The NCAA uses a complex system. Ultimately, selecting and seeding is done by committee based, at least in part, on the following criteria (not listed in order of preference):

Primary factors:

> Won-loss record.

> Strength of schedule/RPI.

> Head-to-head competition.

> Results against common opponents.

> Results against teams already selected to participate in the championship.

> Results against teams under consideration.

Secondary factors:

> Late-season performance.

> Eligibility or availability of student-athletes.

It is unclear exactly how the various factors are weighed. What is clear is that NCAA selection committees are not bound by any objective criteria or process. Ultimately, selection and seeding is done by committee on a basis known only to the committee.






 
The Bradley-Terry method

The Bradley-Terry method is a fundamental model for paired comparisons of individuals in a group. The method, which is similar to one published in 1929 by the German mathematician Ernst Zermelo, was described by Bradley and Terry in 1952.

In the Bradley-Terry model, each individual is assigned an attribute such that the probability that individual i is preferred to individual j is given by:

$$P(i \text{ preferred to } j) = \frac{\pi_i}{\pi_i + \pi_j}$$

where π_i is a positive attribute associated with individual i.
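The attributes π_i are usually estimated from game results. Below is a minimal Python sketch using the fixed-point iteration that goes back to Zermelo (today often described as an MM algorithm). Ties are ignored, games are hypothetical `(winner, loser)` pairs, and convergence assumes that for any split of the teams into two groups, each group has beaten the other at least once.

```python
from collections import defaultdict


def fit_bradley_terry(games, iterations=1000, tol=1e-10):
    """Estimate Bradley-Terry strengths pi_i from (winner, loser) pairs.

    Fixed-point (MM) update: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j),
    where W_i is i's total wins and n_ij the number of games between i and j.
    Strengths are rescaled to average 1 after each step (only ratios matter).
    """
    wins = defaultdict(int)
    n = defaultdict(int)  # n[(i, j)]: games between i and j, stored both ways
    teams = set()
    for winner, loser in games:
        wins[winner] += 1
        n[(winner, loser)] += 1
        n[(loser, winner)] += 1
        teams.update((winner, loser))

    pi = {t: 1.0 for t in teams}
    for _ in range(iterations):
        new = {}
        for i in teams:
            denom = sum(n[(i, j)] / (pi[i] + pi[j])
                        for j in teams if j != i and n[(i, j)])
            new[i] = wins[i] / denom
        total = sum(new.values())
        new = {t: v * len(teams) / total for t, v in new.items()}
        converged = max(abs(new[t] - pi[t]) for t in teams) < tol
        pi = new
        if converged:
            break
    return pi


def p_preferred(pi, i, j):
    """Bradley-Terry probability that i is preferred to (beats) j."""
    return pi[i] / (pi[i] + pi[j])


# Hypothetical results: A-B 2-1, B-C 2-1, A-C 1-1 (in games won).
games = [("A", "B")] * 2 + [("B", "A")] + [("B", "C")] * 2 + [("C", "B"), ("A", "C"), ("C", "A")]
pi = fit_bradley_terry(games)
print(sorted(pi, key=pi.get, reverse=True))  # ['A', 'B', 'C']
```

At the fitted values each team's expected number of wins equals its actual number of wins. Setting π_i = exp(sf x R_i) turns π_i / (π_i + π_j) into 1 / (1 + exp(-sf (R_i - R_j))), which is why logistic rating systems are described as Bradley-Terry models.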


 
Elo ratings
(Used by the US Chess Federation, among others.)

Elo ratings change according to whether a team or player performs better or worse than expected.

A new rating is calculated from an old rating according to the formula:

$$R_{new} = R_{old} + K\,\left(S - \sum W_e\right)$$

where

  • S is the score (number of wins plus half the number of ties)
  • K is the development coefficient
  • ΣWe is the sum of win expectancies for each match

The win expectancy is given by:

$$W_e = \frac{1}{10^{-dr/sf} + 1}$$

where
  • dr is the difference in ratings: dr = R_old - R_opp (the player's rating minus the opponent's)
  • sf is a scale factor (initially equal to 400)

The preceding only works if a player already has a rating. A new player's rating is given by:

R = Rc + sf (Nwins-Nlosses) / Ntotal
where
  • Rc is the average rating of the competitors (opponents)
  • N represents the wins, losses and total games played.

Elo ratings were developed in 1959 by Arpad Elo, a professor of physics at Marquette University and a former president of the American Chess Federation. They are based on the assumption that each player's performance follows a bell-shaped curve (normal distribution).
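A minimal Python sketch of the Elo update R_new = R_old + K (S - ΣWe), the win expectancy We = 1 / (10^(-dr/sf) + 1) and the new-player formula R = Rc + sf (Nwins - Nlosses) / Ntotal. This is not USCF's implementation: no value of K is given here, so the K = 32 in the example is a hypothetical choice, and in the new-player formula ties simply count toward Ntotal.

```python
def win_expectancy(r_player, r_opponent, sf=400):
    """We = 1 / (10 ** (-dr / sf) + 1), with dr = r_player - r_opponent."""
    dr = r_player - r_opponent
    return 1.0 / (10 ** (-dr / sf) + 1.0)


def elo_update(r_old, opponents, score, k):
    """R_new = R_old + K * (S - sum of We).

    opponents: ratings of the opponents faced; score: wins + 0.5 * ties.
    """
    expected = sum(win_expectancy(r_old, r) for r in opponents)
    return r_old + k * (score - expected)


def provisional_rating(opponents, n_wins, n_losses, sf=400):
    """New player: R = Rc + sf * (N_wins - N_losses) / N_total,
    where Rc is the opponents' average and N_total = len(opponents)."""
    n_total = len(opponents)
    rc = sum(opponents) / n_total
    return rc + sf * (n_wins - n_losses) / n_total


# 1500 player scores 1.5 against a 1500 and a 1700 (K = 32 is hypothetical).
print(round(elo_update(1500, [1500, 1700], score=1.5, k=32), 1))  # 1524.3
print(provisional_rating([1400, 1600, 1500, 1500], n_wins=3, n_losses=1))  # 1700.0
```

Against an equal opponent the win expectancy is exactly .5, so a win gains K/2 points and a loss costs K/2.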



A comparison of selected rating systems

| rating system | type of system | game result function | calculation type | recursive? | dynamic? | HFA | OS | other |
|---|---|---|---|---|---|---|---|---|
| RPI | merit | WP+OS | addition | no | no | no | yes | no |
| Longo rating | merit | WP+TGS | table | no | no | yes | yes | no |
| Albyn Jones rating | predictive | WL | formula | yes | yes | yes | yes | no |
| Tennis (ATP, WTA) | merit | WL | addition | no | no | no | yes | yes |

Game result function

WLT (Wins, Losses and Ties) or WL (Wins and Losses)

Pts (Points), usually 3 points for a win, 1 for a tie, 0 for a loss.

+GS (plus extra points for goals scored).

WP (Winning Percentage), with a tie counted as half a win

GS (Goals Scored) or TGS (Truncated Goals Scored), capped at a maximum per game

SD (Score Difference) or TSD (Truncated Score Difference), capped at a maximum per game

SR (Score Ratio)

Other factors

OS (Opponent Strength)

HFA (Home Field Advantage)




Merit-based rating systems used in sports other than soccer



FIDE Rating system international chess

FIDE uses a relatively simple numerical system with an arbitrary scale. Results (percentage scores) are converted to rating differences by means of a simple table look-up.  Table values are based on Elo-type calculations.

The change in a player's rating due to a tournament result is given by:
 

delta R = k ( W - We )

where

  • W is the score achieved in the tournament
  • We is the expected score determined by a table look-up based on the difference between the players' ratings.
  • k is the development coefficient

  • (25 for new players, 15 for a player that has never reached a rating of 2400, 10 for established players)

The development coefficient, k, stabilizes the ratings for established players. When k is 10, it takes about 75 games for a rating to change completely. When k is 25, it takes only about 30 games.
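A minimal Python sketch of the FIDE update delta R = k (W - We) with the k values listed above. FIDE's look-up table is not reproduced, so the per-game expected scores are passed in by the caller (the 0.6 values in the example are made up), and "established" is read as "has reached 2400".

```python
def fide_k(is_new_player, has_reached_2400):
    """Development coefficient: 25 for new players, 15 for players who have
    never reached 2400, 10 for established players. How "new" is defined is
    left to the caller (an assumption of this sketch)."""
    if is_new_player:
        return 25
    if not has_reached_2400:
        return 15
    return 10


def fide_rating_change(k, score, expected_per_game):
    """delta R = k * (W - We).

    score: W, points scored in the tournament (win = 1, draw = 0.5).
    expected_per_game: expected score for each game, which FIDE reads from its
    look-up table (not reproduced here); We is their sum.
    """
    return k * (score - sum(expected_per_game))


# Hypothetical 9-round event: expected 0.6 per game, actual score 6.5.
k = fide_k(is_new_player=False, has_reached_2400=False)
print(round(fide_rating_change(k, 6.5, [0.6] * 9), 2))  # 16.5
```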



TENNIS


WTA Rankings women's tennis 

WTA Rankings are a 52-week cumulative point system (published weekly), in which ranking points are the sum of round points and quality points. Round points are awarded depending upon how far a player advances in a tournament, as well as the level of the tournament and the prize money. Quality points are awarded for each victory and depend on the ranking of the opponent at the time the match is played. Only a player's 17 best tournaments are counted (for singles).

Some examples of round points are shown in the abbreviated table below.

Round Points (examples)

| tournament level | W | F | SF | QF | R16 | R32 | R64 | R128 |
|---|---|---|---|---|---|---|---|---|
| Grand Slam | 650 | 456 | 292 | 162 | 90 | 56 | 32 | 2 |
| Tier I (32) $2M | 325 | 228 | 146 | 81 | 45 | 1 | - | - |
| Tier I (16) $1.3M | 300 | 210 | 135 | 75 | 1 | - | - | - |
| Tier III (32) $225K | 145 | 103 | 66 | 37 | 19 | 1 | - | - |

Note: the actual table includes many more entries

In addition, quality points are awarded for each win in a tournament over a player ranked in the top 500 at the time of the match. The number of quality points awarded depends on the loser's rank. Quality point examples are shown below. Quality points are doubled for Grand Slam events.
 

| loser's rank | 1 | 2 | 3 | 4 | 5 | 6-10 | 11-16 | ... | 251-500 |
|---|---|---|---|---|---|---|---|---|---|
| quality points | 100 | 75 | 66 | 55 | 50 | 43 | 35 | ... | 1 |
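A Python sketch of the WTA calculation described above, using only the example entries from the two tables. The real tables are much larger, so ranks 17-250 raise an error here rather than being guessed. The dictionary keys, round labels (the round in which the player went out, or W for the winner) and function names are hypothetical.

```python
# Example round points only (the full WTA table has many more entries).
ROUND_POINTS = {
    "Grand Slam": {"W": 650, "F": 456, "SF": 292, "QF": 162, "R16": 90, "R32": 56, "R64": 32, "R128": 2},
    "Tier I (32)": {"W": 325, "F": 228, "SF": 146, "QF": 81, "R16": 45, "R32": 1},
    "Tier I (16)": {"W": 300, "F": 210, "SF": 135, "QF": 75, "R16": 1},
    "Tier III (32)": {"W": 145, "F": 103, "SF": 66, "QF": 37, "R16": 19, "R32": 1},
}

# Example quality points: (lowest rank, highest rank, points); ranks 17-250 omitted.
QUALITY_POINTS = [(1, 1, 100), (2, 2, 75), (3, 3, 66), (4, 4, 55), (5, 5, 50),
                  (6, 10, 43), (11, 16, 35), (251, 500, 1)]


def quality_points(loser_rank):
    """Quality points for beating a player of the given rank."""
    for low, high, pts in QUALITY_POINTS:
        if low <= loser_rank <= high:
            return pts
    if loser_rank > 500:
        return 0  # no quality points outside the top 500
    raise ValueError(f"rank {loser_rank} is not in the abbreviated example table")


def tournament_points(level, round_out, beaten_ranks):
    """Round points plus quality points; quality points doubled at Grand Slams."""
    qp = sum(quality_points(r) for r in beaten_ranks)
    if level == "Grand Slam":
        qp *= 2
    return ROUND_POINTS[level][round_out] + qp


def ranking_points(tournament_totals, best_of=17):
    """52-week total: only the best 17 tournament results count (singles)."""
    return sum(sorted(tournament_totals, reverse=True)[:best_of])


slam = tournament_points("Grand Slam", "SF", [600, 700, 300, 8, 2])   # 292 + 2 * 119
tier3 = tournament_points("Tier III (32)", "W", [520, 260, 12, 5, 3])  # 145 + 152
print(ranking_points([slam, tier3]))  # 827
```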


ATP rankings men's tennis 

ATP rankings are "merit-based", and award points depending on how far a player advances in a tournament and the level of the tournament. The ATP Champions Race ranks players on a calendar-year basis. The ATP Entry Ranking determines seeding in all tournaments. Both rankings are published about 45 times a year.