DS[TA]-3-c Rating and Ranking
February
Ratings are the solution \(\mathbf{r}\) of \(\overline{M}\mathbf{r} = \mathbf{p}\)
The data that drives ratings is point difference.
Duke, Miami, U of North Carolina, U of Virginia, Virginia Tech (plus Georgia Tech and Pittsburgh now)
Is point difference really a clear reflection of the skills gap?
What if, in a short season, the final result is the only goal?
For each team \(i\), consider the number of historical wins \(w_i\) and losses \(l_i\), with \(t_i = w_i + l_i\) games played in total.
\[ r_i = \frac{w_i}{t_i} = \frac{w_i}{w_i + l_i} \]
simple
captures long-term trend (but not decay)
ignores strength of the opponents
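cold-start problem: before any game the ratio is \(\frac{0}{0}\) (undefined), and after a single game it jumps to \(\frac{0}{1}\) or \(\frac{1}{1}\)
Laplace's correction initialises each rating, read as a winning probability, to 0.5: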
\[ r_i = \frac{1 + w_i}{2 + t_i} = \frac{1 + w_i}{2 + w_i + l_i} \]
\(\frac{1}{2}\rightarrow \frac{1}{3} | \frac{2}{3} \rightarrow \frac{1}{4} | \frac{2}{4} | \frac{3}{4} \rightarrow \frac{1}{5} | \frac{2}{5} | \frac{3}{5} | \frac{4}{5}\)
Ratings of 0 or 1 are out of reach.
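The chain of fractions above can be reproduced with exact arithmetic. This is a minimal sketch; the function names `laplace_rating` and `reachable_ratings` are illustrative and not part of the slides.

```python
from fractions import Fraction

def laplace_rating(wins, losses):
    """Win ratio with Laplace's correction: (1 + w) / (2 + w + l)."""
    return Fraction(1 + wins, 2 + wins + losses)

def reachable_ratings(games):
    """All ratings a team can hold after `games` matches, worst record first."""
    return [laplace_rating(w, games - w) for w in range(games + 1)]

# One line per season length; Fraction prints values in lowest terms,
# so 2/4 appears as 1/2.
for t in range(4):
    print(t, [str(r) for r in reachable_ratings(t)])
```

However long the season, the numerator stays at least 1 and strictly below the denominator, which is why ratings of 0 and 1 are never reached.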
Colley ratings can be rephrased to contain opponents’ strength as one of the factors.
\[ w_i = \frac{w_i}{2} + \frac{w_i}{2} \]
\[ w_i = \frac{w_i}{2} + \frac{w_i}{2} + \frac{l_i}{2} - \frac{l_i}{2} \]
\[ w_i = \frac{w_i-l_i}{2} + \frac{w_i + l_i}{2} \]
\[ = \frac{w_i-l_i}{2} + \frac{t_i}{2} \]
\[ = \frac{w_i-l_i}{2} + \sum_{j=1}^{t_i}\frac{1}{2} \]
At the start, since every rating \(r_j\) is set to \(1/2\), we have:
\[ \sum_{j=1}^{t_i}\frac{1}{2} = \sum_{j\in O_i}r_j \]
where \(O_i\) is the set of opponents.
As we progress, this becomes a convenient approximation:
\[ w_i \approx \frac{w_i-l_i}{2} + \sum_{j\in O_i}r_j \]
so we can interpret \(w_i\) as winning balance plus sum of strengths of the opponents.
\[ r_i = \frac{1 + w_i}{2 + t_i} \]
\[ r_i = \frac{1 + \frac{w_i-l_i}{2} + \sum_{j\in O_i}r_j}{2 + t_i} \]
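Because each rating depends on the opponents' ratings, one way to read the formula is as an update rule applied repeatedly from the initial guess of \(1/2\). The sketch below does this for the five-team example used later in these slides. It assumes a single round robin (each team meets the other four once), and the win/loss records are inferred from the right-hand side \(\mathbf{b}\) and the diagonal of \(C\) in that example; the name `colley_sweep` is illustrative.

```python
# Records for Duke, Miami, UNC, UVA, VT (inferred from the slides' example;
# the round-robin schedule is an assumption).
teams = ["Duke", "Miami", "UNC", "UVA", "VT"]
wins = [0, 4, 2, 1, 3]
losses = [4, 0, 2, 3, 1]
opponents = [[j for j in range(5) if j != i] for i in range(5)]

def colley_sweep(r):
    """Apply r_i = (1 + (w_i - l_i)/2 + sum of opponents' r_j) / (2 + t_i) once."""
    new_r = []
    for i in range(len(r)):
        t_i = wins[i] + losses[i]
        balance = (wins[i] - losses[i]) / 2
        strength = sum(r[j] for j in opponents[i])
        new_r.append((1 + balance + strength) / (2 + t_i))
    return new_r

r = [0.5] * 5          # every team starts at 1/2
for _ in range(100):   # repeated sweeps settle on a fixed point
    r = colley_sweep(r)

for name, rating in zip(teams, r):
    print(f"{name:6s}{rating:.3f}")
```

The fixed point of this update is the same vector that solving the linear system gives in one step; the iteration is shown only to make the role of the opponents' strengths visible.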
Let’s compute all \(r_i\)’s at once.
\[ C \mathbf{r} = \mathbf{b} \]
where \(C\) and \(\mathbf{b}\) are defined to reflect Colley's rating formula:
\(c_{ii} = 2 + t_i\)
\(c_{ij} = -n_{ij}\) for \(i \neq j\), where \(n_{ij}\) is the number of direct matches between teams \(i\) and \(j\)
\[ \begin{pmatrix} 6 & -1 & -1 & -1 & -1 \\ -1 & 6 & -1 & -1 & -1 \\ -1 & -1 & 6 & -1 & -1 \\ -1 & -1 & -1 & 6 & -1 \\ -1 & -1 & -1 & -1 & 6 \end{pmatrix} \]
Essentially \(C = 2I + M\), where \(M\) is the Massey matrix.
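A small sketch of how \(C\) could be assembled from a list of fixtures. It takes \(M\) to be the unadjusted Massey matrix (games played on the diagonal, \(-n_{ij}\) elsewhere), which is an assumption about the slides' notation, and it assumes a single round robin among five teams; `massey_matrix` and `colley_matrix` are illustrative names.

```python
from itertools import combinations

def massey_matrix(n, games):
    """M has t_i on the diagonal and -n_ij off the diagonal."""
    M = [[0] * n for _ in range(n)]
    for i, j in games:
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
    return M

def colley_matrix(n, games):
    """C = 2I + M: add 2 to every diagonal entry of M."""
    C = massey_matrix(n, games)
    for i in range(n):
        C[i][i] += 2
    return C

# Assumed schedule: every pair of the five teams meets exactly once.
games = list(combinations(range(5), 2))
for row in colley_matrix(5, games):
    print(row)
```

Only who played whom matters for \(C\); the results of the games enter through \(\mathbf{b}\).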
\[ \begin{pmatrix} 6 & -1 & -1 & -1 & -1 \\ -1 & 6 & -1 & -1 & -1 \\ -1 & -1 & 6 & -1 & -1 \\ -1 & -1 & -1 & 6 & -1 \\ -1 & -1 & -1 & -1 & 6 \end{pmatrix} \mathbf{r} = \mathbf{b} \]
Now set \(b_i = 1 + \frac{1}{2}(w_i - l_i)\)
\[ \begin{pmatrix} 6 & -1 & -1 & -1 & -1 \\ -1 & 6 & -1 & -1 & -1 \\ -1 & -1 & 6 & -1 & -1 \\ -1 & -1 & -1 & 6 & -1 \\ -1 & -1 & -1 & -1 & 6 \end{pmatrix} \mathbf{r} = \begin{pmatrix} -1 \\ 3 \\ 1 \\ 0 \\ 2 \end{pmatrix} \]
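The system is small enough to solve exactly. The sketch below uses a hand-written Gauss-Jordan elimination over fractions so that it needs only the standard library; in practice a numerical solver would do the same job. The team order (Duke, Miami, UNC, UVA, VT) is assumed from the order in which the teams are listed earlier, and `solve` is an illustrative helper.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    # Augmented matrix [A | b], converted to Fractions to avoid rounding.
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row, then clear the column in every other row.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

teams = ["Duke", "Miami", "UNC", "UVA", "VT"]  # assumed order
C = [[6 if i == j else -1 for j in range(5)] for i in range(5)]
b = [-1, 3, 1, 0, 2]

r = solve(C, b)
ranking = sorted(zip(teams, r), key=lambda pair: pair[1], reverse=True)
for name, rating in ranking:
    print(f"{name:6s}{float(rating):.3f}")
```

Sorting the solution in decreasing order turns the ratings into a ranking.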
Team | Colley rating \(r_i\) | Colley rank | Massey rank |
Miami | .79 | 1st | 1st |
VT | .65 | 2nd | 2nd |
UNC | .50 | 3rd | 4th |
UVA | .36 | 4th | 3rd |
Duke | .21 | 5th | 5th |
Laplace correction: \(\frac{Pos + 1}{Tot + 2}\)
wins-only input, yet the rating in fact includes the strengths of the opponents
the total strength of the league tends to remain constant
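The claim about total strength can be made precise. A short derivation, assuming \(n\) teams, no ties, and the definitions of \(C\) and \(\mathbf{b}\) given earlier: each column of \(C\) sums to 2, because the diagonal entry \(2 + t_j\) is offset by the \(-n_{ij}\) entries, which add up to \(-t_j\). Summing all rows of \(C\mathbf{r} = \mathbf{b}\) then gives

\[ \sum_{i}\sum_{j} c_{ij}\, r_j = \sum_{j} r_j \sum_{i} c_{ij} = 2\sum_{j} r_j \]

\[ \sum_{i} b_i = n + \frac{1}{2}\sum_{i}(w_i - l_i) = n \]

since every win is some other team's loss. Hence

\[ \sum_{j} r_j = \frac{n}{2} \]

so the mean rating stays at \(1/2\) whatever the results. In the five-team example the ratings sum to \(5/2\).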
latent variables represent non-measurable skills
they live in a feature space, possibly separated from the traditional data space
yet they may get a numeric estimate, and inform our predictions
Massey and Colley both regress on the latent variable strength.
Colley can run with Massey's point balances (and vice versa).
Both methods can be applied to collaborative filtering.