DS[TA]-1-a
Ian Goodfellow, Yoshua Bengio and Aaron Courville: Deep Learning. MIT Press, 2016.
Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman: Mining of Massive Datasets. Cambridge University Press, 2014.
The material covered here is also presented in the HTML/PDF excerpts available for download.
If, given a matrix \(A\), we find a real \(\lambda\) and a vector \(\mathbf{e}\) such that
\[A\mathbf{e} = \lambda \mathbf{e}\]
then \(\lambda\) and \(\mathbf{e}\) form an eigenpair of \(A\).
In principle, if \(A\) has rank \(n\), there should be \(n\) such pairs.
In practice, eigenpairs are always costly to find; moreover,
they might have \(\lambda=0\): no information, or
\(\lambda\) might not be a real number: no interpretation.
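As a minimal numpy sketch of the second caveat (the 90-degree rotation matrix is an assumed toy example, not from the text), a pure rotation leaves no direction unchanged and so has no real eigenpairs:

import numpy as np
import numpy.linalg as la

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # 90-degree rotation of the plane
eig_vals, eig_vects = la.eig(R)
print(eig_vals)                      # eigenvalues are +1j and -1j: no real eigenpair exists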
A square matrix \(A\) is called positive semidefinite when for any \(\mathbf{x}\) we have
\[\mathbf{x}^T A \mathbf{x} \ge 0\]
In that case, its eigenvalues are non-negative: \(\lambda_i\ge 0\).
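A quick numpy sanity check (a sketch; the random matrix X is an assumed toy example): any Gram matrix \(X^T X\) is positive semidefinite, and its eigenvalues indeed come out non-negative:

import numpy as np
import numpy.linalg as la

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
A = X.T @ X                          # Gram matrix: positive semidefinite by construction
eig_vals = la.eigvalsh(A)            # eigvalsh: real eigenvalues of a symmetric matrix
print((eig_vals >= -1e-12).all())    # True (non-negative up to floating-point noise)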
In geometry, applying a matrix to a vector, \(A\mathbf{x}\), creates all sorts of alterations of the space, e.g.,
rotation
deformation
Eigenvectors, i.e., solutions to \(A\mathbf{e} = \lambda \mathbf{e}\),
describe the directions along which matrix \(A\) operates an expansion.
Example: a shear deforms a vector by increasing the first dimension by a quantity proportional to the value of the second dimension:
\[ \begin{bmatrix} x\\ y \end{bmatrix} \longrightarrow \begin{bmatrix} x + \frac{3}{11}y\\ y \end{bmatrix} \]
The horizontal axis (the blue line in the figure) is unchanged:
any \([x, 0]^T\) is an eigenvector
corresponding to \(\lambda=1\).
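A minimal numpy check of this claim; the matrix S below is the shear implied by the map above (writing it out as a matrix is an assumption, not spelled out in the text):

import numpy as np

S = np.array([[1.0, 3/11],
              [0.0, 1.0]])           # the shear x -> x + (3/11) y, y -> y
v = np.array([1.0, 0.0])             # a vector along the horizontal axis
print(S @ v)                         # [1. 0.]: unchanged, hence an eigenvector with lambda = 1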
Under certain conditions, which are normally met by user-activity matrices, the eigenpairs exist, the eigenvalues are real and non-negative, and the eigenvectors are mutually orthogonal:
e-vectors describe alternative directions, interpretable as topics
e-values scale one's affiliation to a specific topic.
Simple all-pairs extraction via Numpy/LA:
import numpy.linalg as la

def find_eigenpairs(mat):
    """
    Testing the quality of eigenvector/eigenvalue computation by numpy
    """
    eig_vals, eig_vects = la.eig(mat)
    # numpy does not guarantee any ordering: sort by absolute value
    # and take the last one on the right as the dominant eigenvalue
    order = abs(eig_vals).argsort()
    dominant_eig = abs(eig_vals[order[-1]])
    return eig_vals, eig_vects, dominant_eig
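A possible usage, on an assumed small symmetric toy matrix:

import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vects, dominant = find_eigenpairs(M)
print(dominant)                       # approximately 3.0, the dominant eigenvalue of M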
Caveat: the e-vectors returned by numpy come normalized: \(\sqrt{x_1^2 + \dots + x_n^2} = 1\); hence we later multiply them by \(\frac{1}{\sqrt{n}}\)
\(||\mathbf{x}|| = \sqrt{\mathbf{x}^T\mathbf{x}} = \sqrt{\sum_{i=1}^m x_i^2}\)
Pythagoras’ theorem, essentially.
Generalisation:
\(||\mathbf{x}||_p = (|x_1|^p + |x_2|^p + \dots + |x_m|^p)^\frac{1}{p} = (\sum_{i=1}^m |x_i|^p)^\frac{1}{p}\)
The Frobenius norm \(||\cdot ||_F\) extends \(||\cdot ||_2\) to matrices
The unit or normalized vector of \(\mathbf{x}\)
\[ \mathbf{u} = \frac{\mathbf{x}}{||\mathbf{x}||} = (\frac{1}{||\mathbf{x}||})\mathbf{x} \]
has the same direction as the original;
its norm is 1 by construction.
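A short numpy sketch of the norms and the normalization just defined; the vector x and matrix A are assumed toy values:

import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x))             # 5.0: the Euclidean (L2) norm
print(np.linalg.norm(x, ord=1))      # 7.0: the p=1 case of the general formula
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(A, ord='fro'))  # Frobenius norm: sqrt(1 + 4 + 9 + 16)
u = x / np.linalg.norm(x)            # the unit vector of x
print(np.linalg.norm(u))             # 1.0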
\[ M\mathbf{e} = \lambda \mathbf{e} \]
Handbook solution: solve the equivalent
\[ (M - \lambda \mathbf{I})\mathbf{e} = \mathbf{0} \]
A non-zero \(\mathbf{e}\) exists only when \(\lambda\) is a solution of
\[ |M - \lambda \mathbf{I}| = 0 \]
In Numerical Analysis, \(\Theta(n^\alpha)\) methods are available:
find the \(\lambda\)s that make \(|M - \lambda \mathbf{I}| = 0\), then for each find the associated \(\mathbf{e}\).
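As a worked instance (a small symmetric toy matrix, not from the text), the handbook procedure on a 2x2 matrix gives:
\[
M = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}, \qquad
|M - \lambda \mathbf{I}| = (2-\lambda)^2 - 1 = 0
\;\Longrightarrow\; \lambda_1 = 3,\ \lambda_2 = 1,
\]
\[
(M - 3\mathbf{I})\mathbf{e} = \mathbf{0} \;\Longrightarrow\; \mathbf{e}_1 \propto [1, 1]^T, \qquad
(M - \mathbf{I})\mathbf{e} = \mathbf{0} \;\Longrightarrow\; \mathbf{e}_2 \propto [1, -1]^T.
\]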
At Google scale, \(\Theta(n^\alpha)\) methods are not viable, to say the least. Ideas:
find the e-vectors first, with an iterative method;
interleave the iteration with a control on the expansion in value (normalisation at each step).
\(\mathbf{x_0} = [1, 1, \dots 1]^T\)
\(\mathbf{x_{k+1}} = \frac{M\mathbf{x}_k}{||M\mathbf{x}_k||}\)
until an approximate fixed point is reached: \(\mathbf{x}_{k+1} \approx \mathbf{x}_k\).
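A minimal power-iteration sketch following these steps; the tolerance, the iteration cap and the Rayleigh-quotient estimate of the e-value are assumptions, not spelled out above:

import numpy as np

def power_iteration(M, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = M x_k / ||M x_k|| until an approximate fixed point."""
    x = np.ones(M.shape[0])                      # x_0 = [1, 1, ..., 1]^T
    for _ in range(max_iter):
        x_new = M @ x
        x_new = x_new / np.linalg.norm(x_new)    # normalise at every step
        if np.linalg.norm(x_new - x) < tol:      # approximate fixed point reached
            x = x_new
            break
        x = x_new
    lam = x @ M @ x                              # Rayleigh quotient (x has unit norm)
    return lam, x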
Now, eliminate the contribution of the first eigenpair:
\[ M^* = M - \lambda_1^\prime \mathbf{x}_1 \mathbf{x}_1^T \]
(since \(\mathbf{x}_1\) is a column vector, \(\mathbf{x}_1^T \mathbf{x}_1\) will be a scalar: its squared norm. Vice versa, \(\mathbf{x}_1 \mathbf{x}_1^T\) will be a matrix)
Now, we repeat the iteration on \(M^*\) to find the second eigenpair.
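Continuing the sketch above (same assumptions, with an assumed toy matrix M), deflation followed by a second run of the power iteration recovers the second eigenpair:

M = np.array([[4.0, 1.0],
              [1.0, 3.0]])
lam1, x1 = power_iteration(M)                    # dominant eigenpair
M_star = M - lam1 * np.outer(x1, x1)             # remove its contribution
lam2, x2 = power_iteration(M_star)               # second eigenpair
print(lam1, lam2)                                # approximately 4.618 and 2.382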
Times are in \(\Theta(dn^2)\).
For better scalability, we will cover the PageRank algorithm later.