An Invitation to Support Vector Machines

DS[TA]-5-b

February

Data Science: T & A and A & T

Review of Kernelization

Kernel functions

map datapoints from input space to feature space, to reveal dependencies.

Computation in feature space could be unfeasible, but often the Kernel trick reduces it to computing the Kernel matrix \(K_{ij} = \phi(\vec{x_i})^T \cdot \phi(\vec{x_j})\).

To compute \(K\) we always need \(\Theta(n^2)\) dot products, each costing \(\Theta(d)\) in the \(d\)-dimensional input space.
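A minimal NumPy sketch of this cost (the array names and the toy sizes are made up): the naive double loop below performs exactly the \(\Theta(n^2)\) dot products, and is checked against the vectorised product.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # toy data: n = 100 datapoints, d = 5 features
n = X.shape[0]

# Naive construction: Theta(n^2) dot products, Theta(d) work each.
K = np.empty((n, n))
for i in range(n):
    for j in range(n):
        K[i, j] = X[i] @ X[j]        # K_ij = x_i^T . x_j (identity/linear kernel)

# Same matrix in a single vectorised call.
assert np.allclose(K, X @ X.T)
```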

3 model kernels

\(\phi(\cdot): \mathcal{D}_1 \times \dots \times \mathcal{D}_d \rightarrow \mathcal{F}^e\)

  1. identity: \(\phi(\vec{x}) = \vec{x}\)

Meaning: let’s compute directly with \(K\), i.e., with plain dot products in input space.

  2. poly/non-linear, e.g.: \(\phi([x, y]^T) = [x^2, y^2, xy]^T\)
  3. Mercer: \(\phi(\vec{x_i}) = \sqrt{\Lambda}\, \vec{U_i}\) (see the sketch after this list)
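A hedged sketch of the three maps on toy data (the names and the toy matrix are illustrative). For the Mercer map, the kernel matrix is diagonalised as \(K = U \Lambda U^T\) and \(\phi(\vec{x_i}) = \sqrt{\Lambda}\, \vec{U_i}\) is read off row by row; the sketch checks that it reproduces \(K\) exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))                      # toy data: n = 6 points in 2-D

# 1. identity map: phi(x) = x, so the kernel is the plain dot product.
K_lin = X @ X.T

# 2. a polynomial (non-linear) map for [x, y] -> [x^2, y^2, xy].
def phi_quad(p):
    x, y = p
    return np.array([x**2, y**2, x * y])

Phi = np.array([phi_quad(p) for p in X])
K_quad = Phi @ Phi.T

# 3. Mercer map: diagonalise a positive semidefinite kernel matrix,
#    K = U Lambda U^T, and set phi(x_i) = sqrt(Lambda) U_i (the i-th row of U).
lam, U = np.linalg.eigh(K_quad)
lam = np.clip(lam, 0.0, None)                    # guard against tiny negative eigenvalues
Phi_mercer = U * np.sqrt(lam)                    # row i is phi(x_i)

assert np.allclose(Phi_mercer @ Phi_mercer.T, K_quad)
```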

Kernel trick

Reduces computation to looking up the Kernel matrix.

However, for a given feature map \(\phi\) we need to show that

  1. the analysis task (min/max/var/mean etc.) can be reduced to a combination of dot products \(\phi(\vec{x_i})^T \cdot \phi(\vec{x_j})\) in the feature space;
  2. dot products can be replaced with look-ups in the Kernel matrix, i.e., \(\phi(\vec{x_i})^T \cdot \phi(\vec{x_j}) = K(\vec{x_i}, \vec{x_j})\) (a worked instance is sketched right after this list).
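For instance (a sketch with made-up data, reusing the quadratic map above): the squared feature-space distance expands into dot products (step 1), each of which becomes a kernel-matrix look-up (step 2).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 2))                      # toy data

def phi_quad(p):                                 # the quadratic map from the list above
    x, y = p
    return np.array([x**2, y**2, x * y])

Phi = np.array([phi_quad(p) for p in X])
K = Phi @ Phi.T                                  # K_ij = phi(x_i)^T . phi(x_j)

# Task: squared distance between phi(x_0) and phi(x_1) in feature space.
# Step 1: reduce it to dot products:
#   ||phi(a) - phi(b)||^2 = phi(a).phi(a) - 2 phi(a).phi(b) + phi(b).phi(b)
# Step 2: replace each dot product with a kernel-matrix look-up.
d2_via_K  = K[0, 0] - 2 * K[0, 1] + K[1, 1]
d2_direct = np.sum((Phi[0] - Phi[1]) ** 2)
assert np.isclose(d2_via_K, d2_direct)
```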

The Kernel trick works for all three model kernels.

Support Vector Machines

Goal

  • Solve classification over not-so-scattered datapoints

Ideas

  • map datapoints into a feature space where they appear to fall farther apart from each other

  • find areas of maximum separation between groups

  • determine the cutting (hyper)plane that cuts across the clear area (in 2D it’s a straight line)

  • use it as a classifier (a minimal sketch follows this list)
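A minimal end-to-end sketch of these ideas with scikit-learn; the toy dataset, the kernel choice and the parameters are illustrative, not prescribed here. `make_blobs` produces two compact groups, and `SVC` maps them (implicitly, via the RBF kernel) to a feature space and fits the separating hyperplane.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two not-so-scattered groups of points in 2-D.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=0)

# Kernelised SVM: map to a feature space, find the widest separation,
# and keep the resulting cutting (hyper)plane as a classifier.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print(clf.predict([[0.0, 0.0]]))          # classify a new point
```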

Underlying assumptions

  • members of a class have individual/family variations that make their values oscillate around a typical class value

  • such classes are somewhat non-overlapping: there must be an emergent difference in features between individuals of class A and B.

  • in the vicinity of class-A datapoints, the observed values could only belong to a class-A object

  • datapoints should not be too close to the border to the next class

  • points lying at the minimum distance from the cutting hyperplane are called support vectors

  • each class will have its own support vectors and its minimum distance \(\delta^{c}\) from the cutting hyperplane (see the sketch after this list)

  • on to the Zaki-Meira presentation…
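As anticipated in the list above, a hedged sketch of support vectors and margins (toy data, illustrative parameters): with a linear kernel, scikit-learn exposes the fitted support vectors directly, and in the separable case the margin is \(1 / \lVert \vec{w} \rVert\).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=1.0, random_state=0)

# Linear kernel, so the cutting hyperplane w.x + b = 0 is explicit.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)               # the points closest to the cutting hyperplane
w = clf.coef_[0]
delta = 1.0 / np.linalg.norm(w)           # each class's margin delta from the hyperplane
print(delta)
```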

Coda: assortativity

The underlying concept

As with K-means, we postulate that

class difference should be revealed by a distinctive difference in measured values, i.e., some break of continuity.

Hence, two points that are very close should always end up in the same class.

Assortativity […] is a preference for […] attaching to others that are similar in some way.
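A small hedged illustration with NetworkX (the toy graph and its `cls` attribute are made up): the attribute assortativity coefficient is positive when links tend to join nodes of the same class.

```python
import networkx as nx

# A toy graph where links mostly join nodes of the same class.
G = nx.Graph()
G.add_nodes_from([0, 1, 2], cls="A")
G.add_nodes_from([3, 4, 5], cls="B")
G.add_edges_from([(0, 1), (1, 2), (0, 2),       # within class A
                  (3, 4), (4, 5), (3, 5),       # within class B
                  (2, 3)])                      # a single cross-class link

# Clearly positive: attachment is assortative on the class attribute.
print(nx.attribute_assortativity_coefficient(G, "cls"))
```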

[…] at one point [he] moved from the Mathematics Department at MIT to that at Boston University.

Which singlehandedly raised the average IQ in both departments.