6.6. Random Projection
The dimensions and distribution of random projections matrices arecontrolled so as to preserve the pairwise distances between any twosamples of the dataset. Thus random projection is a suitable approximationtechnique for distance based method.
References:
Sanjoy Dasgupta. 2000.Experiments with random projection.In Proceedings of the Sixteenth conference on Uncertainty in artificialintelligence (UAI’00), Craig Boutilier and Moisés Goldszmidt (Eds.). MorganKaufmann Publishers Inc., San Francisco, CA, USA, 143-151.
Ella Bingham and Heikki Mannila. 2001.In Proceedings of the seventh ACM SIGKDD international conference onKnowledge discovery and data mining (KDD ‘01). ACM, New York, NY, USA,245-250.
The main theoretical result behind the efficiency of random projection is theJohnson-Lindenstrauss lemma (quoting Wikipedia):
Knowing only the number of samples, the estimatesconservatively the minimal size of the random subspace to guarantee abounded distortion introduced by the random projection:
>>>
Example:
References:
- Sanjoy Dasgupta and Anupam Gupta, 1999.
The reduces thedimensionality by projecting the original input space on a randomly generatedmatrix where components are drawn from the following distribution
.
Here a small excerpt which illustrates how to use the Gaussian randomprojection transformer:
>>>
The reduces thedimensionality by projecting the original input space using a sparserandom matrix.
If we define , the elements of the random matrixare drawn from
where
is the size of the projected subspace.By default the density of non zero elements is set to the minimum density asrecommended by Ping Li et al.:.
Here a small excerpt which illustrates how to use the sparse randomprojection transformer:
>>>
References:
D. Achlioptas. 2003.Database-friendly random projections: Johnson-Lindenstrauss with binarycoins.Journal of Computer and System Sciences 66 (2003) 671–687