6.6. Random Projection

    The dimensions and distribution of random projections matrices arecontrolled so as to preserve the pairwise distances between any twosamples of the dataset. Thus random projection is a suitable approximationtechnique for distance based method.

    References:

    • Sanjoy Dasgupta. 2000.Experiments with random projection.In Proceedings of the Sixteenth conference on Uncertainty in artificialintelligence (UAI’00), Craig Boutilier and Moisés Goldszmidt (Eds.). MorganKaufmann Publishers Inc., San Francisco, CA, USA, 143-151.

    • Ella Bingham and Heikki Mannila. 2001.In Proceedings of the seventh ACM SIGKDD international conference onKnowledge discovery and data mining (KDD ‘01). ACM, New York, NY, USA,245-250.

    The main theoretical result behind the efficiency of random projection is theJohnson-Lindenstrauss lemma (quoting Wikipedia):

    Knowing only the number of samples, the estimatesconservatively the minimal size of the random subspace to guarantee abounded distortion introduced by the random projection:

    >>>

    Example:

    References:

    • Sanjoy Dasgupta and Anupam Gupta, 1999.

    The reduces thedimensionality by projecting the original input space on a randomly generatedmatrix where components are drawn from the following distribution

    6.6. Random Projection - 图3.

    Here a small excerpt which illustrates how to use the Gaussian randomprojection transformer:

    >>>

    The reduces thedimensionality by projecting the original input space using a sparserandom matrix.

    If we define , the elements of the random matrixare drawn from

    where

    6.6. Random Projection - 图5 is the size of the projected subspace.By default the density of non zero elements is set to the minimum density asrecommended by Ping Li et al.:.

    Here a small excerpt which illustrates how to use the sparse randomprojection transformer:

    >>>

    References: