PCA itself is not actually a machine learning algorithm - it's an algorithm for simplifying an input data set with many dimensions so that you can then feed it to an ML algorithm without blowing up your computer. Or, more simply put, you can use PCA to narrow down the number of input features you have when you suspect that many of them might be highly correlated with each other.
Where it gets interesting is that you can also use PCA for such high-dimensional objects as JPEG images, which may have tens of thousands of pixels. If you are feeding each pixel one-by-one into a machine learning algorithm for purposes of image recognition, computer vision, or whatever it may be, the computation time is going to be huge.
PCA is a mathematical method for finding the best description of the correlations between variables, so that a few input features can stand in for many. For instance, if temperature (T) and snowfall (S) are highly correlated, why not use one input feature Z that represents both variables simultaneously (where Z is the line fit through all the points in the S-T plane that minimizes the projection error) to reduce the algorithm's computation time?
So PCA essentially compresses a complex, high-dimensional data set with N dimensions down to a more manageable data set of K dimensions (where K is a number of our own choosing).
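To make the T-and-S example concrete, here's a minimal sketch in R (the same language used for the experiment below). The simulated data and variable names are purely illustrative, not from the actual experiment:

```r
# Two correlated variables reduced to a single feature Z via PCA.
# The data here is simulated purely for illustration.
set.seed(42)
temp <- rnorm(200, mean = 0, sd = 10)       # temperature (T)
snow <- -0.8 * temp + rnorm(200, sd = 3)    # snowfall (S), correlated with T
X <- cbind(temp, snow)

pca <- prcomp(X, center = TRUE)

# Z is the projection of each (T, S) point onto the first principal
# component - the line through the data with the least projection error.
Z <- pca$x[, 1]

# Fraction of the total variance that the single feature Z retains:
pca$sdev[1]^2 / sum(pca$sdev^2)
```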
*Source: Pennsylvania State University - Department of Statistics Online Programs*
Without getting into the details of how this is done - it involves linear algebra and some thorny mathematical derivations - here is what I did to test out PCA using images from the Internet:
- I collected 100 JPEG images of President Obama from Google Images. Each image was 100 x 100 pixels, which meant 10,000 pixels per image in total. However, my laptop lacked sufficient computation power to do the calculations necessary for dimensionality reduction via PCA with that many inputs - so I used Paint to shrink each image down to 50 x 50, or 2,500 pixels each. This is the full collection of Obama images I gathered for my input matrix:
- Next, I used the pixmap library in R to convert each image into a row of 2,500 pixel intensity values. With 100 total images, my input feature matrix was 100 x 2,500 in size (a sketch of this step appears after this list).
- After I had my input matrix, I ran PCA on it and kept the top 40 principal components - the ones that capture the most variance - which retain a full 95% of the variance across the 100 sample images (see the sketch after this list). It so happens that the amount of variance retained when the N-dimensional input feature matrix is compressed down to K dimensions rises steeply at first and then levels off, roughly logarithmically, as shown below:
- The graph above essentially means the 50 x 50 images can be sufficiently represented by only 100 dimensions instead of 2,500 for purposes of numerically representing the pictures based on pixel intensity variation. In fact, even just 40 dimensions capture 95% of the variance. Think about what this means for a second - if a certain pixel has an intensity X, several of the surrounding pixels are highly correlated with X in intensity. So dimensionality reduction lets us represent the whole image numerically with only 100 dimensions instead of 2,500.
- Now, I was interested in how much distortion the PCA approximation caused for each of the images. So I ran the reduction in reverse to approximate the original pixel values and printed the resulting images (see the reconstruction sketch further below). This would be similar to, say, splitting an average value Z into two equal parts in order to recover the original variables X and Y. There's going to be some distortion/loss of information. But I wanted to see visually how much.
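Here is a rough sketch of how steps 2 and 3 might look in R. The folder name, file names, and my use of the jpeg package here (rather than the pixmap library mentioned above) are assumptions for illustration, not the exact code from the experiment:

```r
# Build the 100 x 2500 input matrix: one row per image, one column per pixel.
# Assumes the 50 x 50 JPEGs sit in a folder called "obama_images".
library(jpeg)

files <- list.files("obama_images", pattern = "\\.jpg$", full.names = TRUE)

X <- t(sapply(files, function(f) {
  img <- readJPEG(f)
  if (length(dim(img)) == 3) img <- rowMeans(img, dims = 2)  # RGB -> grayscale
  as.vector(img)                                             # 50 x 50 -> 2500
}))

# PCA on the pixel-intensity matrix.
pca <- prcomp(X, center = TRUE)

# Cumulative fraction of variance retained as K grows; the curve rises
# steeply and then flattens out, as described above.
var_retained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
plot(var_retained, type = "l",
     xlab = "K (components kept)", ylab = "Fraction of variance retained")

which(var_retained >= 0.95)[1]  # smallest K that keeps 95% of the variance
```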
Here's an example of how one of the images looked before and after the PCA approximation. That is, compressing 2,500 dimensions down to 40 dimensions, and then linearly approximating the original 2,500 again:
*Original (left) vs. approximation with 95% of variance retained (right)*
The loss of information makes an already blurry image (because it's only 50 x 50 pixels) even more unsightly, but guess what? For the purposes of a machine learning algorithm, the dimension-reduced version will more likely than not suffice.
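For completeness, here's a sketch of that reverse mapping, continuing from the `pca` and `X` objects in the earlier sketch. Again, this is illustrative rather than the original code:

```r
# Compress 2,500 pixels down to k = 40 component scores, then linearly map
# back up to 2,500 pixels to see how much distortion the round trip causes.
k <- 40

scores <- pca$x[, 1:k]                           # 100 x 40
X_approx <- scores %*% t(pca$rotation[, 1:k])    # 100 x 2500
X_approx <- sweep(X_approx, 2, pca$center, "+")  # add per-pixel means back

# Show the first image before and after the round trip, side by side.
par(mfrow = c(1, 2))
image(matrix(X[1, ], 50, 50), col = gray.colors(256), main = "Original")
image(matrix(X_approx[1, ], 50, 50), col = gray.colors(256),
      main = "PCA approximation")
```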