Principal Components Analysis

Exploring relationships between nutrients and food groups

Alicia Fennell
03-11-2021

The data were downloaded from the U.S. Department of Agriculture (USDA) FoodData Central website and contain the nutrient content of individual foods across 25 food groups. This assignment uses principal component analysis (PCA) to explore the relationships among six selected nutrient variables (fiber, fat, vitamin E, vitamin C, carbohydrates, and energy) across three of those food groups: vegetables and vegetable products, nut and seed products, and sausages and luncheon meats.

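## Attach packages: tidyverse (wrangling), janitor (clean_names), ggfortify (autoplot for prcomp objects)

library(tidyverse)
library(janitor)
library(ggfortify)

## Read in the USDA nutrient data, clean names, and keep only the food groups of interest
## %in% tests membership, so every row in any of the three groups is kept

nutrients <- read_csv("usda_nutrients.csv") %>% 
  clean_names() %>% 
  filter(food_group %in% c("Vegetables and Vegetable Products", "Nut and Seed Products", "Sausages and Luncheon Meats"))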
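Filtering on several food groups is a set-membership test. A minimal sketch with a made-up vector (hypothetical labels, not the USDA data) shows how `%in%` differs from `==`, which recycles the right-hand vector and compares element by element:

```r
# Hypothetical group labels standing in for the food_group column
groups <- c("A", "B", "C", "A", "B", "C")
wanted <- c("B", "A")

# %in% asks, for each element of groups, whether it appears anywhere in wanted
keep_in <- groups %in% wanted

# == recycles wanted as B, A, B, A, B, A and compares position by position
keep_eq <- groups == wanted

groups[keep_in]  # every "A" and "B"
groups[keep_eq]  # only elements that happen to line up with the recycled vector
```

With `==`, rows that belong to a wanted group can be dropped silently, so the size of the filtered data depends on row order.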

## Select the variables of interest and give them readable labels for the biplot
## Drop incomplete rows, then center and scale each variable to unit variance so that none dominates because of its units
## Run PCA using prcomp()
## Note: if drop_na() removes any rows here, drop the same rows from nutrients before plotting so the rows align

nut_pca <- nutrients %>% 
  select(fiber_g, fat_g, vit_e_mg, vit_c_mg, carb_g, energy_kcal) %>% 
  rename("Fiber (g)" = fiber_g,
         "Fat (g)" = fat_g,
         "Vitamin E (mg)" = vit_e_mg,
         "Vitamin C (mg)" = vit_c_mg,
         "Carbohydrates (g)" = carb_g,
         "Energy (kcal)" = energy_kcal) %>%
  drop_na() %>% 
  scale() %>%
  prcomp() 
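To illustrate why the variables are scaled before PCA, here is a minimal sketch using R's built-in mtcars data as a stand-in (an assumption for illustration; it is not the USDA data). Without scaling, the variable with the largest raw variance tends to dominate the first component; with scaling, the loadings are more balanced.

```r
# Stand-in data: four numeric columns measured in very different units
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# PCA on the raw values: disp has by far the largest variance
pca_raw <- prcomp(vars)

# PCA on standardized values (mean 0, standard deviation 1 per column)
pca_scaled <- prcomp(vars, scale. = TRUE)

# Scaling first and then calling prcomp() gives the same components
pca_piped <- prcomp(scale(vars))

# Compare the absolute PC1 loadings with and without scaling
round(abs(pca_raw$rotation[, 1]), 2)
round(abs(pca_scaled$rotation[, 1]), 2)
```

Calling `scale()` before `prcomp()` and passing `scale. = TRUE` are two routes to the same standardized PCA.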

## Create a biplot 

autoplot(nut_pca,
         data = nutrients,
         colour = "food_group",
         loadings = TRUE,
         loadings.label = TRUE,
         label.size = 5,
         loadings.colour = "black",
         loadings.label.colour = "black",
         loadings.label.vjust = 2,
         loadings.label.hjust = .25,
         xlim = c(-0.1, 0.25)) +
  theme_minimal() +
  scale_color_manual(values = c("goldenrod2", "firebrick4", "chartreuse4")) 

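Figure 1. Biplot from the principal component analysis (PCA) of six nutrient variables across three food groups. Points are individual foods colored by food group, and arrows show the variable loadings. The first two principal components explain 45.02% and 19.26% of the variance in the data, respectively.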
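The share of variance explained by each principal component can be computed from the standard deviations stored in a prcomp object. A minimal sketch, again using the built-in mtcars data as a stand-in for the USDA data (an assumption, so the numbers will differ from those in Figure 1):

```r
# Stand-in data: four numeric columns from the built-in mtcars data set
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
pca <- prcomp(vars, scale. = TRUE)

# Each component's variance is its squared standard deviation;
# dividing by the total gives the proportion of variance explained
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Express as percentages labelled PC1, PC2, ...
pct <- round(100 * var_explained, 2)
names(pct) <- paste0("PC", seq_along(pct))
pct

# summary() reports the same proportions (rounded)
summary(pca)$importance["Proportion of Variance", ]
```

Applied to `nut_pca`, the first two entries of `pct` would correspond to the percentages shown on the biplot axes.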