13  Correlation

Today's focus is on measuring the relationship between two continuous variables. To do so, we will first look at how to compute correlations. Then, we will look at how to perform linear regression.

13.1 A brief reminder on correlation

A correlation is a statistical measure that expresses the extent to which two variables are linearly related. A correlation coefficient measures the strength and direction of a linear relationship between two variables. It quantifies how changes in one variable correspond to changes in another variable. The correlation coefficient ranges between -1 and 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship.

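\[ r = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum{(x_i - \bar{x})^2} \sum{(y_i - \bar{y})^2}}} \]

where \(\bar{x}\) and \(\bar{y}\) are the sample means of the two variables. This is Pearson's correlation coefficient, which is what R computes by default. Correlation coefficients give you an indicator of the strength and direction of the linear relationship between two continuous variables.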
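To connect the formula to the cor() function used below, here is a minimal sketch that computes r by hand, term by term, and compares it with cor(). It uses the built-in mtcars dataset (car weight wt and fuel efficiency mpg) purely as an illustration; these variables have nothing to do with the electoral data used in this chapter.

```r
# Pearson's r computed directly from the formula, using the built-in mtcars data
x <- mtcars$wt   # car weight (1000 lbs)
y <- mtcars$mpg  # miles per gallon

# Deviations from the sample means
dx <- x - mean(x)
dy <- y - mean(y)

# Numerator: sum of cross-products of deviations
# Denominator: square root of the product of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in version (Pearson is the default method)
r_builtin <- cor(x, y)

r_manual
r_builtin
all.equal(r_manual, r_builtin)  # TRUE: both give the same value
```

Heavier cars tend to have a lower mileage, so r is strongly negative here (about -0.87).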

13.2 Performing correlations in R

To explore correlations, we will work on affective polarization in France. Affective polarization is a really trendy topic in political science. The concept refers to how much people from different (political) groups like or dislike each other. Here, we will use data from the French Electoral Study conducted during the 2022 presidential election campaign, which asked each respondent how much sympathy they have for the people who vote for each party. We want to see which groups of party supporters tend to be liked or disliked together.

needs(tidyverse, haven, corrr, broom)

fes2022 <- read_dta("data/fes2022v3.dta")

fes2022 |> count(fes3_QA07_A)
# A tibble: 12 × 2
   fes3_QA07_A                n
   <dbl+lbl>              <int>
 1    0 [0]                 151
 2    1 [1]                  80
 3    2 [2]                  99
 4    3 [3]                 158
 5    4 [4]                 116
 6    5 [5]                 498
 7    6 [6]                 120
 8    7 [7]                 123
 9    8 [8]                 106
10    9 [9]                  56
11   10 [10]                 72
12 9999 [N'a pas répondu]    35
fes2022$fes3_QA07_A
<labelled<double>[1614]>: Sentiment envers groupes : gens qui votent pour la France insoumise
   [1]    2    1    1    4    7    3    5    4    4    9   10   10    2    2
  [15]    5    3    8    5    7    0    2    8    5    0    4    5    1    3
  [29]    5    2    5    3    3    0    6    3    3    9    5    7    8    5
  [43]    5    4    5    3    5    3    5    0    5    4    5    5    5 9999
  [57]    5    5    0    5    5    4    5    7    2    3    4    1    0    0
  [71]    3    6    3    6    7    5    5    5    5    3    5    6    5    5
  [85]    2    5    0    9    4    4    5    5    5    1    2    1    4    3
  [99]    1    1    5    2    7    1    0    4    5    8    1    5    9    1
 [113]    4    5    5    6    7    5    1    8    5    5    2    4    6    5
 [127]    4    9    2    2    5    5    4    7    5    4    5    7    4    2
 [141]    2    7    6    0    8    5    2   10    1    5    3    2    5    6
 [155]    8    5    4    5    8    5 9999 9999    2    6    4    4    5    4
 [169]    7    7    4    0    3    1    4    8    5    5    9    2    4    5
 [183]    3    8    5    7    3    5    5    3    2    5    5    7    7    3
 [197]    5    0    8    5    7    3    7    6    3    0    2    5    5    5
 [211]    5    5    6    5    5 9999    5    1    5    5    7    3 9999    7
 [225]    1    5    5    6    3    5    5    6 9999    5    2    4    3    1
 [239]    5    2    5    7    0    2    8    2    0    5    6    5    0    6
 [253]    5    4    0    6    4   10    0    6    5    2    5   10    5    5
 [267]    6    1    5    2    9    1    5    2    5    2    5    0    3    5
 [281]    0    2    5    5   10    7    5    1    7    4    6    5    9    4
 [295]    3    5    2    8    9    5    5    5    3    9    6    3    1    0
 [309]    8    5    7   10    5 9999    8    9    5    7    2    5    8    8
 [323]    8    3    7    3    6    4   10    4    0    4    5    5    5    9
 [337]    5    7    3    6    5    9    4    3    6    2    0    8    5    4
 [351]    6    5    1    9    4    0    3    6    3    2 9999    3    3    0
 [365]    8    0    5    5    5    4    0    3    5    0    7    0    0    4
 [379]    8    6    5    3   10    9    4    2    3    6    8    9    3    5
 [393]    7    4    5    6    3    7    6    2    2    3    5    8    5    6
 [407]    4    5    5    8    2   10    5    5    0    4    5    6    5    4
 [421]    5    5    3    7    8    6    5    0    7    0    3    5    5    5
 [435]    4    0    5   10    6    3    6    3    5    5    3    3    8    7
 [449]    5    5    8    1    5    7    5    6    3    0    5 9999    7    5
 [463]    1    6    5    1    0    7    0    3    5    5    7    6    5    7
 [477]    5    5    5    2    3    3    3    6    5    3    5    2    8    0
 [491]    1    5    2    5    5    2    0    5    5    7    3    8    3    3
 [505]    0    8    7   10    4    0   10    8    5    7    3    1    5    0
 [519]    3    8    5    6    9    0    3    5   10    4    3    1    9    5
 [533]    0    7    5    6    0    4    5    5    2    5    1    4    7    2
 [547]    9    5    4    0    5    6    7    6    5    5    7    3    7    0
 [561] 9999    5    0    6    6    5    9    5    5    5    3    6    2   10
 [575]    3    1    4    5    2    4    2    0    7 9999    5   10    8    5
 [589]    0    6    4    5   10    8    3    1    6    8    5    5    5    5
 [603]    3    0    5   10    6    5    0    0    5    5    5    8    8    7
 [617]    4    5    5    5    1    5    5    0    5    4    3    0    9    7
 [631]    3    5    6    5    7    6    5    8    8    7    5    9    1    4
 [645]    4    8    1    5    5    5    5    0    5    5    8    5    3    3
 [659]    6    2    3    5    3    5    7    5    5    3 9999    5    5    5
 [673]    4    5    3    5    7    5    5    5    2    7    4    8    8    8
 [687]    5    5    5 9999    0    6    8    1    2    3    3    0    5   10
 [701]    5    9    0    1    4    2    4    0    1    2    7    9    1    4
 [715]   10    5    5    3    5    0    5    0    5   10    1    5    5    3
 [729]    2    5    5    5    0 9999    4    4    5    3    8    5    7    1
 [743]    2    0    5    0    5 9999    5   10    5    5    1    7    5    1
 [757]    8    1    5    5    0    4    0    5    2    2    3    3    5    5
 [771]   10    2    0    5    5    5    7    6    0    5    1    0    4    7
 [785]    8    6    3    5    5    1    5    7    5    0    2    7    7    4
 [799]    5    0    6    7    5    5    5    4    0    3 9999 9999    0    9
 [813]    8    5    0    7    5    8    2    4    2   10    5    5    4    5
 [827]    5    5    4    0    5    0 9999    5    0    2    0    7    4    5
 [841]    5    0    5    8    4    8 9999    5    5    6    2    3    8    5
 [855]    5    6    3    5    5   10    1    3    1    3    3    7    5    2
 [869]    0    0    5    6    5   10    6    2    8    1    3    5    3    0
 [883]    3    5    5    1    8    4    3    8    4    7    3    5    5    3
 [897]    7    0    4    0    2    5    9    5    0    4    2    3    9    1
 [911]    5    8    5    6    2    5    4    1    0    7    7   10    5    6
 [925]    2    5    0    0    0    5    9    2    6    6    8    5    6    5
 [939]    5    7    6    4    5    8    9    0    2    7    5    5    3    5
 [953]    4    5    7    6    1    5    5    6    5    4    7    8    2    5
 [967]    9    9    5    1    6    6    1    3    5    3    0    0    5    5
 [981]    8    0    7    1    5   10    5    5    0    5    5    5    5    0
 [995]    7 9999    5    3    6    0    5    9    5    5    3    5    6    4
[1009]    5    1    5    9    5    1    9    6    3    8    5    3   10    5
[1023]    5    3    6    7    2    5    5    2    9    3    0    5    3    3
[1037]    8    2    3    5    1    7 9999    1    5    0    1    0    0    5
[1051]    9    0    5    0    0    5    8    7    3    2   10    3    5    0
[1065]    4    4    5    3    5    5    0    1    6   10    0    5 9999    3
[1079]    5    5    7    4    6    5    0    5    6    4    8    6    5    5
[1093]    9    9    5    3    7    8    5    8    7    5    7    9    5    0
[1107]    5    5    3   10    5    2    1    1    5    6    4    7    5    5
[1121]   10    5    4    5    0   10    7   10    7    7    5    5    6   10
[1135]    9    5 9999    5    4    5    5    5    5    9   10    9    5    4
[1149]    5    1   10    5    0    2    5    8    8    4    0 9999    5    5
[1163]   10 9999    5    5 9999   10    5    7    4    2    5    5    0    7
[1177]    6    5    5    2    7    6    1    7    8    8    5    5    7    5
[1191]    5    8    5    6    3   10    5    3    5    5    3    5    5    5
[1205]    1    5    5    5    3    0    2    5   10    5    8    7    8    0
[1219]    8    5    4    5    2    3    5    6    5    3 9999    5    6    5
[1233]    5    0    5    8 9999    4    5    8    2    4 9999    9   10    5
[1247] 9999    2    5    5    1    6    3    5    3    3    5    7   10    5
[1261]    5    4    7    3    7    2    3    2    5    8    2    5    8 9999
[1275]    6    8    3    0    0   10    2 9999    8   10    1    1    4    8
[1289]    6    8    5    9    5    5    5    1   10    4    1    8    3    6
[1303]    3    6    4    7    8   10    5    5    5    8   10    5   10    0
[1317]    0    3    3    7    5    1    5    5    0    0    0    5    0    0
[1331]    5    2   10    8    8    4 9999   10    4    7    5    2    5    5
[1345]    5    5    3    7    8    0    5    7    9    3    5    5    6    8
[1359]    3   10    9    5    3    7    0    3    2    9    3    8    6    0
[1373]    0    5    5    6    5    3    5    0    5    3    8    6    5    0
[1387]    5    9    7    6    5   10    6    5    6    5    0    1    3    7
[1401]    2    5    7    8    5    7    7    5    7    7   10    3    8    5
[1415]    4    7    5    9    5    6    0    5    4    5    6    2   10    3
[1429]    8    3    0    6    5    1    3    5    4    4   10    3    1    7
[1443]    6    9    8    0    5    5    0    4    5    0    1    4    6    4
[1457]    5    3    3    2   10    2    6    7    6    5    7    3    4    6
[1471]    0    8   10    5   10 9999    4   10    9   10    5    5    0 9999
[1485]    5    5    5   10    6    1    6    5    7    7    2    5    0    6
[1499]    2    3    4    5    5    3   10    5    8    8    7    3    5    5
[1513]    5    5    5    3    6    8    3    8    0    0    7    0    8    5
[1527]    0    5    6    5    4    5   10    5    0    8   10    7    0    6
[1541]    7    6    5    2    5    6    5    7    6    0    6    5    0   10
[1555]    9    5    3    8    5    7    2    1    8    2    5    7   10    5
[1569]    6    4   10    3    2   10    5   10    5    0    3    9    5    4
[1583]    5    6    3    0    5    1    5    4    5    9    5    6    0    6
[1597]    5    7    9    5    6    4    5    5    5    9    1    3    4    7
[1611]    8    3    6   10

Labels:
 value           label
     0               0
     1               1
     2               2
     3               3
     4               4
     5               5
     6               6
     7               7
     8               8
     9               9
    10              10
  9999 N'a pas répondu

The information we are interested in is contained in the variables fes3_QA07_A to fes3_QA07_G. These variables contain the participants' answers to the question 'On a scale from 0 to 10, how much sympathy do you have for people who vote for [PARTY NAME]?' We have variables for 7 different parties: LFI, EELV, PS, LREM, LR, RN, and REC. Responses are coded from 0 to 10, with 0 indicating 'no sympathy at all' and 10 indicating 'a lot of sympathy'; non-responses are coded 9999 (see the count above). We will first rename and recode these variables to make them more user-friendly.

fes2022 <- fes2022 |>
  # Rename each variable with the prefix symp_ + party name
  rename(
    symp_fi = fes3_QA07_A,
    symp_eelv = fes3_QA07_B,
    symp_ps = fes3_QA07_C,
    symp_lrem = fes3_QA07_D,
    symp_lr = fes3_QA07_E,
    symp_rn = fes3_QA07_F,
    symp_re = fes3_QA07_G
  ) |>
  # Recode the 7 variables at once, replacing non-responses (9999)
  # with 5, the midpoint of the scale
  mutate(across(
    # Apply the change to every variable whose name starts with "symp"
    starts_with("symp"),
    # Keep values between 0 and 10; replace anything else with 5
    # (the .default argument requires dplyr >= 1.1.0)
    ~ case_when(.x %in% 0:10 ~ .x, .default = 5)
  ))

fes2022 |> 
  count(symp_eelv)
# A tibble: 11 × 2
   symp_eelv     n
   <dbl+lbl> <int>
 1  0 [0]       68
 2  1 [1]       43
 3  2 [2]       97
 4  3 [3]      122
 5  4 [4]      106
 6  5 [5]      505
 7  6 [6]      184
 8  7 [7]      202
 9  8 [8]      156
10  9 [9]       66
11 10 [10]      65

We no longer have any missing values in these variables. We will start by looking at the relationship between sympathy for LFI and sympathy for LREM, using the cor() function to compute the correlation between the two variables. Note that if the data contain missing values (NA), cor() returns NA by default. In that case, add use = "complete.obs" as an argument to cor().

As the output below shows, the correlation is negative but fairly weak (r ≈ -0.21): people who have more sympathy for LFI voters tend to have less sympathy for LREM voters.

# Standard way to compute a correlation in R

cor(fes2022$symp_fi, fes2022$symp_lrem)
[1] -0.2126489
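The note about missing values can be checked on a toy example. The two vectors below are made up for illustration: with the default use = "everything", a single NA makes cor() return NA, while use = "complete.obs" drops the incomplete pairs before computing r.

```r
# Toy vectors (made up for illustration); the last pair contains a missing value
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 4, 5)

# Default (use = "everything"): the NA propagates and the result is NA
r_default <- cor(x, y)

# Drop every observation with a missing value before computing r
r_complete <- cor(x, y, use = "complete.obs")

# Same result as computing r on the four complete pairs directly
r_check <- cor(x[1:4], y[1:4])

r_default   # NA
r_complete  # about 0.718
```

When several variables are correlated at once, use = "pairwise.complete.obs" keeps, for each pair, every observation where both values are present; this is also the setting that corrr::correlate() reports using by default.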

We will also use the cor.test() function to test whether the correlation is statistically significant.

(affect_test <- cor.test(fes2022$symp_fi, fes2022$symp_lrem))

    Pearson's product-moment correlation

data:  fes2022$symp_fi and fes2022$symp_lrem
t = -8.7376, df = 1612, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2587569 -0.1655741
sample estimates:
       cor 
-0.2126489 
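The numbers reported by cor.test() can be recovered from r and the sample size alone. This sketch plugs in the values shown above (r = -0.2126489 and n = 1614, hence n - 2 = 1612 degrees of freedom). The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), and the 95% confidence interval is obtained through Fisher's z transformation, atanh(r), whose standard error is 1 / sqrt(n - 3).

```r
# Values taken from the cor.test() output above
r <- -0.2126489
n <- 1614

# t statistic with n - 2 degrees of freedom
dof <- n - 2
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), dof)

# 95% confidence interval via Fisher's z transformation:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat   # about -8.74, as in the output
p_value  # far below 0.05
ci       # about -0.259 and -0.166
```

Note how large samples make even a weak correlation highly significant: with 1,612 degrees of freedom, r = -0.21 already gives |t| > 8.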

We can also look at a tidy version of the test output with the tidy() function from the broom package.

tidy(affect_test)
# A tibble: 1 × 8
  estimate statistic  p.value parameter conf.low conf.high method    alternative
     <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>      
1   -0.213     -8.74 5.85e-18      1612   -0.259    -0.166 Pearson'… two.sided  

When looking at relationships between variables, it is always a good idea to visualize the data. Here, we will use a scatterplot to visualize the relationship between sympathy for LFI and sympathy for LREM. We will use the ggplot2 package to do so.

fes2022 |> 
  ggplot(aes(symp_fi, symp_lrem)) + 
  geom_point() +
  geom_smooth()
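Because both sympathy scales only take the integer values 0 to 10, many respondents share exactly the same point, and geom_point() hides most of them. A possible variant is sketched below on simulated data (the data frame sim and its values are invented for illustration, not drawn from the FES): it jitters the points and draws a straight regression line with method = "lm", which matches the linear relationship that Pearson's r measures. By default, geom_smooth() fits a flexible curve instead (mgcv::gam() when there are 1,000 or more observations, as in our data).

```r
library(ggplot2)

# Simulated 0-10 scores, for illustration only
set.seed(42)
n <- 500
sim <- data.frame(symp_a = sample(0:10, n, replace = TRUE))
# symp_b is loosely and negatively related to symp_a, then kept within 0-10
sim$symp_b <- pmin(pmax(round(8 - 0.3 * sim$symp_a + rnorm(n, sd = 2)), 0), 10)

p <- ggplot(sim, aes(symp_a, symp_b)) +
  # Small random offsets so that overlapping integer points become visible
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.3) +
  # Straight OLS line, consistent with a (linear) Pearson correlation
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```

On the FES data, you would map symp_fi and symp_lrem in aes() instead.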

13.2.1 Multiple correlation with the corrr package

Sometimes, we are interested not in the correlation between two variables but between many at the same time. To do this, we compute correlation matrices, which can be done with the cor() function we saw above. But the corrr package comes with super handy functions to obtain, manipulate, and visualize correlation results. You will first need to install and load it (the needs() call at the beginning of this chapter takes care of both).
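For comparison, base R's cor() already returns a full correlation matrix when it is given a data frame of numeric columns. The sketch below uses four columns of the built-in mtcars dataset for illustration. Unlike corrr::correlate(), which returns a tibble with NA on the diagonal, cor() returns a plain matrix with 1s on the diagonal.

```r
# A correlation matrix with base R, using four numeric columns of mtcars
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]
m <- cor(vars)  # Pearson by default; a 4 x 4 symmetric matrix

round(m, 2)

# Mimic corrr::correlate()'s display by blanking the diagonal
m_display <- m
diag(m_display) <- NA
round(m_display, 2)
```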

I first create a subset of my dataset with only the variables measuring affective polarization towards different groups of voters.

(fes_affect <- fes2022 |> 
   select(starts_with("symp")))
# A tibble: 1,614 × 7
   symp_fi   symp_eelv symp_ps   symp_lrem symp_lr   symp_rn   symp_re  
   <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
 1 2 [2]      4 [4]    5 [5]     8 [8]     8 [8]      2 [2]     1 [1]   
 2 1 [1]      5 [5]    1 [1]     5 [5]     1 [1]     10 [10]    5 [5]   
 3 1 [1]      1 [1]    1 [1]     1 [1]     3 [3]      7 [7]    10 [10]  
 4 4 [4]      7 [7]    7 [7]     7 [7]     4 [4]      0 [0]     0 [0]   
 5 7 [7]     10 [10]   7 [7]     5 [5]     1 [1]      0 [0]     0 [0]   
 6 3 [3]      9 [9]    3 [3]     4 [4]     3 [3]      3 [3]     3 [3]   
 7 5 [5]      5 [5]    5 [5]     5 [5]     5 [5]      4 [4]     3 [3]   
 8 4 [4]      5 [5]    5 [5]     6 [6]     6 [6]      4 [4]     4 [4]   
 9 4 [4]      5 [5]    5 [5]     5 [5]     5 [5]      2 [2]     2 [2]   
10 9 [9]      6 [6]    5 [5]     2 [2]     2 [2]      1 [1]     0 [0]   
# ℹ 1,604 more rows

And then, I use the corrr::correlate() function to compute the correlation matrix. This contains the correlation between each pair of variables.

(affect_matrix <- fes_affect |> 
    corrr::correlate())
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'
# A tibble: 7 × 8
  term      symp_fi symp_eelv symp_ps symp_lrem symp_lr symp_rn symp_re
  <chr>       <dbl>     <dbl>   <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
1 symp_fi    NA        0.563   0.507    -0.213  -0.270   -0.193  -0.182
2 symp_eelv   0.563   NA       0.620     0.0917 -0.0987  -0.315  -0.321
3 symp_ps     0.507    0.620  NA         0.188  -0.0440  -0.339  -0.303
4 symp_lrem  -0.213    0.0917  0.188    NA       0.522   -0.161  -0.148
5 symp_lr    -0.270   -0.0987 -0.0440    0.522  NA        0.271   0.256
6 symp_rn    -0.193   -0.315  -0.339    -0.161   0.271   NA       0.769
7 symp_re    -0.182   -0.321  -0.303    -0.148   0.256    0.769  NA    

To make this more readable, we can also reshape the matrix into a list of pairs, which makes it easy to find the variables that are the most correlated with each other.

(affect_df <- affect_matrix |>
  # Set the upper triangle to NA so that each pair of variables appears only once
  shave() |>
  # Reshape into a data frame with 3 columns: x, y and their correlation r
  stretch(na.rm = TRUE))
# A tibble: 21 × 3
   x         y               r
   <chr>     <chr>       <dbl>
 1 symp_fi   symp_eelv  0.563 
 2 symp_fi   symp_ps    0.507 
 3 symp_fi   symp_lrem -0.213 
 4 symp_fi   symp_lr   -0.270 
 5 symp_fi   symp_rn   -0.193 
 6 symp_fi   symp_re   -0.182 
 7 symp_eelv symp_ps    0.620 
 8 symp_eelv symp_lrem  0.0917
 9 symp_eelv symp_lr   -0.0987
10 symp_eelv symp_rn   -0.315 
# ℹ 11 more rows
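What shave() and stretch() do can be sketched in base R: keep one triangle of the matrix so that each pair appears once, then list the pairs in long format. This is only an illustration on mtcars, not corrr's own implementation; like the output above, it keeps the cells below the diagonal and lists the pairs column by column.

```r
# Correlation matrix of four mtcars columns (illustration only)
m <- cor(mtcars[, c("mpg", "wt", "hp", "qsec")])

# Row/column positions of the cells below the diagonal (each pair once)
idx <- which(lower.tri(m), arr.ind = TRUE)

# Long format: one row per pair, similar to stretch() after shave()
pair_df <- data.frame(
  x = colnames(m)[idx[, "col"]],
  y = rownames(m)[idx[, "row"]],
  r = m[idx],
  row.names = NULL
)

# Strongest positive correlations first, as with arrange(-r)
pair_df[order(-pair_df$r), ]
```

With k variables, this always yields k(k - 1)/2 pairs: 6 here, and 21 for our 7 sympathy variables.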

We see here, for instance, that sympathy toward RN voters and sympathy toward Reconquête voters are the most positively correlated (r = 0.769). This means that people who have sympathy for voters of one of these parties tend to also have sympathy for voters of the other.

affect_df |> 
  # Order the pairs from the highest to the lowest correlation
  arrange(-r)
# A tibble: 21 × 3
   x         y               r
   <chr>     <chr>       <dbl>
 1 symp_rn   symp_re    0.769 
 2 symp_eelv symp_ps    0.620 
 3 symp_fi   symp_eelv  0.563 
 4 symp_lrem symp_lr    0.522 
 5 symp_fi   symp_ps    0.507 
 6 symp_lr   symp_rn    0.271 
 7 symp_lr   symp_re    0.256 
 8 symp_ps   symp_lrem  0.188 
 9 symp_eelv symp_lrem  0.0917
10 symp_ps   symp_lr   -0.0440
# ℹ 11 more rows

On the other hand, we see that sympathy toward PS voters and sympathy toward RN voters are the most negatively correlated (r = -0.339). This means that people who have sympathy for voters of one of these parties tend to have less sympathy for voters of the other. However, this relationship is much weaker than the one between the RN and Reconquête (|r| = 0.339 vs 0.769).

affect_df |>
  # Order the pairs from the lowest to the highest correlation
  arrange(r)
# A tibble: 21 × 3
   x         y              r
   <chr>     <chr>      <dbl>
 1 symp_ps   symp_rn   -0.339
 2 symp_eelv symp_re   -0.321
 3 symp_eelv symp_rn   -0.315
 4 symp_ps   symp_re   -0.303
 5 symp_fi   symp_lr   -0.270
 6 symp_fi   symp_lrem -0.213
 7 symp_fi   symp_rn   -0.193
 8 symp_fi   symp_re   -0.182
 9 symp_lrem symp_rn   -0.161
10 symp_lrem symp_re   -0.148
# ℹ 11 more rows

Calculating a correlation test for each pair of variables is a bit trickier, but here is one way to proceed. We first create a function that runs a test between two variables and returns its p-value, and then we use corrr's colpair_map() function on our dataset to run all the tests. Finally, we filter the results to keep only the non-significant ones.

# Create function to calculate a correlation test between two variables and extract the p-value
calc_cor_test <- function(var_1, var_2) {
  cor.test(var_1, var_2)$p.value
}


# Example: the full test output for a single pair (EELV and LFI)
cor.test(fes2022$symp_eelv, fes2022$symp_fi)

    Pearson's product-moment correlation

data:  fes2022$symp_eelv and fes2022$symp_fi
t = 27.358, df = 1612, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5288387 0.5955318
sample estimates:
      cor 
0.5631015 
# For each pair of variables, calculate the correlation test

(tests <- colpair_map(fes_affect, calc_cor_test))
# A tibble: 7 × 8
  term         symp_fi  symp_eelv    symp_ps  symp_lrem    symp_lr    symp_rn
  <chr>          <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1 symp_fi   NA          1.12e-135  3.34e-106  5.85e- 18  2.07e- 28  5.30e- 15
2 symp_eelv  1.12e-135 NA          3.00e-172  2.25e-  4  7.08e-  5  1.99e- 38
3 symp_ps    3.34e-106  3.00e-172 NA          2.71e- 14  7.69e-  2  1.39e- 44
4 symp_lrem  5.85e- 18  2.25e-  4  2.71e- 14 NA          2.16e-113  7.52e- 11
5 symp_lr    2.07e- 28  7.08e-  5  7.69e-  2  2.16e-113 NA          1.71e- 28
6 symp_rn    5.30e- 15  1.99e- 38  1.39e- 44  7.52e- 11  1.71e- 28 NA        
7 symp_re    1.96e- 13  5.47e- 40  1.51e- 35  2.20e-  9  1.44e- 25  3.53e-316
# ℹ 1 more variable: symp_re <dbl>
tests |> 
  # Set the upper triangle to NA so that each pair appears only once
  shave() |> 
  # Reshape into 3 columns: x, y and r
  # (stretch() always names the value column r; here it holds p-values, not correlations)
  stretch(na.rm = TRUE) |> 
  # Flag pairs whose p-value is below 0.05
  mutate(significant = r < 0.05) |>
  # Keep only the non-significant results
  filter(!significant)
# A tibble: 1 × 4
  x       y            r significant
  <chr>   <chr>    <dbl> <lgl>      
1 symp_ps symp_lr 0.0769 FALSE      

The only non-significant correlation in our tests is between sympathy for PS voters and sympathy for LR (Les Républicains) voters (p ≈ 0.077). In other words, at the 5% level, we find no significant linear relationship between how much people like or dislike those who vote for the Socialist Party and those who vote for the Republicans.
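The colpair_map() approach above can also be sketched in base R by looping over all pairs with combn(). The example uses mtcars columns for illustration. The p_holm column goes slightly beyond the original analysis: when running many pairwise tests (21 in our case), a multiple-comparison adjustment such as Holm's, available through p.adjust(), is one way to limit false positives.

```r
# Four numeric columns of mtcars, for illustration
vars <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# Every unordered pair of column names (a 2 x 6 matrix)
pair_names <- combn(names(vars), 2)

res <- data.frame(x = pair_names[1, ], y = pair_names[2, ])

# Correlation and cor.test() p-value for each pair
res$r <- mapply(function(a, b) cor(vars[[a]], vars[[b]]),
                res$x, res$y, USE.NAMES = FALSE)
res$p_value <- mapply(function(a, b) cor.test(vars[[a]], vars[[b]])$p.value,
                      res$x, res$y, USE.NAMES = FALSE)

# Optional (not in the original analysis): Holm adjustment for multiple tests
res$p_holm <- p.adjust(res$p_value, method = "holm")

# Flag pairs that are significant at the 5% level (unadjusted p-values)
res$significant <- res$p_value < 0.05

# Non-significant pairs, as in the filter() step above
res[!res$significant, ]
```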

We can also visualize the correlations either with a heatmap or with a network plot.

# Heatmap
affect_matrix |> 
  autoplot()

# Network plot: only correlations with an absolute value of at least 0.1 are drawn
affect_matrix |> 
  network_plot(min_cor = 0.1)

13.3 A short introduction to PCA

Measuring the correlations between multiple variables is also useful when we want to reduce the number of variables in our dataset and find out which variables are most correlated with each other. To do so, we can use a dimensionality reduction technique called Principal Component Analysis (PCA). Basically, it reduces the number of variables by creating new variables, called principal components, that are linear combinations of the original ones. The first principal component is the one that explains the most variance in the data. The second principal component explains the most of the remaining variance while being uncorrelated with the first, and so on. I am not going to go into the details of how PCA works, but if you want to learn more about it, you can check out this video or this article. Here, I simply show an example of a PCA on all of our variables measuring affective polarization.
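One way to see the link between correlation and PCA: on standardized variables, PCA amounts to an eigendecomposition of the correlation matrix, and each eigenvalue is the variance captured by one principal component. The sketch below checks this with base R's prcomp() on the built-in USArrests dataset (an illustration, not the FES data). FactoMineR's PCA() standardizes the variables by default (scale.unit = TRUE), so its eigenvalues should likewise match those of the correlation matrix.

```r
# PCA on standardized variables with base R (built-in USArrests data)
X <- USArrests

pca <- prcomp(X, scale. = TRUE)  # center and scale each variable first

# Variance of each principal component = eigenvalues of the correlation matrix
eig_pca <- pca$sdev^2
eig_cor <- eigen(cor(X))$values

# Share of the total variance explained by each component
prop_var <- eig_pca / sum(eig_pca)

round(prop_var, 3)
all.equal(eig_pca, eig_cor)  # TRUE
```

Because the variables are standardized, the eigenvalues sum to the number of variables (4 here). With FactoMineR, the eigenvalues and percentages of variance are stored in the eig element of the result (pca_fes$eig in the code that follows).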

# Install/Load the two packages I will use for the PCA
needs(FactoMineR, factoextra)

pca_fes <- fes2022 |> 
  # Select only the variables measuring affective polarization
  select(starts_with("symp")) |> 
  # Perform the PCA
  PCA()

# Visualize both the variables and the individuals on the first two principal components  

fviz_pca_biplot(pca_fes, label = "var", 
               ggtheme = theme_minimal())