Introduction to R

Malo Jan

Outline of the course

  • Presentation of the course
  • Why are we doing this
  • R basics

About Your Instructor

  • Malo Jan, PhD candidate in Political Science at CEE Sciences Po
  • Graduated from the master’s program one year ago
  • Research focus: Political Parties, Legislative Politics, Political Behavior, and Climate Politics
    • And… I’m quite passionate about data science and all of the (fun and unexpected) things we can do with R nowadays

The goals of this class

  • Lab sessions of the quantitative methods class (taught by either J. Rovny or J. Carstens)
  • Gain practical experience applying quantitative methods using R and RStudio
  • Teach you the basics, motivate you to keep learning on your own, and prepare you for Quant 2

Prerequisites

  • This class is designed for complete beginners in R and requires no programming knowledge
    • Results from my poll: you have different backgrounds and levels of knowledge
    • Advanced students: Feel free to request additional resources and assist others.
    • Beginners: No need to feel intimidated; we are here to learn. Ask questions if I’m moving too quickly.

Course validation

  • Two graded exercises during the semester (15% each, dates TBA)
  • Each session includes a small exercise (pass/fail). Don’t worry about getting it right, but give it a try. Learning R is really about practice.

Course material

  • A website
  • A GitHub repo with the raw material
  • A Moodle page to upload the exercises.
  • I will also provide a zip file of each session’s content beforehand

To go further with GitHub

Visit this page to learn more about version control with GitHub and R

Why R and RStudio

Me: [image]

Probably some of you: [image]

What is programming useful for

  • Programming is the act of instructing computers to perform specific tasks by writing code using programming languages (such as Python, Java, C++ or…R)
  • Why should we do this as social scientists?
    • Collect, manipulate, reshape and clean data
    • Visualize data by creating beautiful plots
    • Compute descriptive statistics (such as means, medians, crosstabs…)
    • Analyse data through statistical modelling
    • And many other things…
  • But why not other tools such as SPSS or Stata?

Free and open source

  • R is free and open source
  • Big community of users to incorporate new tools and functionalities

Reproducibility


  • Point-and-click work in Excel is not reproducible: the steps taken are not recorded as code
  • Reproducibility: anyone can rerun the same code on the same dataset and obtain the same results
  • Reproducibility crisis: the results of most published quantitative papers are hard to reproduce
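
To make the idea concrete, here is a minimal sketch in R. The dataset and the names `turnout` and `summarise_turnout` are made up for illustration and are not part of the course material. It only shows that when every step is written as code, rerunning the script on the same data gives the same result.

```r
# Hypothetical toy dataset, invented for this illustration only
turnout <- data.frame(
  country = c("A", "B", "C", "D"),
  rate    = c(62.5, 71.0, 55.5, 80.0)
)

# Every step of the analysis is written down as code,
# so anyone can rerun it exactly as it was done the first time
summarise_turnout <- function(data) {
  c(mean = mean(data$rate), median = median(data$rate))
}

first_run  <- summarise_turnout(turnout)
second_run <- summarise_turnout(turnout)

# Same code + same data: the two runs match
identical(first_run, second_run)
```

With a spreadsheet, the equivalent steps (sorting, filtering, copying cells) would usually have to be remembered or described in words.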

The rise of computational social science (CSS)

  • CSS is an interdisciplinary field that uses computational tools and data to address questions relevant to the social sciences
  • The proliferation of digital data, increasing computational power and advances in AI create opportunities for the social sciences
  • A few examples requiring programming:
    • Web scraping
    • Machine Learning
    • Text, image and audio as data

Valuable skills on the job market

Programming as a qualitative researcher

  • Programming is commonly linked with quantitative research because automation is really helpful when we have large datasets

  • Programming can be intimidating for more qualitative researchers, but there are a few reasons to learn these skills:

    1. Allow yourself to be surprised (e.g. unexpected datasets on your topic)
    2. Contextualizing qualitative data (e.g. plots and descriptive statistics)
    3. Collecting data (e.g. web scraping)
    4. Formatting data (e.g. OCR, speech-to-text algorithms such as Whisper)
    5. Analyzing data (e.g. automated text analysis)

R vs Python?

  • R: statistical focus, efficient for data manipulation and modelling, widespread in the social sciences
  • Python: general-purpose programming language, popular in industry, essential for recent developments in machine learning
  • At some point, if you are interested in data science, learn both.

Let’s turn to RStudio to practice!