Longitudinal Data Analysis Using R and LLMs - A Short Course
A 4-Day Livestream Seminar Taught by Stephen Vaisey, Ph.D.
This seminar is part of the Causal Inference Certification, a 4-course program designed to strengthen practical skills in causal design, estimation, and interpretation. Contact us to learn how to complete the certification and access discounted pricing.
The most common type of longitudinal data is panel data or repeated measures data, consisting of measurements of predictor and response variables at two or more points in time for many individuals (or other units). Panel data enable two major advances over cross-sectional data:
- the ability to model the evolution of outcomes over time; and
- the ability to control for unobserved unit-specific heterogeneity, enabling better causal inferences.
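As a minimal sketch of the second point, the toy example below uses made-up data in which each unit has its own unobserved baseline that is correlated with the predictor. The data, the true slope of 2.0, and the helper name `ols_slope` are all assumptions for illustration, not course materials. The course itself works in R; plain Python is used here only to keep the arithmetic visible.

```python
from statistics import mean

# Hypothetical toy panel: 3 units, each observed at 3 time points.
# Assumed data-generating process: y = 2.0 * x + unit_effect (no noise).
TRUE_SLOPE = 2.0
unit_effects = {"a": 0.0, "b": 10.0, "c": 20.0}  # unobserved, unit-specific
x_values = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0], "c": [3.0, 4.0, 5.0]}

# Long-form rows: (unit, x, y)
rows = [
    (unit, x, TRUE_SLOPE * x + unit_effects[unit])
    for unit, xs in x_values.items()
    for x in xs
]


def ols_slope(xs, ys):
    """Slope from a simple bivariate least-squares regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Pooled regression ignores the panel structure, so the unit effects
# (which rise with x across units) contaminate the slope.
pooled = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])

# "Within" transformation: subtract each unit's own mean from x and y.
# Anything constant within a unit, observed or not, drops out.
x_means = {u: mean([x for uu, x, _ in rows if uu == u]) for u in unit_effects}
y_means = {u: mean([y for uu, _, y in rows if uu == u]) for u in unit_effects}
x_within = [x - x_means[u] for u, x, _ in rows]
y_within = [y - y_means[u] for u, _, y in rows]
within = ols_slope(x_within, y_within)

print(f"pooled slope: {pooled:.2f}")  # pooled slope: 7.00
print(f"within slope: {within:.2f}")  # within slope: 2.00
```

In this contrived case the pooled slope is badly biased, while the within-unit slope recovers the assumed value of 2.0. Real panel data include noise and time-varying confounders, which the within transformation does not remove.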
Different data structures allow researchers to use panel data in different ways. In this course, we will focus on the following approaches:
- Mixed models (including random growth curves)
- Two-period difference-in-differences
- Fixed-effects models (one-way and two-way)
- Between-within models
- Dynamic panel models
In addition to covering these approaches and their implementation in R, we will discuss when each one is and is not suitable given data constraints. We will also consider how to adapt these approaches to limited dependent variables (especially binary outcomes).
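To make one of the listed approaches concrete, here is a minimal sketch of the arithmetic behind a two-period difference-in-differences estimate. The outcome values and group labels are invented for illustration, and the estimate is only meaningful under the usual parallel-trends assumption. Plain Python is used for transparency; the course examples are in R.

```python
from statistics import mean

# Hypothetical outcomes for two groups measured before and after treatment.
outcomes = {
    ("treated", "pre"): [10.0, 12.0, 11.0],
    ("treated", "post"): [15.0, 17.0, 16.0],
    ("control", "pre"): [8.0, 9.0, 10.0],
    ("control", "post"): [10.0, 11.0, 12.0],
}

# Change over time within each group.
treated_change = mean(outcomes[("treated", "post")]) - mean(outcomes[("treated", "pre")])
control_change = mean(outcomes[("control", "post")]) - mean(outcomes[("control", "pre")])

# Difference-in-differences: the treated group's change, net of the change
# the control group experienced over the same period.
did_estimate = treated_change - control_change

print(f"treated change: {treated_change:.1f}")  # treated change: 5.0
print(f"control change: {control_change:.1f}")  # control change: 2.0
print(f"DiD estimate:   {did_estimate:.1f}")    # DiD estimate:   3.0
```

The control group's change stands in for what would have happened to the treated group without treatment, which is why the comparison depends on the two groups sharing a common trend.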
This course provides a solid foundation in longitudinal data analysis in R while also equipping you with a set of structured prompts to use with your large language model (LLM) of choice. LLMs like Claude can serve as invaluable “research assistants,” but they must be prompted skillfully to maximize their usefulness and avoid pitfalls. You will learn how to use Claude to help design, estimate, and interpret your longitudinal models and to understand their assumptions.
Explicit discussion of LLM prompting will comprise approximately 15% of course time.
Starting August 11, this seminar will be presented as a 4-day synchronous, livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience. If you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.
Closed captioning is available for all live and recorded sessions. Captions can be translated into a variety of languages, including Spanish, Korean, and Italian.
ECTS Equivalent Points: 1
Computing
The vast majority of what you will learn in this course can be applied in any software package. However, this seminar will mostly use R for empirical examples and exercises. To replicate the instructor’s workflow in the course, you should have R and RStudio already installed on your computer when the course begins.
For LLM support, the instructor will use the most recent paid version of Claude. However, most modern LLMs (e.g., ChatGPT and Gemini) will be useful for understanding, modifying, and interpreting longitudinal models.
Stata notes and syntax are available upon request.
Basic familiarity with R is highly desirable. If you are new to R, check out Professor Vaisey’s one-hour Introduction to R video to get up to speed. You may also want to consider taking a short introductory seminar on R such as Introduction to R for Data Analysis, R for SPSS Users, R for SAS Users, or R for Stata Users.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who Should Register?
This course is for anyone who wants to learn how to analyze panel (repeated measures) data. You should have a basic foundation in linear regression.
Outline
Module 1. Foundations of Panel Data
- What is panel data?
- Long form and wide form
- Within and between variation
- LLMs: giving the context of your panel data
Module 2. Growth Curves and Early Causal Designs
- Mixed models (growth curves) for time-constant treatments
- LLMs: prompting for growth-curve model specification and interpretation
- Introduction to fixed effects (“within” estimation)
- Pre-test/post-test designs
- Two-period difference-in-differences
Module 3. Within, Between, and Identification
- Two-way fixed effects
- Between-within models for time-varying treatments
- LLMs: diagnosing identification assumptions and interpreting within vs. between effects
Module 4. Dynamic Models and Robustness
- Introduction to linear structural equations for panel data
- Introduction to dynamic panel models
- LLMs: common pitfalls and robustness checks for dynamic panel models
Reviews of Longitudinal Data Analysis Using R and LLMs
“Professor Vaisey is fabulous at explaining things and emphasizes a deeper understanding of the statistics he is teaching … Through this approach, he fits a wide range of longitudinal models into a single conceptual framework, helping students connect their research questions and data structure to the modeling approaches they should consider. What’s more, the class is fun.”
Nick Huntington, Brandeis University
“Dr. Vaisey was great. He had clear explanations, was engaging, very knowledgeable, yet also very chill and approachable … He is the only teacher I’ve ever had that acknowledged formulas are intimidating and that our brain wants to skid over them, which was really validating. I would definitely take a course with him again.”
Mariah Wood, Kaiser Permanente
“The course was very well organized, with strong conceptual instruction, separate coding files, exercises, and other helpful materials. Stephen is very good at communicating the conceptual parts of the models. I would recommend the course to anyone who needs a good overview of different longitudinal models and wants to understand the underlying mechanisms of them.”
Pernille Melander-Nyboe, Odense University Hospital
“The lectures were excellent, but the most useful takeaway is the additional course materials. The working R scripts for each of the methods presented, with sample exercises, were critical to help reinforce understanding and give us the ability to tinker a bit to understand how the code works and why it does what it does.”
Andrew Althouse, University of Pittsburgh
“Dr. Vaisey was amazing! He broke everything down well, provided all of the R code in the slides and R Markdown file, and kept the class entertaining and lighthearted.”
Nick Hollman, University of Oklahoma
Seminar Information
Tuesday, August 11 – Friday, August 14, 2026
Schedule: All sessions are held live via Zoom. All times are ET (New York time).
10:30am-12:30pm
1:30pm-3:00pm
Payment Information
The fee of $995 USD includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.
