Longitudinal Data Analysis with Stata and LLMs - A Short Course

A 3-Day Livestream Seminar Taught by Stephen Vaisey, Ph.D.

The most common type of longitudinal data is panel data or repeated measures data, consisting of measurements of predictor and response variables at two or more points in time for many individuals (or other units). Panel data enable two major advances over cross-sectional data:

    1. the ability to model the evolution of outcomes over time; and
    2. the ability to “control” for unobserved unit-specific heterogeneity, enabling better causal inferences.
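
To make the second point concrete, here is a minimal sketch in Python (standard library only) of the "within" transformation that fixed-effects models rely on. It is an illustration, not course material: the toy data, the names `ols_slope` and `demean_by_unit`, and the true slope of 2 are all assumptions chosen for the example. Each unit has an unobserved effect that is correlated with the predictor, so a pooled regression is biased, while subtracting each unit's own mean removes the unit effect.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope from a bivariate least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean_by_unit(ids, values):
    """Subtract each unit's own mean from its observations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        totals[i] += v
        counts[i] += 1
    return [v - totals[i] / counts[i] for i, v in zip(ids, values)]


# Hypothetical panel in long form: 4 units observed at 3 time points.
TRUE_BETA = 2.0
ids, x, y = [], [], []
for unit in range(4):
    unit_effect = 10.0 * unit      # unobserved, constant within a unit
    for t in range(3):
        x_it = 5.0 * unit + t      # predictor is correlated with unit_effect
        ids.append(unit)
        x.append(x_it)
        y.append(unit_effect + TRUE_BETA * x_it)  # no noise, for clarity

# Pooled regression ignores the unit effect and overstates the slope.
pooled_slope = ols_slope(x, y)

# Within regression uses only variation around each unit's own mean.
within_slope = ols_slope(demean_by_unit(ids, x), demean_by_unit(ids, y))

print("pooled slope:", pooled_slope)
print("within slope:", within_slope)
```

In this noise-free toy example the within slope recovers the assumed value of 2, while the pooled slope is pulled upward by the unit effects. With real data the within estimate would also carry sampling error and would only address heterogeneity that is constant over time.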

Different data structures allow researchers to use panel data in different ways. In this course, we will focus on the following approaches:

    1. Mixed models (including random growth curves)
    2. Two-period difference-in-differences
    3. Fixed-effects models (one-way and two-way)
    4. Between-within models
    5. Dynamic panel models

In addition to considering these approaches and their implementation in Stata, we will discuss when each is and is not suitable given data constraints. We will also consider how to adapt these approaches to deal with limited dependent variables (especially binary outcomes).

This course provides a solid foundation in longitudinal data analysis in Stata while also equipping you with a set of structured prompts to use with your Large Language Model (LLM) of choice. LLMs like ChatGPT can serve as invaluable “research assistants” but need to be prompted in a skillful way to maximize their usefulness and avoid pitfalls. You will learn how to use ChatGPT to help design, estimate, interpret, and understand the assumptions of your longitudinal models.

Explicit discussion of LLM prompting will comprise approximately 15-20% of course time.

Starting March 18, this seminar will be presented as a 3-day synchronous livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience, but if you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.

Closed captioning is available for all live and recorded sessions. Captions can be translated into a variety of languages, including Spanish, Korean, and Italian.

ECTS Equivalent Points: 1

Computing

The vast majority of what you will learn in this course can be applied in any software package. This seminar will use the most recent version of Stata for empirical examples and exercises. (Nearly all commands will work in Stata 14+ as well.)

For LLM support, the instructor will use the most recent paid version of ChatGPT. However, most modern LLMs (e.g., Claude, Gemini) will be useful for understanding, modifying, and interpreting longitudinal models.

Basic familiarity with Stata is highly desirable, but even novice Stata users should be able to follow the presentation and do the exercises.

R notes and syntax are available upon request; however, they will not include the LLM prompts used in the course.

If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s 30-day software return policy.

Who Should Register?

This course is for anyone who wants to learn to analyze panel (repeated measures) data. You should have a basic foundation in linear regression.

Outline

  • What is panel data?
  • Long form and wide form
  • Within and between variation
  • LLMs: giving the context of your panel data
  • Mixed models (growth curves) for time-constant treatments
  • LLMs: prompting for growth-curve model specification and interpretation
  • Introduction to fixed effects (“within” estimation)
  • Pre-test/post-test designs
  • Two-period difference-in-differences
  • Two-way fixed effects
  • Between-within models for time-varying treatments
  • LLMs: diagnosing identification assumptions and interpreting within vs. between effects
  • Introduction to linear structural equations for panel data
  • Introduction to dynamic panel models
  • LLMs: common pitfalls and robustness checks for dynamic panel models

Reviews of Longitudinal Data Analysis with Stata and LLMs

“Professor Vaisey does an excellent job of explaining material clearly, thoroughly, and with extremely intuitive research (and real-life) examples. I also like that recordings are available so quickly, therefore allowing for a mix of in-person and recorded viewing to accommodate different people’s schedules.”
  Cesar Rebellon, George Mason University

“Stephen was a great teacher who took time to clearly explain theoretical concepts and would thoroughly answer all of the questions. He managed to help me create an intuition and further excitement for longitudinal methods and analysis.”
  Vassilena Iankova, Ludwig Maximilian University

“Professor Vaisey takes very complicated concepts and explains them in an easily digestible manner. I appreciate the level of detail he provides and his willingness to go ‘off script’ to give more explanation. Lastly, he infuses his quirky and delightful humor into his lectures, which helps take the edge off and makes learning fun.”
  Alice Daugherty, The University of Alabama

“I really enjoyed this course. My lack of comfort with panel data has been holding back my research. Now I feel much more confident to move forward!”
  Danielle Lamb, Toronto Metropolitan University

“I thought Dr. Vaisey’s attention to practical problems and examples was very effective for my learning. Stats lectures can sometimes feel very abstract with all of the formulas, but working through practical examples in class helped me think about my own research in a productive way. Even though this workshop was intensive and some of the topics were quite complicated, Dr. Vaisey explained any questions/topics raised in a clear and accessible way. This course was very well thought out. My understanding of the course material would build each day and I feel more confident in my ability to further study some topics. I found this course to be immensely helpful and valuable to my statistical training as I continue to develop my research in grad school.”
  Radhika Prasad, University at Albany

Seminar Information

Wednesday, March 18 – Friday, March 20, 2026

Schedule: All sessions are held live via Zoom. All times are ET (New York time).

10:00am-12:30pm
1:30pm-3:30pm

Payment Information

The fee of $995 includes all course materials.

PayPal and all major credit cards are accepted.

Our Tax ID number is 26-4576270.