Text Classification with LLMs in R - A Short Course
A 3-Day Livestream Seminar Taught by Hudson Golino, Ph.D.
This seminar is part of the AI-Enabled Data Analytics Certification, a series of four expert-led courses designed to build practical AI skills for research and data analysis. Contact us to learn how to complete your certification and access special pricing.
This seminar will introduce you to basic techniques for converting unstructured text data into structured data in R, a necessary precursor to working with large language transformer models (LLMs). The course will also cover LLM embeddings and their use, and you will gain hands-on experience implementing them in R.
Additionally, the course will cover the concept of zero-shot classification, which involves using LLMs for text classification without the need for labeled data. You will learn about Hugging Face Transformers and implement zero-shot classification in R. The course will also cover retrieval-augmented generation to discover and understand topics in texts for automatic zero-shot text classification using R and pre-trained transformer models. Finally, the course will cover text classification using LLM embeddings and network models.
Overall, the goal of this course is to provide you with a comprehensive, applied understanding of LLMs for research applications. By the end of the course, you will be equipped with the necessary skills to apply these techniques to analyze and extract insights from unstructured text data in your research work.
Starting February 11, this seminar will be presented as a 3-day synchronous, livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience. But if you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.
Closed captioning is available for all live and recorded sessions. Captions can be translated to a variety of languages including Spanish, Korean, and Italian.
ECTS Equivalent Points: 1
More Details About the Course Content
Why are Large Language Transformer Models (LLMs) so popular nowadays?
Large language transformer models (such as GPT-5, GPT-oss-120b, and GPT-oss-20b) have gained popularity for several reasons:
- State-of-the-art performance: These models have achieved state-of-the-art performance on a wide range of natural language processing tasks, including language translation, text summarization, question answering, and language generation.
- Zero-shot learning: LLMs can perform tasks for which they have not been explicitly trained, a property known as zero-shot learning. This is because they have been trained on a vast amount of diverse text data, allowing them to understand the underlying patterns and relationships in natural language.
- Scalability: LLMs are highly scalable and can be fine-tuned for specific tasks with relatively small amounts of task-specific data.
- General-purpose: LLMs are designed to be general-purpose, meaning they can be used for a wide variety of natural language processing tasks without the need for specialized models for each task.
- Ease of use: Many LLMs are available as pre-trained models, allowing developers and researchers to use them without the need for extensive training or expertise in natural language processing.
Overall, the combination of state-of-the-art performance, zero-shot learning, scalability, general-purpose design, and ease of use makes large language transformer models highly attractive for a wide range of natural language processing applications.
This course is designed as a first introduction to natural language processing and large language models for research applications, covering some basic concepts and applications of transformer models in R.
Computing
This is a hands-on course with instructor-led software demonstrations and guided exercises. These guided exercises are designed for the R language, so you should use a computer with a recent version of R (version 4.1.3 or later) and RStudio (version 2022.02.1+461 or later).
To follow along with the course exercises, you should be comfortable using R, including reading in data files, running scripts, and performing basic data manipulation and analyses.
If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics. Here are our recommendations.
Who Should Register?
The course is designed for participants who have a solid working knowledge of R and are interested in applying NLP techniques to extract insights from unstructured text data for research purposes.
Outline
Introduction to Large Language Models
- Understanding transformer architectures
- Overview of state-of-the-art LLMs
- Practical applications in research
Text Generation for Classification
- Working with Groq API
- Using free and open source models
- Implementing text generation in R
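As a taste of what this section covers, the sketch below builds a chat-completion request for Groq's OpenAI-compatible API in R. The endpoint URL and model id are illustrative assumptions and may change; check Groq's documentation for current values. The actual network call is commented out because it requires an API key.

```r
library(jsonlite)

# Build a chat-completion request body (Groq's API follows the
# OpenAI chat-completions format). Model id is an example only.
body <- list(
  model = "llama-3.1-8b-instant",
  messages = list(list(
    role = "user",
    content = "Classify this review as positive or negative: 'Great course!'"
  )),
  temperature = 0
)
payload <- toJSON(body, auto_unbox = TRUE)

# Uncomment to send (requires the httr package and a GROQ_API_KEY
# environment variable):
# library(httr)
# resp <- POST(
#   "https://api.groq.com/openai/v1/chat/completions",
#   add_headers(Authorization = paste("Bearer", Sys.getenv("GROQ_API_KEY"))),
#   content_type_json(),
#   body = payload
# )
# content(resp)$choices[[1]]$message$content
```

Setting `temperature = 0` makes the generation as deterministic as the model allows, which is usually what you want for classification tasks.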
Text Mining Fundamentals
- What is text mining and its common applications
- From texts to structured data
- Text tokenization, stop word removal, and stemming
- Traditional bag-of-words approaches
- Transforming text data into a usable format for modeling
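The pipeline outlined above can be sketched in a few lines of base R. This is a toy illustration with a made-up two-document corpus and a tiny stop list; real analyses typically use packages such as tidytext or quanteda, and stemming is omitted here.

```r
# Toy corpus: two short documents
docs <- c("Cats chase mice.", "Dogs chase cats and dogs bark.")

# Tokenization: lowercase, strip punctuation, split on whitespace
tokenize <- function(x) {
  x <- tolower(gsub("[[:punct:]]", "", x))
  strsplit(x, "\\s+")[[1]]
}
tokens <- lapply(docs, tokenize)

# Stop word removal with a tiny illustrative stop list
stop_words <- c("and", "the", "a")
tokens <- lapply(tokens, function(t) t[!t %in% stop_words])

# Bag of words: a document-term matrix of raw counts
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(t) table(factor(t, levels = vocab))))
rownames(dtm) <- paste0("doc", seq_along(docs))
dtm
```

The resulting document-term matrix is exactly the "usable format for modeling" referred to above: one row per document, one column per vocabulary term.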
Word and Document Embeddings using LLMs
- Introduction to LLM embeddings and their use in text classification
- Different types of embeddings
- BERT and other LLM embeddings in R
- Working with encoder models
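To make the idea of embedding-based classification concrete, here is a minimal nearest-prototype sketch in base R. The four-dimensional vectors are made-up placeholders; real embeddings from an encoder model such as BERT have hundreds of dimensions and would be obtained from a package or API, not typed by hand.

```r
# Cosine similarity between two vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Placeholder "embeddings" for two class prototypes (illustrative only)
class_protos <- list(
  sports   = c(0.9, 0.1, 0.0, 0.2),
  politics = c(0.1, 0.8, 0.3, 0.0)
)

# Placeholder embedding for a new, unlabeled document
new_doc <- c(0.8, 0.2, 0.1, 0.1)

# Classify by assigning the label of the most similar prototype
sims <- sapply(class_protos, cosine, b = new_doc)
predicted <- names(which.max(sims))
predicted
```

This nearest-prototype rule is one of the simplest ways to turn embeddings into a classifier; the course builds from here toward richer network-based approaches.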
Network-Based Text Classification
- Exploratory Graph Analysis (EGA) for text data
- Networks for text classification
- Mining embeddings with network analysis in R
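The general flavor of network-based classification can be sketched as follows: build a similarity network over document embeddings, then let community detection recover the classes. EGA itself is implemented in the EGAnet package; the sketch below substitutes a simpler igraph-based approach (requires the igraph package), and the embeddings are made-up placeholders.

```r
library(igraph)

# Placeholder document embeddings (rows = documents); real embeddings
# would come from an encoder model
emb <- rbind(
  doc1 = c(0.9, 0.1, 0.0),
  doc2 = c(0.8, 0.2, 0.1),
  doc3 = c(0.1, 0.9, 0.2),
  doc4 = c(0.0, 0.8, 0.3)
)

# Cosine similarity matrix between documents
norms <- sqrt(rowSums(emb^2))
sim <- (emb %*% t(emb)) / (norms %o% norms)
diag(sim) <- 0

# Keep only strong edges, then detect communities (candidate classes)
adj <- ifelse(sim > 0.5, sim, 0)
g <- graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE)
m <- membership(cluster_louvain(g))
m
```

Here the two clearly separated pairs of documents end up in different communities, which is the network analogue of assigning them to different classes.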
Advanced Classification Techniques
- Zero-shot classification with Hugging Face Transformers
- Implementation of zero-shot methods in R
- Retrieval-augmented generation (RAG) for text analysis
- Automatic text classification and summarization using R
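Zero-shot classification needs no labeled data: you supply the text plus a list of candidate labels, and a pre-trained NLI model scores each label. The sketch below assembles such a request for the Hugging Face Inference API in R; the model name and endpoint are common examples but availability may change, so check the Hugging Face documentation. The network call is commented out because it requires an API token.

```r
library(jsonlite)

# Request body for the zero-shot classification task: the text to
# classify plus the candidate labels (no labeled training data needed)
body <- list(
  inputs = "The election results were announced last night.",
  parameters = list(candidate_labels = c("politics", "sports", "technology"))
)
payload <- toJSON(body, auto_unbox = TRUE)

# Uncomment to send (requires the httr package and an HF_API_TOKEN
# environment variable):
# library(httr)
# resp <- POST(
#   "https://api-inference.huggingface.co/models/facebook/bart-large-mnli",
#   add_headers(Authorization = paste("Bearer", Sys.getenv("HF_API_TOKEN"))),
#   content_type_json(),
#   body = payload
# )
# content(resp)  # labels ranked by score
```

The response ranks the candidate labels by score, so the top-ranked label can be taken directly as the predicted class.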
Time-Series Text Analysis (if time allows)
- Dynamic Exploratory Graph Analysis for repeated measures
- Topic modeling for intensive longitudinal text data
- Generalized local linear approximation and time-delay embedding
Reviews of Text Classification with LLMs in R
“The presenter was extremely knowledgeable about the topics discussed. Plenty of time for questions to be asked and answered. The course material provided was well put together.”
Cody Knight, Department of Veterans Affairs
“The instructor is passionate and very clear in his explanations. Also, it is great to learn from an expert who is working actively in the field. He was able to explain very complex concepts in a simplified way so that we can understand the different steps of the code that we are applying. I also liked the general knowledge on LLMs and other applications that was gained from the course.”
Catherine Chanfreau, Department of Veterans Affairs
“I recently completed a course on large language modeling, and it exceeded my expectations. The presentations were top-notch, providing clear insights into complex concepts. The discussions were engaging, fostering a collaborative learning environment. The instructors were knowledgeable, making the entire experience highly valuable. I highly recommend this course to anyone interested in exploring large language models.”
Sepideh Banava, University of California San Francisco
“I highly recommend the training. The instructors were friendly, helpful, and thorough in their approach, making sure important concepts were clearly explained and understood. They were always available to answer questions and provide guidance, which made the learning process so much more enjoyable and effective.
The time spent going over the worked examples was particularly useful, as it allowed me to gain a deeper understanding by seeing how the concepts we had learned actually functioned in a practical manner. I highly recommend this course to researchers looking for an introduction to using large language models in their research. It is well-structured, comprehensive, and the support provided by the instructors is second to none.”
William Rayo, Oregon State University
Seminar Information
Daily Schedule: All sessions are held live via Zoom. All times are ET (New York time).
10:00am-12:30pm
1:30pm-3:30pm
Payment Information
The fee of $995 USD includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.
