Workshop on Large Language Models (LLMs) – Best practices, challenges and opportunities in (automatic) content analysis for social science research – 11th of April

5. March 2025

Large Language Models (LLMs), such as ChatGPT, offer new possibilities and can approach human levels of performance on many tasks (e.g., classification, interpretation, pattern detection). They raise important questions for social science research, including: How good are they really at classification and interpretative tasks? How can I write the best prompts? How can I access an LLM from R and evaluate its results there? Since large numbers of queries are expensive and data are often sensitive, which locally executable alternatives exist, for instance Llama3? How can I inject knowledge on a topic into the LLM, for instance with retrieval-augmented generation (RAG)? Do I need LLMs for all tasks? Where does it make more sense to use document classification, topic models, or conceptual maps?

The answers to these questions will be integrated into the practical part of the workshop. Participants will learn how to triangulate the main LLM approaches with other content analysis strategies for studying social science concepts. The challenges and opportunities of these approaches will be discussed, and datasets from social media, news media and expert communication have been selected to address topics of interest for the communication, political, linguistic and social sciences (including algorithms, religion, health and ethics). The workshop will also address ethical questions and pay particular attention to validation and robustness check strategies (e.g., integration of and correlation with external (survey) data, event and manual validations, blackbox models), while highlighting the usefulness of a “human-in-the-loop” component.

The workshop will take place on April 11th from 12:30 to 17:00 (room SOC-E-006 and online). Participants are encouraged to bring their own data or to send them to us two weeks before the workshop. The workshop will include hands-on exercises with R and Python. Participants need basic knowledge of R to follow the workshop. Data and code will be provided in a dedicated repository, along with presentation slides and useful theoretical resources.