University of Notre Dame · Department of Computer Science & Engineering

Large Language Models for Scientific Discovery

Conservation, truth‑commission archives, and reliable evaluation.

My dissertation studies how to understand and deploy large language models in real scientific settings. I integrate LLMs with knowledge graphs, examine how people use LLMs in practice, and test whether knowledge graphs improve reasoning, retrieval, and usefulness in applied research. Working with Prof. Nitesh Chawla, I build and evaluate practical tools: I help wildlife scientists find evidence, guide readers through the archives of Colombia's Truth Commission, and align model evaluation with how models are actually used.

Anna Sokol · Ph.D. candidate · University of Notre Dame

Education

  • Ph.D., Computer Science — University of Notre Dame · 2022–2027
  • M.A., Sociology — Higher School of Economics · 2016–2018
  • B.A., Sociology — Vladimir State University · 2012–2016

Research Highlights

Wildlife & Conservation

LLMs for scientific discovery · 2024–2025

I build tools that read and connect ecological evidence to support real conservation work.

Truth & Reconciliation

Colombian archives · 2024–2025

I design search and guidance for communities navigating the Truth Commission’s records.

Risk & Benchmarking

Evaluation practices · 2023–2024

I develop clearer benchmark documentation and checks so that evaluations reflect real-world use.

Selected Projects

Ventana a la Verdad

Archive navigation for Colombia’s Truth Commission (CEV). demo · code · paper

BenchmarkCards

Documentation standard for LLM benchmarks. paper

Conservation Assistant

Evidence retrieval for wildlife research. demo

Latest News

  • Notre Dame / IBM Technology Ethics Lab Fellows announced.
  • DCU Business: Notre Dame–IBM Technology Ethics Lab awards nearly $1M to collaborative projects.
  • Graduate Justice Fellows cohort.
  • Graduate Scholar Spotlight: Lucy Family Institute.
  • Lucy Graduate Scholars cohort.
  • ADSA 2024 Annual Meeting, University of Michigan.
  • Legacy Project: People.

Publications

  • 1
    Breaking Language Barriers: Equitable Performance in Multilingual Language Models
    T. Nagar, G. Khvatskii, A. Sokol, N. V. Chawla — NAACL 2025 · arXiv:2508.12662
  • 2
    Ventana a la Verdad: A Chatbot Application for Navigating The Colombian Truth Commission's Archives
    A. Sokol, M. L. Sisk, J. E. Alvarez, N. Chawla — ACM WSDM 2025
  • 3
    BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
    A. Sokol, E. Daly, M. Hind, D. Piorkowski, X. Zhang, N. Moniz, N. Chawla — arXiv:2410.12974 · 2024
  • 4
    Conformalized Selective Regression
    A. Sokol, N. Moniz, N. Chawla — arXiv:2402.16300 · 2024
  • 5
    Neural Network Modeling and What‑If Scenarios: Applications for Market Development Forecasting
    V. Kuskova, D. Zaytsev, G. Khvatsky, A. Sokol — Applications in Reliability and Statistical Computing, Springer · 2023
  • 6
    Searching for Coherence in a Fragmented Field: Temporal and Keywords Network Analysis in Political Science
    D. G. Zaytsev, V. V. Kuskova, G. S. Khvatsky, A. A. Sokol — Network Science · 2023
  • 7
    Expanding the Boundaries of Interdisciplinary Field: Contribution of Network Science Journal
    V. V. Kuskova, D. G. Zaytsev, G. S. Khvatsky, A. A. Sokol, M. D. Vorobeva, R. A. Kamalov — Network Science · 2023
  • 8
    Computational Tools of Media Analysis for Corporate Policy Effectiveness Evaluation
    G. S. Khvatsky, D. G. Zaytsev, V. V. Kuskova, A. A. Sokol — Reliability and Maintainability Assessment of Industrial Systems, Springer · 2022
  • 9
    Neural Network Modeling and What‑If Scenarios: Applications to Various‑Term Sales Forecasts
    V. Kuskova, D. Zaytsev, A. Sokol, G. Khvatsky — Proceedings of the 26th ISSAT International Conference on Reliability and Quality in Design · 2021
  • 10
    Cross‑National Comparison of Protest Publics' Roles as Drivers of Change
    D. G. Zaytsev, A. I. Galina, A. A. Sokol — Protest Publics: Toward a New Concept of Mass Civic Action · 2019

Conferences

  • NAACL 2025 — Albuquerque, NM, USA
  • SIAM SDM 2025 (SDM25) — Doctoral Forum
  • AAAI 2025 — Philadelphia, PA, USA
  • ADSA 2024 — University of Michigan, Ann Arbor, MI
  • CODS–COMAD 2024 — Jodhpur, India
  • Sunbelt 2023 — Portland, OR, USA
  • APSA 2022 — Montreal, Canada
  • EUSN 2021 — Naples, Italy
  • XXII April Conference 2021 — Moscow, Russia
  • IPSA World Congress 2021 — Lisbon, Portugal
  • ECPR 2021 — Virtual
  • Networks in the Global World 2020 — St. Petersburg, Russia
  • Sunbelt 2020 — Paris, France
  • XX April Conference 2019 — Moscow, Russia
  • ECPR General Conference 2018 — Hamburg, Germany
  • Sunbelt 2018 — Utrecht, Netherlands
  • XIX April Conference 2018 — Moscow, Russia

Academic Appointments

  • Graduate Research & Teaching Assistant — DIAL Laboratory, University of Notre Dame · 2022–present
  • Program Deputy Academic Supervisor, MDNA — International Laboratory for Applied Network Research, HSE · 2021–2022
  • Research Assistant — International Laboratory for Applied Network Research, HSE · 2017–2020

Academic Internship

  • Open Research Laboratory — University of Illinois Urbana–Champaign · Spring 2022

Teaching & Academic Service

  • Course Instructor: Introduction to Analytical Programming — HSE (2022)
  • Course Instructor: Introduction to Data Analytics — HSE (2022)
  • Course Instructor: Practical Regression Analysis — Coursera · HSE (2021)
  • Course Instructor: Introduction to Applied Analytics — HSE (2021)
  • Teaching Assistant: Data Science; AI for Social Good — Notre Dame
  • Associate Director: DeMAS‑Center Statistical Consulting
  • Coursera TA: Network Analysis; Contemporary Data Analysis; Advanced Probability & Stochastic Processes

Fellowships & Honors

  • 2025: Tech Ethics Fellowship, University of Notre Dame
  • 2024–2025: Graduate Justice Fellowship, University of Notre Dame
  • 2023–2025: Lucy Scholar, University of Notre Dame
  • 2022: Open Research Laboratory Fellow, UIUC

Memberships

  • International Network for Social Network Analysis (INSNA), 2020–2022
  • International Political Science Association (IPSA), 2020–2022

Industry Experience

  • Business Analyst — Saint‑Gobain Russia · 2019–2020
  • Market Analyst — Nestlé Russia · 2018
  • Junior Communication Planner — Mediacom · 2017–2018
  • Intern — Federal State Statistics Service · 2015–2016
  • Trainee — Independent Regional Research Agency (NARI) · 2014

Selected Industry Projects

  • Project management dashboards for Saint‑Gobain (Russia & CIS) — 2020
  • Market outlooks for coffee machines (Retail & Fuel) — Nestlé — 2019
  • Advertising research: shaving & personal care — P&G via Mediacom — 2018
  • Category research: washing liquids & dishwasher capsules — P&G via Mediacom — 2017

Connect & Collaborate