Understanding Large Language Model Behaviors through
Interactive Counterfactual Generation and Analysis

IEEE VIS 2025 · IEEE Transactions on Visualization and Computer Graphics
Furui Cheng*
Vilém Zouhar*
Robin Shing Moon Chan*
Daniel Fürst
Hendrik Strobelt
Mennatallah El-Assady*

*ETH Zürich
University of Konstanz
IBM Research
LLM Analyzer enables users to analyze and understand LLM behaviors through meaningful counterfactuals

Abstract

Understanding the behavior of large language models (LLMs) is crucial for ensuring their safe and reliable use. However, existing explainable AI (XAI) methods for LLMs primarily rely on word-level explanations, which are often computationally inefficient and misaligned with human reasoning processes. Moreover, these methods often treat explanation as a one-time output, overlooking its inherently interactive and iterative nature. In this paper, we present LLM Analyzer, an interactive visualization system that addresses these limitations by enabling intuitive and efficient exploration of LLM behaviors through counterfactual analysis. Our system features a novel algorithm that generates fluent and semantically meaningful counterfactuals via targeted removal and replacement operations at user-defined levels of granularity. These counterfactuals are used to compute feature attribution scores, which are then integrated with concrete examples in a table-based visualization, supporting dynamic analysis of model behavior. A user study with LLM practitioners and interviews with experts demonstrate the system's usability and effectiveness, emphasizing the importance of involving humans in the explanation process as active participants rather than passive recipients.
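To make the idea concrete, the sketch below illustrates counterfactual-based attribution over user-defined segments: each segment is scored by how much the model's output shifts when that segment is removed or replaced. This is a minimal illustration under assumptions, not the paper's algorithm; the toy query_model and the REPLACEMENTS table are hypothetical stand-ins for an actual LLM call and for the system's fluency-preserving counterfactual generator.

# Minimal sketch (assumptions, not the paper's implementation): attribute a
# model's output to user-defined segments by measuring how the output shifts
# under removal and replacement counterfactuals.
from statistics import mean

def query_model(text: str) -> float:
    # Toy stand-in for an LLM: score = fraction of "positive" cue words.
    cues = {"great", "excellent", "love"}
    words = text.lower().split()
    return sum(w in cues for w in words) / max(len(words), 1)

# Hypothetical replacement candidates per segment index; in the real system
# these would be generated so the counterfactual stays fluent and meaningful.
REPLACEMENTS = {0: ["The service"], 1: ["was acceptable", "was terrible"]}

def attribute(segments: list[str]) -> list[float]:
    """Score each segment by the average absolute change in model output
    across its removal and replacement counterfactuals."""
    base = query_model(" ".join(segments))
    scores = []
    for i, _ in enumerate(segments):
        rest = segments[:i] + segments[i + 1:]
        deltas = [abs(base - query_model(" ".join(rest)))]   # removal
        for alt in REPLACEMENTS.get(i, []):                  # replacements
            edited = segments[:i] + [alt] + segments[i + 1:]
            deltas.append(abs(base - query_model(" ".join(edited))))
        scores.append(mean(deltas))
    return scores

if __name__ == "__main__":
    # Segments at a user-chosen (phrase-level) granularity; the segment
    # driving the positive score receives the largest attribution.
    print(attribute(["The food", "was excellent", "but pricey"]))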

BibTeX
@article{cheng2025llmanalyzer,
    title     = {{Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis}},
    author    = { Cheng, Furui              and
                  Zouhar, Vilém             and
                  Chan, Robin Shing Moon    and
                  Fürst, Daniel             and
                  Strobelt, Hendrik         and
                  El-Assady, Mennatallah },
    year      = {2025},
    journal   = {arXiv preprint arXiv:2405.00708},
    doi       = {10.48550/arXiv.2405.00708},
    eprint    = {2405.00708},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Keywords

Counterfactual · Explainable Artificial Intelligence · Large Language Model · Visualization · Human-Computer Interaction · Natural Language Processing