Source code for the paper "Potemkin Understanding in Large Language Models" by Marina Mancoridis, Keyon Vafa, Bec Weeks, and Sendhil Mullainathan.
If you find this repository useful for your research, please consider citing our work:
    @article{mancoridis2025potemkin,
      title={Potemkin Understanding in Large Language Models},
      author={Mancoridis, Marina and Weeks, Bec and Vafa, Keyon and Mullainathan, Sendhil},
      journal={arXiv preprint arXiv:2506.21521},
      year={2025}
    }
This guide is structured into the following main components:
- Installation
- Quickstart
- Benchmark Dataset
- Automatic Evaluation
- Incoherence
Below, you'll find detailed instructions to effectively use each component.
## Installation

Before you begin, make sure you have Conda (version ≥4.6) installed on your system.

- Clone the repository:

  `git clone https://github.com/MarinaMancoridis/PotemkinBenchmark.git && cd PotemkinBenchmark`
- Create the Conda environment:

  We provide an `environment.yml` file listing all required packages (Python 3.9+). To create the environment, run:

  `conda env create --file environment.yml`
- Activate the environment:

  `conda activate potemkin`
## Quickstart

Get up and running in a few simple steps:

- Run a sample command: this command reproduces Table 1, showing the potemkin rate by task.

  `cd BenchmarkDataset && python -c "from potemkin_rates import print_potemkin_rate_by_task; print_potemkin_rate_by_task()"`
- View labeled instances: the file `BenchmarkDataset/labeled_instances.csv` contains labeled instances and non-instances of each concept, across all three domains. These labeled instances are used in subtasks of the benchmark dataset analysis (see the loading sketch below).
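If you want to inspect these labels programmatically, here is a minimal sketch using pandas. The grouping columns in the commented-out line (`domain`, `is_instance`) are assumptions for illustration only; check the CSV header for the actual schema.

```python
# Minimal sketch: load and inspect the labeled instances with pandas.
# Run from the repository root. The grouping example is commented out because
# the exact column names should be checked first.
import pandas as pd

df = pd.read_csv("BenchmarkDataset/labeled_instances.csv")
print(df.shape)               # rows (labeled examples) x columns
print(df.columns.tolist())    # the actual column names in the file

# Example of a follow-up query once the schema is known (hypothetical columns):
# print(df.groupby(["domain", "is_instance"]).size())
```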
## Benchmark Dataset

The `BenchmarkDataset` directory is organized by the four main tasks in our framework, each contained within its own subdirectory:
- Define
- Classify
- Generate
- Edit
To reproduce Table 1 (potemkin rates by task), run the following command from the repository's root directory:
`cd BenchmarkDataset && python -c "from potemkin_rates import print_potemkin_rate_by_task; print_potemkin_rate_by_task()"`

- We provide an iterator to access labeled model responses for each task in a standardized format (see the sketch after this list).
- A sample API for using the iterators is provided in `BenchmarkDataset/main.py`.
- The source code for the iterators themselves can be accessed in `BenchmarkDataset/iterators.py`.
- Helper functions to compute potemkin rates can be found in `potemkin_rates.py`.
- Additional configuration details, such as the lists of models and concepts, are defined in `constants.py`.
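The real iterator interface lives in `BenchmarkDataset/iterators.py` and is demonstrated in `main.py`; the snippet below is only a hypothetical sketch of how such an iterator might be consumed, with an invented function name (`response_iterator`) and invented record fields used purely for illustration.

```python
# Hypothetical sketch: tally accuracy per task from labeled model responses.
# `response_iterator`, its signature, and the record fields ("task", "correct")
# are invented for illustration; see BenchmarkDataset/main.py for the real API.
from collections import defaultdict
from iterators import response_iterator  # assumed import; run from BenchmarkDataset/

correct = defaultdict(int)
total = defaultdict(int)

for record in response_iterator():        # hypothetical call
    task = record["task"]                 # e.g. "classify", "generate", or "edit"
    total[task] += 1
    correct[task] += int(record["correct"])

for task, n in total.items():
    print(f"{task}: {correct[task] / n:.2%} correct over {n} responses")
```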
## Automatic Evaluation

The automatic evaluation lives in the `AutomaticEval` directory. Before running it, set up the API keys referenced in `private/models.py`, for example by exporting them as environment variables (`export OPENAI_API_KEY=...` and so on).
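A quick way to confirm the keys are visible to Python before starting a long run is sketched below; the list of variable names is an assumption (only `OPENAI_API_KEY` is mentioned above), so adjust it to whatever `private/models.py` actually expects.

```python
# Sanity check (assumed variable names): verify API keys are set before running.
import os

for var in ["OPENAI_API_KEY"]:  # add any other provider keys your setup requires
    status = "set" if os.environ.get(var) else "MISSING"
    print(f"{var}: {status}")
```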
To run the automatic evaluation, go to the `AutomaticEval` directory and run:

`python main.py`
The results will be saved in the `AutomaticEval/results` directory.
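To see what a run produced without assuming anything about the output format, you can simply list that directory; the snippet below relies only on the path stated above.

```python
# List whatever files the automatic evaluation wrote, with their sizes.
from pathlib import Path

results_dir = Path("AutomaticEval/results")
for path in sorted(results_dir.rglob("*")):
    if path.is_file():
        print(f"{path}  ({path.stat().st_size} bytes)")
```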
## Incoherence

All relevant results for our incoherence analysis are provided in the `Incoherence` directory.
To reproduce the first column of Table 2 (incoherence rates by model), run the following command from the repository's root directory:
`cd Incoherence && python -c "from incoherence_rates import print_incoherence_by_model; print_incoherence_by_model()"`
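The metric itself is implemented in `Incoherence/incoherence_rates.py`; the snippet below is only a conceptual sketch, on an invented data layout, of tallying how often a model's later judgment of its own output disagrees with what it was asked to produce. It illustrates one simple way to quantify internal inconsistency and is not necessarily the exact definition used in the paper.

```python
# Conceptual sketch on an invented data layout; the real computation is in
# Incoherence/incoherence_rates.py.
# Each pair records (asked to produce an instance?, model later judged its
# own output to be an instance?).
records = [
    (True, True),    # consistent
    (True, False),   # inconsistent: the model rejects its own generation
    (False, False),  # consistent
]

inconsistent = sum(1 for asked, judged in records if asked != judged)
print(f"Illustrative incoherence rate: {inconsistent / len(records):.2%}")
```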