Repo Mind

Global codebase understanding for humans and AI agents.

What's it for?
Giving humans and coding agents a global, holistic understanding of GitHub repositories
Stage
Research prototype

Software engineering agents are increasingly capable of making local edits, but they still struggle to build a reliable mental model of a large repository. Repo Mind is our exploration of a better starting point: a repository understanding system that can answer both local questions like "where is this implemented?" and broader questions like "how is this codebase organized?".

Architecture

Repo Mind is organized around an indexing pipeline and a query pipeline.

At indexing time, it creates two complementary views of a repository. One is a semantic retrieval layer built from source code, code summaries, documentation, and issue and pull request text. The other is a GraphRAG-style structure that groups related repository artifacts into hierarchical clusters.

For source code, Repo Mind uses Tree-sitter to parse files and extract top-level declarations such as functions, classes, and type definitions. It also extracts semantic relationships between those declarations, namely call-graph and subtyping edges. Those declarations become the basic units of the system: they are summarized, embedded, and later used as graph nodes. Restricting the representation to top-level declarations keeps the index smaller and makes the summaries more meaningful than summaries of arbitrary (e.g. statement-level) fragments.
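To make the extraction step concrete, here is a minimal sketch. It uses Python's standard-library `ast` module as a stand-in for Tree-sitter (which Repo Mind actually uses, and which works across many languages), pulling out top-level function and class declarations along with the names they call; the function name and output shape are illustrative, not Repo Mind's actual API.

```python
import ast

def extract_top_level_declarations(source: str):
    """Return (name, kind, callees) for each top-level declaration.

    A stand-in for Repo Mind's Tree-sitter pass: the real pass is
    language-agnostic, while this sketch handles Python only.
    """
    tree = ast.parse(source)
    declarations = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # skip statements that are not top-level declarations
        # Names called inside the declaration: a crude call-graph edge list.
        callees = sorted({
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        })
        declarations.append((node.name, kind, callees))
    return declarations

sample = """
def helper():
    pass

def main():
    helper()
    print("done")

class Widget:
    pass
"""
print(extract_top_level_declarations(sample))
# → [('helper', 'function', []), ('main', 'function', ['helper', 'print']), ('Widget', 'class', [])]
```

Each tuple is one "basic unit" in the sense described above: a named declaration plus its outgoing call edges, ready to be summarized and embedded.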

After that structural extraction step, Repo Mind builds several kinds of searchable artifacts:

  • raw source-code chunks
  • LLM-generated summaries of top-level declarations
  • documentation chunks
  • issue and pull request chunks

These artifacts are embedded and stored in vector databases.
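A toy version of that retrieval layer can be sketched in a few lines. The bag-of-words "embedding" and in-memory index below are deliberate simplifications (Repo Mind would use a learned embedding model and real vector databases), but the flow is the same: embed each artifact, then rank by cosine similarity at query time.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a learned model stands here in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorIndex:
    """In-memory stand-in for one of Repo Mind's vector databases."""
    def __init__(self):
        self.entries = []  # (artifact_id, vector) pairs

    def add(self, artifact_id: str, text: str):
        self.entries.append((artifact_id, embed(text)))

    def search(self, query: str, k: int = 3):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [artifact_id for artifact_id, _ in ranked[:k]]

# One index per artifact kind (summaries, docs, issues) mirrors the list above.
index = VectorIndex()
index.add("summary:parse_file", "parses a source file and extracts declarations")
index.add("doc:install", "how to install the tool and its dependencies")
index.add("issue:42", "crash when parsing an empty file")
print(index.search("extracts declarations from source file", k=2))
```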

On top of those vector databases, Repo Mind builds a graph layer for global sensemaking. For code, nodes come from top-level declarations and edges come from semantic relationships. For documentation and issues and pull requests, nodes come from chunks, and edges are created from nearest-neighbor similarity. Repo Mind then applies the Leiden community detection algorithm to organize those nodes into multi-level clusters. Depending on the GraphRAG configuration, clusters get summarized either at indexing time or at query time.
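The graph layer can be illustrated with a small sketch. Edges come from thresholded similarity scores (standing in for nearest-neighbor search, or for the code's call and subtyping edges), and a connected-components pass stands in for Leiden, which additionally produces a multi-level cluster hierarchy rather than flat groups.

```python
def build_edges(similarity: dict, threshold: float = 0.5):
    """Create edges between artifact pairs whose similarity clears a threshold.

    `similarity` maps frozenset({a, b}) -> score; in the real system, scores
    come from nearest-neighbor search over embeddings, or from structural
    relationships for code.
    """
    return [tuple(pair) for pair, score in similarity.items() if score >= threshold]

def connected_components(nodes, edges):
    """Group nodes by connectivity via union-find: a flat stand-in for the
    Leiden community detection algorithm."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

nodes = ["parser.c", "lexer.c", "docs/usage.md", "docs/install.md"]
similarity = {
    frozenset({"parser.c", "lexer.c"}): 0.9,
    frozenset({"docs/usage.md", "docs/install.md"}): 0.7,
    frozenset({"parser.c", "docs/usage.md"}): 0.2,  # too weak: no edge
}
print(connected_components(nodes, build_edges(similarity)))
# → [['docs/install.md', 'docs/usage.md'], ['lexer.c', 'parser.c']]
```

The weak cross-cluster pair stays disconnected, so code artifacts and documentation artifacts land in separate clusters, which is roughly what the cluster summaries then describe.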

Overview of Repo Mind architecture.

At query time, Repo Mind first retrieves the most relevant local chunks using vector similarity. It then augments that local evidence with higher-level graph context, so the final answer can combine "here is the exact implementation" with "here is the broader subsystem this belongs to". In some configurations that graph context comes from precomputed cluster summaries; in others it is assembled lazily during the query itself.

We also explored a newer, GraphRAG Zero-style approach that leans even more heavily on query-time work: the system retrieves candidate chunks, uses graph structure and cluster membership to guide selection, and produces the final answer directly from those retrieved chunks rather than from stored cluster summaries. Repo Mind also supports query rewriting, which separates retrieval from answer formatting: a question is turned into a more retrieval-friendly query before the final response is generated.
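The query pipeline just described can be sketched end to end. Everything here is a hypothetical stand-in: `rewrite_query` fakes LLM query rewriting with simple string handling, and the `retrieve` / `cluster_of` / `cluster_summary` backends stand in for the vector index and graph layer; only the shape of the flow mirrors the description above.

```python
def rewrite_query(question: str) -> str:
    # Stand-in for LLM query rewriting: strip conversational framing so the
    # query is retrieval-friendly. A model does this in the real system.
    for prefix in ("can you tell me ", "please explain "):
        if question.lower().startswith(prefix):
            return question[len(prefix):].rstrip("?")
    return question.rstrip("?")

def answer(question, retrieve, cluster_of, cluster_summary):
    """Sketch of the query pipeline: local retrieval first, then graph
    context from the cluster each retrieved chunk belongs to."""
    query = rewrite_query(question)
    chunks = retrieve(query)                    # vector-similarity hits
    clusters = {cluster_of(c) for c in chunks}  # graph-layer lookup
    context = [cluster_summary(cl) for cl in sorted(clusters)]
    return {"local_evidence": chunks, "global_context": context}

# Toy backends standing in for the vector index and the graph layer.
retrieve = lambda q: ["parser.c:parse_file"] if "pars" in q else []
cluster_of = lambda chunk: "frontend"
cluster_summary = lambda cl: f"The '{cl}' cluster handles lexing and parsing."

print(answer("Can you tell me where files are parsed?", retrieve, cluster_of, cluster_summary))
```

The returned dictionary pairs the exact implementation ("local_evidence") with the subsystem it belongs to ("global_context"), which is precisely the combination the final answer is built from.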

This architecture is useful because the components solve different problems. Semantic retrieval is strong when the question is narrow and the target is likely to live in a specific file or declaration. The structural graph helps when exact symbols or relationships matter. Graph-based summaries are stronger when the question is architectural, cross-cutting, or spread across multiple modules. Repo Mind is designed to support all three modes of understanding in the same system, for both humans and AI agents.

What We Learned

The key part of the project was not just building the system, but understanding what kinds of context actually improve agent behavior. Evaluating Repo Mind with Copilot Coding Agent led to four main takeaways.

1. Better context improves consistency and reliability

Repo Mind improved overall performance on swebench-pro, moving the resolution rate from 44.97% to 46.09%. That point estimate understates the broader lesson, though. Earlier in the project, when the underlying coding models were weaker, the uplift from Repo Mind was meaningfully larger. As newer models were integrated into Copilot Coding Agent, the baseline itself improved substantially: the models became better at searching, navigating, and assembling repository context on their own. In that setting, it is notable that Repo Mind still delivered gains, especially in how consistently problems were solved. The consistency metric pass^2 improved by 4.7 percentage points, and pass^3 improved by 6.7 points.

That is a meaningful distinction. For coding agents, the quality bar is not just “solve occasionally,” but “solve reliably enough that the experience feels dependable.” Repo Mind’s strongest effect was making successful outcomes less fragile.
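To make the consistency metric concrete: pass^k asks whether k independent runs on the same problem all succeed, so a flaky agent that solves a problem only sometimes scores well on pass^1 but poorly on pass^2 and pass^3. A small sketch, using made-up outcome data (not the benchmark's actual results) and a simple "first k runs" estimator that may differ from the one used in the evaluation:

```python
def pass_k(runs_per_problem, k):
    """Fraction of problems where the first k recorded runs ALL succeed.

    `runs_per_problem` is a list of per-problem outcome lists,
    e.g. [[True, True, False], ...].
    """
    solved = sum(1 for runs in runs_per_problem if all(runs[:k]))
    return solved / len(runs_per_problem)

# Hypothetical outcomes for four problems, three runs each.
outcomes = [
    [True, True, True],     # solved reliably
    [True, False, True],    # flaky: counts for pass^1 but not pass^2
    [False, False, False],  # never solved
    [True, True, False],    # counts for pass^2 but not pass^3
]
print(pass_k(outcomes, 1))  # → 0.75
print(pass_k(outcomes, 2))  # → 0.5
print(pass_k(outcomes, 3))  # → 0.25
```

Note how the score drops as k grows unless successes are consistent; that is why gains on pass^2 and pass^3 indicate outcomes becoming less fragile rather than merely more frequent.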

2. Global codebase understanding becomes more valuable as tasks get harder

The evaluation stratified problems by the size of the ground-truth patch, using patch size as a proxy for problem complexity. The uplift from Repo Mind increased with problem difficulty: medium-sized solutions gained 1.7 percentage points, and large solutions gained 2.1 points.

This matches the underlying hypothesis behind the project. Small, local edits can often be solved with ordinary search and short-range reasoning. Harder tasks tend to span files, subsystems, and abstractions. That is exactly where a repository-level view becomes more important.

Measurement results showing that Repo Mind's uplift grows as consistency is weighted more heavily and as problems become more complex.

3. Structural tools help when they are actually used

Repo Mind also exposes LSP-like tools, including symbol search, file-level symbol listings, and type-hierarchy queries. On swebench-pro, resolution improved from 53.1% to 59.2% on the instances where the agent chose to call those tools, a gain of 6.1 percentage points.

This is an important learning for system design: deterministic structural navigation and semantic retrieval are complementary. When the agent can shift from broad exploration to precise structural lookup at the right moment, it searches more efficiently and lands on the right implementation points more often.
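As a rough illustration of what "precise structural lookup" means here, the sketch below models the three LSP-like tools as deterministic queries over the declaration table built at indexing time. The class and its data are hypothetical; the point is that these lookups return exact answers with no ranking or similarity involved, in contrast to the retrieval layer.

```python
class SymbolIndex:
    """Sketch of the LSP-like tools layer: deterministic lookups over
    declarations extracted at indexing time (names are illustrative)."""
    def __init__(self, declarations):
        # declarations: list of (name, kind, file, supertype_or_None)
        self.decls = declarations

    def search_symbol(self, name):
        """Exact symbol search."""
        return [d for d in self.decls if d[0] == name]

    def symbols_in_file(self, path):
        """File-level symbol listing."""
        return [d[0] for d in self.decls if d[2] == path]

    def subtypes_of(self, name):
        """Type-hierarchy query: direct subtypes of a type."""
        return [d[0] for d in self.decls if d[3] == name]

decls = [
    ("Shape", "class", "shapes.py", None),
    ("Circle", "class", "shapes.py", "Shape"),
    ("area", "function", "geometry.py", None),
]
idx = SymbolIndex(decls)
print(idx.search_symbol("Circle"))       # → [('Circle', 'class', 'shapes.py', 'Shape')]
print(idx.symbols_in_file("shapes.py"))  # → ['Shape', 'Circle']
print(idx.subtypes_of("Shape"))          # → ['Circle']
```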

4. Tool design matters as much as tool quality

One of the clearest lessons from the evaluation was that useful tools create the most value when they fit naturally into the agent’s workflow. LSP-style Repo Mind tools were adopted only about 8% of the time on swebench-pro and 18% on swebench-verified. When those same tools were wrapped in a subagent, performance did not improve and the extra orchestration introduced overhead instead.

That result points to a broader design constraint for AI tooling: a tool has to fit naturally into the agent’s planning habits. If the agent prefers grep, file viewing, and shell-based exploration, then even a strong structural tool may underperform unless it feels like a natural next step in that workflow. Repo Mind taught us that context systems have to be designed not only for capability, but also for adoption.

Why It Matters

Taken together, these results suggest that repository-wide context is most valuable when problems are broad, multi-file, and structurally complex, and that the form of that context matters just as much as the content. Repo Mind was a concrete exploration of that idea: better repository understanding, better structural navigation, and better global summaries can all help, but only when they are integrated in a way that the agent will reliably use.