Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat(cody): detect symbol names in query#62976

Merged
jtibshirani merged 6 commits intomainfrom
jtibs/symbols
May 30, 2024
Merged

feat(cody): detect symbol names in query#62976
jtibshirani merged 6 commits intomainfrom
jtibs/symbols

Conversation

@jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented May 29, 2024

This PR updates Cody context search to pull symbol names out of the query and
perform an additional symbol search. The symbol search results are deduplicated
with the code search results, so we always prefer the symbol definition over
another file match.

Benefits:

  • When a query references a symbol, we always include its definition in context
    (unless it is quite ambiguous). This is a basic thing users would expect, and
    improves our "Find symbol" evals.
  • We're more likely to return high quality chunks, since symbol results are
    always anchored on the line with the definition.

Addresses SPLF-88

Test plan

Adapted unit tests. Tested locally.

Changelog

We've made various fixes and improvements to Enterprise Chat context (backed by Zoekt):

  • Explicit matching on file names and symbols within chat questions
  • Better handle long and foreign-language questions through a query rewrite
  • Better handle common questions like "what does this repo do?"

@cla-bot cla-bot bot added the cla-signed label May 29, 2024
@github-actions github-actions bot added team/product-platform team/search-platform Issues owned by the search platform team labels May 29, 2024
@jtibshirani
Copy link
Contributor Author

jtibshirani commented May 29, 2024

golden queries evals

Before

Breakdown by class:
Find symbol	7/10
Find string	2/2
Explain file	2/2
Explain concept	4/5
Check dependency	1/2
Find logic	31/43
Gather information	12/16
Changelog	0/2
Ownership	2/2
How-to	1/1
Foreign language	0/2
Long request	0/2

Combined recall	62/89

After

Breakdown by class:
Find symbol	9/10
Find string	2/2
Explain file	2/2
Explain concept	4/5
Check dependency	1/2
Find logic	31/43
Gather information	12/16
Changelog	0/2
Ownership	2/2
How-to	1/1
Foreign language	0/2
Long request	0/2

Combined recall	64/89

CodeSearchNet evals: results unchanged

@jtibshirani jtibshirani requested a review from a team May 29, 2024 22:46
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice.

One general question based on me not fully understanding stuff. Do we rely at all at NumContextLines to ensure we give enough context in the chunks? If so, do our symbol results respect that?

@jtibshirani
Copy link
Contributor Author

Do we rely at all at NumContextLines to ensure we give enough context in the chunks?

We do not rely on NumContextLines. The getCodyContext API makes its own decision about how many lines of context to load, based on its estimation of token limits. Then it consults gitserver to actually load that content. This feels okay for now, and could be optimized in the future.

@jtibshirani jtibshirani merged commit 91518a1 into main May 30, 2024
@jtibshirani jtibshirani deleted the jtibs/symbols branch May 30, 2024 18:52
@jtibshirani jtibshirani changed the title Context: detect symbol names in query feat(context): detect symbol names in query Jul 10, 2024
@jtibshirani jtibshirani changed the title feat(context): detect symbol names in query feat(cody): detect symbol names in query Jul 10, 2024
jtibshirani added a commit that referenced this pull request Jul 11, 2024
This PR fixes an important bug in #62976, where we didn't properly map the
symbol line match to the return type. Instead, we accidentally treated symbol
matches like file matches and returned the start of the file.

## Test plan

Add new unit test for symbol match conversion. Extensive manual testing.
sourcegraph-release-bot pushed a commit that referenced this pull request Jul 11, 2024
This PR fixes an important bug in #62976, where we didn't properly map the
symbol line match to the return type. Instead, we accidentally treated symbol
matches like file matches and returned the start of the file.

## Test plan

Add new unit test for symbol match conversion. Extensive manual testing.

(cherry picked from commit 004eb0f)
jdpleiness pushed a commit that referenced this pull request Jul 11, 2024
This PR fixes an important bug in #62976, where we didn't properly
map the
symbol line match to the return type. Instead, we accidentally treated
symbol
matches like file matches and returned the start of the file.

## Test plan

Add new unit test for symbol match conversion. Extensive manual testing.
<br> Backport 004eb0f from #63773

Co-authored-by: Julie Tibshirani <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/search-platform Issues owned by the search platform team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants