[FEATURE] TokenStream.onRetrieved, please return the score #1733

Open
yushengliao opened this issue Sep 9, 2024 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers P2 High priority P3 Medium priority

Comments

@yushengliao

Knowing the score is important: the List returned by contentAggregator, embeddingRetriever, and customRetriever currently carries no score. The score is crucial for display to users during both development and production.
With per-retriever scores available, we could also compare them to determine which retriever gives the best results.

@yushengliao yushengliao added the enhancement New feature or request label Sep 9, 2024
@langchain4j
Owner

@yushengliao we could add a score to the Content class, but it would only work for Contents returned by EmbeddingStoreContentRetriever. Other retrievers (including custom ones) won't have it.

@langchain4j langchain4j added P2 High priority P3 Medium priority labels Sep 9, 2024
@langchain4j langchain4j changed the title [FEATURE] TokenStream.oReytrieved, please return the score [FEATURE] TokenStream.onRetrieved, please return the score Oct 8, 2024
@fmx0717

fmx0717 commented Oct 10, 2024

@langchain4j
Could we consider returning the embedding_id within the Content, so that, in conjunction with the returned score and the existing metadata and text, it would provide more flexibility for handling various business scenarios?

@langchain4j
Owner

@fmx0717 we probably could. You can also store the ID in the metadata of TextSegment though.

I guess we can add a Map<String, Object> metadata field to dev.langchain4j.rag.content.Content and store score and embedding id there. WDYT?
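A minimal sketch of what this proposal might look like, not the library's actual implementation. `ContentSketch` is a hypothetical stand-in for dev.langchain4j.rag.content.Content, the `String text` field stands in for the TextSegment, and the `"score"` and `"embeddingId"` keys are assumed names chosen only for illustration:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for dev.langchain4j.rag.content.Content; NOT the library class.
final class ContentSketch {
    private final String text;                  // stand-in for the TextSegment
    private final Map<String, Object> metadata; // the field proposed in this thread

    ContentSketch(String text, Map<String, Object> metadata) {
        this.text = text;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(metadata));
    }

    String text() {
        return text;
    }

    Map<String, Object> metadata() {
        return metadata;
    }

    // Returns the score if a retriever supplied one under the assumed "score" key, else null.
    Double score() {
        Object value = metadata.get("score");
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return null;
    }
}
```

A retriever that knows the score would populate the map when it builds the content. Retrievers without a score would pass an empty map, and `score()` would return null, which matches the caveat that only some retrievers can provide it.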

@fmx0717

fmx0717 commented Oct 11, 2024

@langchain4j
Of course, it's a great idea.

@langchain4j
Owner

@fmx0717 would you like to open a PR?

@fmx0717

fmx0717 commented Oct 11, 2024

@langchain4j
Thank you for your trust and the opportunity. I'm very interested in the project and will definitely consider contributing in the future when I have more time. However, I am currently quite busy with other commitments. Thanks again for understanding!

@langchain4j langchain4j added the good first issue Good for newcomers label Oct 11, 2024
@yushengliao
Author

@langchain4j

I would like to take on this PR. Could you please advise on how to implement it effectively?

1. Add EmbeddingScore and ReRankScore member variables to the Content class?
2. Or add key-value pairs for these scores in Content.TextSegment.Metadata?

@langchain4j
Owner

@yushengliao I would probably introduce a Map<String, Object> metadata field in the dev.langchain4j.rag.content.Content and store score(s) and other things (e.g. embedding id) there. WDYT?

We also need to make sure that ContentAggregator works properly after these changes and does not take this metadata map into account when comparing two Contents (e.g., for RRF)
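To illustrate the concern about comparison, here is a rough sketch under stated assumptions, not the actual ContentAggregator code. `FusableContent` and `RrfSketch` are hypothetical names. Equality is based on the text only, so the same content retrieved by two retrievers with different scores is still treated as one item during Reciprocal Rank Fusion:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for Content: equals/hashCode deliberately ignore metadata.
final class FusableContent {
    final String text;
    final Map<String, Object> metadata;

    FusableContent(String text, Map<String, Object> metadata) {
        this.text = text;
        this.metadata = metadata;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FusableContent)) return false;
        return Objects.equals(text, ((FusableContent) o).text); // metadata not compared
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(text); // consistent with equals
    }
}

final class RrfSketch {
    // Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank), rank starting at 1.
    static List<FusableContent> fuse(List<List<FusableContent>> rankings, int k) {
        Map<FusableContent, Double> scores = new LinkedHashMap<>();
        for (List<FusableContent> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<FusableContent> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first.
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }
}
```

If metadata took part in `equals`, two copies of the same segment with different scores would count as different contents, and their ranks would not be merged. Note that in this sketch the fused list keeps the metadata of whichever copy was seen first; how to combine per-retriever scores is left open here.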
