
Regarding Multi-vector / ColBERT Scoring and Normalization Logic #5502

Open
@zhuohaoyu

Description

Hi, I am currently having an issue with thresholding when using ColBERT as a reranker. According to the scoring logic here (https://github.com/qdrant/qdrant/blob/master/lib/segment/src/vector_storage/query_scorer/mod.rs#L34-L54), I would expect scores to always fall in the range [0, min(len(query_vectors), len(doc_vectors))].
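For reference, here is a minimal pure-Python sketch of MaxSim (late-interaction) scoring as it is commonly defined for ColBERT: for each query vector, take the highest similarity over all document vectors, then sum those maxima over the query vectors. It assumes unit-normalized vectors and dot-product (cosine) similarity; the helper names `dot`, `normalize`, and `max_sim` are hypothetical and are not Qdrant's API. Under these assumptions each per-query term lies in [-1, 1], so the total is bounded by the number of query vectors.

```python
import math
import random

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale v to unit length so dot products equal cosine similarities."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def max_sim(query_vectors: list[Vector], doc_vectors: list[Vector]) -> float:
    """MaxSim: for each query vector, take its best match among the
    document vectors, then sum those maxima over all query vectors."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def random_unit(dim: int) -> Vector:
    return normalize([random.gauss(0.0, 1.0) for _ in range(dim)])


random.seed(0)
query = [random_unit(8) for _ in range(4)]   # 4 hypothetical query vectors
doc = [random_unit(8) for _ in range(10)]    # 10 hypothetical doc vectors

score = max_sim(query, doc)
# Each per-query maximum is a cosine in [-1, 1], so the sum is bounded
# by the number of query vectors.
assert -len(query) <= score <= len(query)

# Several query vectors may share the same best document vector,
# so even a one-vector document can reach len(query).
same = normalize([1.0] * 8)
assert math.isclose(max_sim([same] * 4, [same]), 4.0)
```

Because several query vectors can pick the same best document vector, the upper bound in this sketch depends on the query length rather than on the document length.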

However, in practice I always get a score of at least 15, no matter how short my query string is. Is this due to padding (or something similar) in the implementation? I tried the semantic search site but could not find the related code.
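If the encoder always produces a fixed number of query vectors, every extra vector adds its own maximum to the sum, which would keep raw scores high even for very short queries. (ColBERT's query augmentation pads queries with [MASK] tokens up to a fixed length; whether the pipeline used here does the same is an assumption.) The sketch below uses hypothetical toy vectors and helper names (`normalized_max_sim` is not a Qdrant or fastembed function) to show the effect, together with one common thresholding workaround: dividing the raw MaxSim score by the number of query vectors.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vectors: list[Vector], doc_vectors: list[Vector]) -> float:
    """Sum over query vectors of the best dot product against any doc vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def normalized_max_sim(query_vectors: list[Vector], doc_vectors: list[Vector]) -> float:
    """Mean MaxSim per query vector: lies in [-1, 1] for unit-norm vectors,
    so one threshold can be reused across queries of different lengths."""
    if not query_vectors:
        raise ValueError("query_vectors must not be empty")
    return max_sim(query_vectors, doc_vectors) / len(query_vectors)


# Toy 2-D unit vectors (hypothetical values, for illustration only).
doc = [[1.0, 0.0], [0.0, 1.0]]
real_query = [[1.0, 0.0]]      # one "real" token that matches the doc exactly
padding = [[0.6, 0.8]] * 31    # 31 hypothetical filler vectors, 32 in total
padded_query = real_query + padding

raw = max_sim(padded_query, doc)                    # 1.0 + 31 * 0.8 = 25.8
normalized = normalized_max_sim(padded_query, doc)  # 25.8 / 32 = 0.80625
print(f"raw={raw:.2f} normalized={normalized:.4f}")
```

Dividing by the query vector count only removes the dependence on query length; whether the resulting scores are calibrated well enough for a fixed threshold depends on the model, so this is a starting point rather than a definitive fix.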

The same issue has been reported in qdrant/fastembed#383 and stanford-futuredata/ColBERT#374.
