[Bug]: Calculation of ideal DCG in NDCG is incorrect #17070

Open
alexeyrodriguez opened this issue Nov 26, 2024 · 1 comment
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@alexeyrodriguez

Bug Description

The calculation of ideal DCG in the NDCG metric (metrics.py) is incorrect:

        idcg = sum(
            discounted_gain(rel=True, i=i, mode=mode)
            for i in range(1, len(retrieved_ids) + 1)
        )

The ideal DCG should be computed from expected_ids, not retrieved_ids, as per the definition of NDCG on Wikipedia.
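
For reference, a sketch of the definition this report relies on, assuming binary relevance and a log2 discount. The symbols $E$ (for expected_ids) and $R$ (for retrieved_ids) are introduced here for illustration only and do not appear in metrics.py:

$$
\mathrm{DCG} = \sum_{i=1}^{|R|} \frac{\mathrm{rel}_i}{\log_2(i+1)}, \qquad
\mathrm{IDCG} = \sum_{i=1}^{|E|} \frac{1}{\log_2(i+1)}, \qquad
\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}
$$

Here $\mathrm{rel}_i$ is 1 if the $i$-th retrieved id is in $E$ and 0 otherwise. The snippet quoted above sums the ideal term up to $|R|$ rather than $|E|$.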

Version

0.12.1

Steps to Reproduce

Not relevant

Relevant Logs/Tracebacks

No response


dosubot bot commented Nov 26, 2024

Hey @alexeyrodriguez! 👋 I'm here to help with bugs, questions, and contributions. Let's squash those bugs together!

The issue you've identified is indeed a valid concern. The Ideal Discounted Cumulative Gain (IDCG) should be calculated based on the expected_ids, which represent the ideal order of relevant documents, rather than the retrieved_ids. This ensures that the IDCG reflects the best possible ordering of relevant documents.

To correct this, you should modify the calculation of idcg to consider the expected_ids. Here's a suggested change:

idcg = sum(
    discounted_gain(rel=True, i=i, mode=mode)
    for i in range(1, len(expected_ids) + 1)
)

This adjustment will ensure that the IDCG is calculated based on the ideal ranking of the expected relevant documents, aligning with the standard definition of NDCG [1].
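
To make the effect concrete, here is a minimal, self-contained sketch comparing the two ways of computing IDCG. It is not the library's code: `discounted_gain` below is a hypothetical stand-in with a simplified signature (linear gain, log2 discount), and the `ideal_from_expected` flag exists only for this comparison.

```python
import math


def discounted_gain(rel: float, i: int) -> float:
    """Hypothetical stand-in: linear gain discounted by log2(i + 1)."""
    return rel / math.log2(i + 1)


def ndcg(expected_ids, retrieved_ids, ideal_from_expected=True):
    """Binary-relevance NDCG; the flag selects how IDCG is computed."""
    expected = set(expected_ids)
    # DCG: gain of each retrieved id, discounted by its 1-based rank.
    dcg = sum(
        discounted_gain(rel=float(doc_id in expected), i=i)
        for i, doc_id in enumerate(retrieved_ids, start=1)
    )
    # IDCG: every position counted as relevant, up to the chosen length.
    ideal_len = len(expected_ids) if ideal_from_expected else len(retrieved_ids)
    idcg = sum(discounted_gain(rel=1.0, i=i) for i in range(1, ideal_len + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Both relevant documents are ranked first, so the ordering is already ideal.
expected_ids = ["a", "b"]
retrieved_ids = ["a", "b", "c", "d"]

fixed = ndcg(expected_ids, retrieved_ids, ideal_from_expected=True)
current = ndcg(expected_ids, retrieved_ids, ideal_from_expected=False)
print(f"IDCG from expected_ids:  {fixed:.4f}")
print(f"IDCG from retrieved_ids: {current:.4f}")
```

In this example the retrieved list already has the best possible ordering, so a correctly normalized score should be 1. Basing IDCG on retrieved_ids inflates the denominator with two extra "relevant" positions that cannot exist, which pushes the score below 1.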

