feat: use __doc__ as dataset description by dhairya-pandya · Pull Request #3021 · pgmpy/pgmpy

dhairya-pandya · 2026-03-17T17:13:07Z

The following checklist is mandatory.

Your PR will be closed if you remove the checklist or do not answer the questions to a satisfactory level. Use of LLMs is strictly forbidden for any part of this checklist (including for improving language), and will result in a ban if we find any use of LLMs.

Your checklist for this pull request

Have you followed all the steps from our Contributing Guide?
Does the PR fully address the linked issue and is within its defined scope? If you are still working on the PR, mark it as draft.
Are all the GitHub Actions checks passing? If not, mark your PR as draft while you fix it.

Please answer the following questions:

Did you use an LLM for any assistance with this PR? Please describe in detail (around a paragraph) how and what you used it for?
[Please Answer Here]
What steps have you taken to verify that the changes correctly address the issue? And what edge cases have you considered? Other than running tests, what else have you verified?
[Please Answer Here]
Has the LLM added try-except blocks? They will need to be removed; any error handling must be explicit.
[Please Answer Here]
Have you used LLM for generating tests? They need to be compressed into a smaller number of tests without reducing coverage.
[Please Answer Here]

Issue number(s) that this pull request fixes

Fixes [ENH] Figure out a better way to allow uses to access references for example datasets and example models #2596

List of changes to the codebase in this pull request

Instead of trying to "parse" references, we now take the class's raw docstring (`__doc__`) and assign it directly to a new `description` field on the `Dataset` object, as discussed in PR "[ENH]: Add get_reference() for programmatic access to dataset/model citations" #2684. The assignment happens inside `load_dataset()`:
```
  return Dataset(
      name=name,
      data=target_cls.load_dataframe(),
      expert_knowledge=target_cls.load_expert_knowledge(),
      ground_truth=target_cls.load_ground_truth(),
      description=target_cls.__doc__,
      tags=target_cls.get_class_tags(),
  )
```
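To illustrate what assigning `__doc__` directly implies, here is a minimal sketch. It assumes a simplified stand-in for `Dataset`; the names `DatasetSketch`, `ToyDataset`, `UndocumentedDataset`, and `load_dataset_sketch` are hypothetical and are not part of pgmpy. It shows two behaviours worth keeping in mind: a class without a docstring yields `None`, and the raw docstring may carry source indentation on older Python versions, which `inspect.cleandoc` can normalize for display.

```python
from __future__ import annotations

import inspect
from dataclasses import dataclass


@dataclass
class DatasetSketch:
    """Simplified, hypothetical stand-in for the Dataset container."""

    name: str
    description: str | None = None


class ToyDataset:
    """A toy dataset.

    References
    ----------
    Some citation would go here.
    """


class UndocumentedDataset:
    pass


def load_dataset_sketch(target_cls: type) -> DatasetSketch:
    # A class without its own docstring has __doc__ set to None,
    # so description simply stays None in that case.
    return DatasetSketch(name=target_cls.__name__, description=target_cls.__doc__)


raw = load_dataset_sketch(ToyDataset).description
cleaned = inspect.cleandoc(raw)  # strips common indentation and trailing blank lines

print(cleaned)
print(load_dataset_sketch(UndocumentedDataset).description)
```

Whether to store the raw string or a cleaned one is a design choice; the PR as described stores the raw `__doc__`.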

Copilot

Pull request overview

This PR adds a human-readable dataset description sourced directly from the dataset class docstring (__doc__) and surfaces it via the Dataset object.

Changes:

Add description field to Dataset and include it in Dataset.__str__.
Populate Dataset.description in load_dataset() from target_cls.__doc__ and update docs example.
Refactor load_model() lookup via a helper and align the “available models” error message + tests.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pgmpy/datasets/_base.py | Introduces `Dataset.description`, prints it, and assigns it from `__doc__` in `load_dataset()`. |
| pgmpy/example_models/_base.py | Adds `_find_model_class()` helper and updates `load_model()` error handling/message. |
| pgmpy/tests/test_datasets/test_datasets.py | Minor formatting-only change (blank line). |
| pgmpy/tests/test_example_models/test_example_models.py | Updates expected error message string for `load_model()`. |
| pgmpy/utils/utils.py | Formatting-only change (blank line). |
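The body of the `_find_model_class()` helper is not shown in this excerpt. As a rough sketch of the pattern the summary describes (a lookup helper that raises an explicit error listing the available models), something like the following could work. The registry, the class names `ToyA` and `ToyB`, the function name `find_model_class`, and the error wording are all assumptions for illustration, not pgmpy's actual code.

```python
# Hypothetical placeholder model classes.
class ToyA:
    pass


class ToyB:
    pass


# Hypothetical registry mapping model names to classes.
_MODEL_REGISTRY: dict[str, type] = {"toy_a": ToyA, "toy_b": ToyB}


def find_model_class(name: str) -> type:
    """Return the class registered under `name`, or raise ValueError."""
    cls = _MODEL_REGISTRY.get(name)
    if cls is None:
        # Explicit error handling: list the valid names in the message.
        available = ", ".join(sorted(_MODEL_REGISTRY))
        raise ValueError(f"Model '{name}' not found. Available models: {available}")
    return cls
```

Keeping the lookup in one helper means the "available models" message is built in a single place, which makes it easier to keep the tests' expected string in sync.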

pgmpy/tests/test_datasets/test_datasets.py

 def test_invalid_tag():
    with pytest.raises(ValueError, match="Unrecognized filter argument"):
        list_datasets(is_paraterized=True)  # typo

    with pytest.raises(ValueError, match="Unrecognized filter argument"):
        list_datasets(num_samples=100)  # wrong key name entirely


pgmpy/datasets/_base.py

    data: pd.DataFrame
    expert_knowledge: Optional[ExpertKnowledge] = None
    ground_truth: Optional[DAG] = None
+    description: str | None = None


pgmpy/datasets/_base.py

    def __str__(self) -> str:
        return (
            f"Dataset(name={self.name}, \n data=DataFrame of size: {self.data.shape}, \n "
-            f"expert_knowledge={self.expert_knowledge}, \n ground_truth={self.ground_truth}, \n tags={self.tags})"
+            f"expert_knowledge={self.expert_knowledge}, \n ground_truth={self.ground_truth}, \n "
+            f"description={self.description}, \n tags={self.tags})"
        )


pgmpy/example_models/_base.py

ankurankan · 2026-03-17T19:52:06Z

Marking this draft till it is ready for review.

codecov · 2026-03-17T22:09:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.64%. Comparing base (43e76b0) to head (2d652fe).
⚠️ Report is 2 commits behind head on dev.

Additional details and impacted files

@@           Coverage Diff           @@
##              dev    #3021   +/-   ##
=======================================
  Coverage   95.64%   95.64%           
=======================================
  Files         504      504           
  Lines       29111    29117    +6     
=======================================
+ Hits        27844    27850    +6     
  Misses       1267     1267

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| pgmpy/datasets/_base.py | `94.28% <100.00%> (+0.03%)` | ⬆️ |
| pgmpy/example_models/_base.py | `95.58% <100.00%> (+0.35%)` | ⬆️ |
| ...y/tests/test_example_models/test_example_models.py | `100.00% <100.00%> (ø)` | |

... and 1 file with indirect coverage changes

Copilot AI review requested due to automatic review settings March 17, 2026 17:13

Copilot AI reviewed Mar 17, 2026

Copilot started reviewing on behalf of dhairya-pandya March 17, 2026 17:20

feat: use __doc__ as dataset description and remove custom parsing logic

2d652fe

dhairya-pandya force-pushed the reference-dataset-description branch from b2ecbbc to 2d652fe March 17, 2026 19:04

ankurankan marked this pull request as draft March 17, 2026 19:52

Merge branch 'dev' into reference-dataset-description

adad49e

dhairya-pandya wants to merge 2 commits into pgmpy:dev from dhairya-pandya:reference-dataset-description

3 participants