Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] add the option to provide path to labels tsv #4767

Draft
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

mtorabi59
Copy link
Contributor

Changes proposed in this pull request:

  • this PR will add the option to provide a path to a tsv file as a lookup table for label names, according to BIDS dseg.tsv format, instead of a list. This will make sure the ids and names correspond to each other, and also the background will be better recognized.

Copy link
Contributor

👋 @mtorabi59 Thanks for creating a PR!

Until this PR is ready for review, you can include the [WIP] tag in its title, or leave it as a github draft.

Please make sure it is compliant with our contributing guidelines. In particular, be sure it checks the boxes listed below.

  • PR has an interpretable title.
  • PR links to Github issue with mention Closes #XXXX (see our documentation on PR structure)
  • Code is PEP8-compliant (see our documentation on coding style)
  • Changelog or what's new entry in doc/changes/latest.rst (see our documentation on PR structure)

For new features:

  • There is at least one unit test per new function / class (see our documentation on testing)
  • The new feature is demoed in at least one relevant example.

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

We will review it as quick as possible, feel free to ping us with questions if needed.

@mtorabi59
Copy link
Contributor Author

mtorabi59 commented Nov 19, 2024

You can use the following code to test the new option:

    import csv
    from nilearn._utils.data_gen import (
        generate_labeled_regions,
        generate_random_img,
    )
    from nilearn.maskers import NiftiLabelsMasker
    import numpy as np 
    
    labels_as_path = True 
    
    data = [
        {'region id': 0, 'region name': 'Background'},
        {'region id': 1, 'region name': 'Region A'},
        {'region id': 2, 'region name': 'Region B'},
        {'region id': 3, 'region name': 'Region C'},
        {'region id': 4, 'region name': 'Region D'},
        {'region id': 5, 'region name': 'Region E'},
        {'region id': 6, 'region name': 'Region F'},
        {'region id': 7, 'region name': 'Region G'},
        {'region id': 8, 'region name': 'Region H'},
        {'region id': 9, 'region name': 'Region I'}
    ]
    
    with open('./regions.tsv', 'w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=['region id', 'region name'], delimiter='\t')
        writer.writeheader()
        writer.writerows(data)
    
    if labels_as_path:
        labels = './regions.tsv'
    else:
        labels = [
            'Background', 'Region A', 'Region B', 'Region C', 'Region D',
            'Region E', 'Region F', 'Region G', 'Region H', 'Region I'
        ]
    
    n_regions = 9
    length = 93
    shape_3d_default = (7, 8, 9)
    affine_eye = np.eye(4)
    
    shape1 = (*shape_3d_default, length)
    
    fmri_img, mask11_img = generate_random_img(
        shape1,
        affine=affine_eye,
    )
    
    labels_img = generate_labeled_regions(
        shape1[:3],
        affine=affine_eye,
        n_regions=n_regions,
    )
    
    masker = NiftiLabelsMasker(labels_img, labels=labels, resampling_target=None)
    signals = masker.fit().transform(fmri_img)
    
    print(masker.labels)
    print(masker._region_id_name)

@Remi-Gau
Copy link
Collaborator

@mtorabi59
let me know when you want me to have a first look at this.

Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.57%. Comparing base (abb80ff) to head (6e12f7b).
Report is 313 commits behind head on main.

Files with missing lines Patch % Lines
nilearn/maskers/nifti_labels_masker.py 81.81% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4767      +/-   ##
==========================================
+ Coverage   91.85%   93.57%   +1.72%     
==========================================
  Files         144      135       -9     
  Lines       16419    17148     +729     
  Branches     3434     2967     -467     
==========================================
+ Hits        15082    16047     +965     
+ Misses        792      586     -206     
+ Partials      545      515      -30     
Flag Coverage Δ
macos-latest_3.10_test_plotting 93.09% <81.81%> (?)
macos-latest_3.11_test_plotting 93.09% <81.81%> (+1.23%) ⬆️
macos-latest_3.12_test_plotting 93.09% <81.81%> (?)
macos-latest_3.13_test_plotting 93.09% <81.81%> (?)
macos-latest_3.9_test_plotting 93.07% <81.81%> (?)
ubuntu-latest_3.10_test_plotting 93.09% <81.81%> (+1.23%) ⬆️
ubuntu-latest_3.11_test_plotting 93.09% <81.81%> (?)
ubuntu-latest_3.12_test_plotting 93.09% <81.81%> (?)
ubuntu-latest_3.13_test_plotting 93.09% <81.81%> (?)
ubuntu-latest_3.13_test_pre 93.11% <81.81%> (?)
ubuntu-latest_3.9_test_min 69.13% <72.72%> (?)
ubuntu-latest_3.9_test_plot_min 91.98% <81.81%> (?)
ubuntu-latest_3.9_test_plotting 93.07% <81.81%> (?)
windows-latest_3.10_test_plotting 93.07% <81.81%> (?)
windows-latest_3.11_test_plotting 93.07% <81.81%> (?)
windows-latest_3.12_test_plotting 93.07% <81.81%> (?)
windows-latest_3.13_test_plotting 93.07% <81.81%> (?)
windows-latest_3.9_test_plotting 93.05% <81.81%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtorabi59 mtorabi59 marked this pull request as ready for review November 20, 2024 23:11
@mtorabi59 mtorabi59 requested a review from Remi-Gau November 20, 2024 23:11
Copy link
Collaborator

@Remi-Gau Remi-Gau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass review.
Let me know if something is unclear.

Also I would suggest starting adding test now because it will help us see what makes sense or not.

self._region_id_name = None

# if the labels is a path, it is the path to tsv file, so read it
if isinstance(labels, str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic comment: we are going to try to better follow the scikit learn guidelines on how to design 'estimators'. So that means that

There should be no logic, not even input validation, and the parameters should not be changed; which also means ideally they should not be mutable objects such as lists or dictionaries. If they’re mutable, they should be copied before being modified. The corresponding logic should be put where the parameters are used, typically in fit.

https://scikit-learn.org/dev/developers/develop.html#instantiation

So I think we should move all of this to the fit method.

try:
region_id_name = pd.read_csv(labels, sep="\t")
# Note that labels should include the background too
labels = region_id_name["region name"].tolist()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be tempted to have one of the attribute be directly a pandas dataframe instead of assigning columns to an attribute.

the only check should be that the dataframe has at least the columns named:

  • index
  • name

These are the only 2 required by BIDS (optional ones are abbreviation, color, mapping).

See example of such tsv:

https://github.com/bids-standard/bids-examples/blob/master/ds000001-fmriprep/desc-aparcaseg_dseg.tsv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought working with self._region_id_name as a dict would be easier. But if you have a strong preference we can make it a pandas dataframe instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a dataframe will automatically force us or the user to keep a mapping between a region label and its name: dictionaries will be more brittle for that purpose

self._region_id_name = None

# if the labels is a path, it is the path to tsv file, so read it
if isinstance(labels, str):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(labels, str):
if isinstance(labels, (str, pathlib.Path)):

or a path 😉

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

  • update doc string

if region_id != self.background_label
]
if self._region_id_name is not None:
# check if initial_region_ids all exist in self._region_id_name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I would do 2 checks:

  • see if we have regions in the tsv that do not exist in the image
  • see if we have regions in the image that do not exist in the tsv

I wonder if in the long term we could just throw warnings for those an not errors.

Comment on lines 593 to 597
missing_ids = [
region_id
for region_id in np.unique(
_utils.niimg.safe_get_data(self.labels_img_)
)
if region_id != self.background_label
for region_id in initial_region_ids
if region_id not in self._region_id_name
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_ids = set(initial_region_ids) - set(self._region_id_name)

maybe using set operation is faster and clearer, no?


# if the labels is a path, it is the path to tsv file, so read it
if isinstance(labels, str):
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would not go for a try / except here but maybe there is a reason I am missing.
was it to provide more meaningful error message than pandas' "file not found" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because instead of throwing an error and stopping the code, I thought maybe it's better to throw a warning and set self.labels to None.

@Remi-Gau
Copy link
Collaborator

Another thing to do that will help us get a feel for this would be to update the fetcher of one of our atlases so it also creates the type of TSV we would expect users to provide.

Copy link
Member

@bthirion bthirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Remi-Gau Remi-Gau marked this pull request as draft November 29, 2024 13:37
@Remi-Gau Remi-Gau changed the title add the option to provide path to labels tsv [ENH] add the option to provide path to labels tsv Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pass labels for label maskers as path to a look up table mapping a region label and its name
3 participants