-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] add the option to provide path to labels tsv #4767
base: main
Are you sure you want to change the base?
Conversation
👋 @mtorabi59 Thanks for creating a PR! Until this PR is ready for review, you can include the [WIP] tag in its title, or leave it as a github draft. Please make sure it is compliant with our contributing guidelines. In particular, be sure it checks the boxes listed below.
For new features:
For bug fixes:
We will review it as quick as possible, feel free to ping us with questions if needed. |
…b.com:mtorabi59/nilearn into labels_path_branch
You can use the following code to test the new option: import csv
from nilearn._utils.data_gen import (
generate_labeled_regions,
generate_random_img,
)
from nilearn.maskers import NiftiLabelsMasker
import numpy as np
labels_as_path = True
data = [
{'region id': 0, 'region name': 'Background'},
{'region id': 1, 'region name': 'Region A'},
{'region id': 2, 'region name': 'Region B'},
{'region id': 3, 'region name': 'Region C'},
{'region id': 4, 'region name': 'Region D'},
{'region id': 5, 'region name': 'Region E'},
{'region id': 6, 'region name': 'Region F'},
{'region id': 7, 'region name': 'Region G'},
{'region id': 8, 'region name': 'Region H'},
{'region id': 9, 'region name': 'Region I'}
]
with open('./regions.tsv', 'w', newline='') as file:
writer = csv.DictWriter(file, fieldnames=['region id', 'region name'], delimiter='\t')
writer.writeheader()
writer.writerows(data)
if labels_as_path:
labels = './regions.tsv'
else:
labels = [
'Background', 'Region A', 'Region B', 'Region C', 'Region D',
'Region E', 'Region F', 'Region G', 'Region H', 'Region I'
]
n_regions = 9
length = 93
shape_3d_default = (7, 8, 9)
affine_eye = np.eye(4)
shape1 = (*shape_3d_default, length)
fmri_img, mask11_img = generate_random_img(
shape1,
affine=affine_eye,
)
labels_img = generate_labeled_regions(
shape1[:3],
affine=affine_eye,
n_regions=n_regions,
)
masker = NiftiLabelsMasker(labels_img, labels=labels, resampling_target=None)
signals = masker.fit().transform(fmri_img)
print(masker.labels)
print(masker._region_id_name) |
@mtorabi59 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4767 +/- ##
==========================================
+ Coverage 91.85% 93.57% +1.72%
==========================================
Files 144 135 -9
Lines 16419 17148 +729
Branches 3434 2967 -467
==========================================
+ Hits 15082 16047 +965
+ Misses 792 586 -206
+ Partials 545 515 -30
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
for more information, see https://pre-commit.ci
…o labels_path_branch
for more information, see https://pre-commit.ci
…o labels_path_branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a first pass review.
Let me know if something is unclear.
Also I would suggest starting adding test now because it will help us see what makes sense or not.
self._region_id_name = None | ||
|
||
# if the labels is a path, it is the path to tsv file, so read it | ||
if isinstance(labels, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic comment: we are going to try to better follow the scikit learn guidelines on how to design 'estimators'. So that means that
There should be no logic, not even input validation, and the parameters should not be changed; which also means ideally they should not be mutable objects such as lists or dictionaries. If they’re mutable, they should be copied before being modified. The corresponding logic should be put where the parameters are used, typically in fit.
https://scikit-learn.org/dev/developers/develop.html#instantiation
So I think we should move all of this to the fit method.
try: | ||
region_id_name = pd.read_csv(labels, sep="\t") | ||
# Note that labels should include the background too | ||
labels = region_id_name["region name"].tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to have one of the attribute be directly a pandas dataframe instead of assigning columns to an attribute.
the only check should be that the dataframe has at least the columns named:
- index
- name
These are the only 2 required by BIDS (optional ones are abbreviation, color, mapping).
See example of such tsv:
https://github.com/bids-standard/bids-examples/blob/master/ds000001-fmriprep/desc-aparcaseg_dseg.tsv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought working with self._region_id_name
as a dict would be easier. But if you have a strong preference we can make it a pandas dataframe instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having a dataframe will automatically force us or the user to keep a mapping between a region label and its name: dictionaries will be more brittle for that purpose
self._region_id_name = None | ||
|
||
# if the labels is a path, it is the path to tsv file, so read it | ||
if isinstance(labels, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(labels, str): | |
if isinstance(labels, (str, pathlib.Path)): |
or a path 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
- update doc string
if region_id != self.background_label | ||
] | ||
if self._region_id_name is not None: | ||
# check if initial_region_ids all exist in self._region_id_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I would do 2 checks:
- see if we have regions in the tsv that do not exist in the image
- see if we have regions in the image that do not exist in the tsv
I wonder if in the long term we could just throw warnings for those an not errors.
missing_ids = [ | ||
region_id | ||
for region_id in np.unique( | ||
_utils.niimg.safe_get_data(self.labels_img_) | ||
) | ||
if region_id != self.background_label | ||
for region_id in initial_region_ids | ||
if region_id not in self._region_id_name | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing_ids = set(initial_region_ids) - set(self._region_id_name)
maybe using set operation is faster and clearer, no?
|
||
# if the labels is a path, it is the path to tsv file, so read it | ||
if isinstance(labels, str): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would not go for a try / except
here but maybe there is a reason I am missing.
was it to provide more meaningful error message than pandas' "file not found" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because instead of throwing an error and stopping the code, I thought maybe it's better to throw a warning and set self.labels
to None
.
Another thing to do that will help us get a feel for this would be to update the fetcher of one of our atlases so it also creates the type of TSV we would expect users to provide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Changes proposed in this pull request: