Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use label id as segment numbers #485

Merged
merged 10 commits into from
Dec 18, 2023

Conversation

michaelonken
Copy link
Member

@michaelonken michaelonken commented Dec 8, 2023

This pull request contains two parts:

New functionality:

  • In ITK to DICOM conversion, use Label IDs from input files as Segment Numbers if possible. From the related API documentation:
     * @param sortByLabelID A boolean indicating whether to sort the segments by label ID.
     *        In this case, the label IDs are used as DICOM segment numbers. This only works
     *        if the label IDs start at 1 and are numbered monotonically without gaps. The processing
     *        order of label IDs is not relevant, i.e. they can occur in any order in the input.
     *        If n labels are not assigned uniquely to label IDs 1..n in the input, the
     *        conversion will fail.
     *        If this is set to false (default), the segment numbers are assigned in the order of the
     *        labels that are being converted, i.e. the first label will receive the Segment
     *        Number 1, the second label will receive the Segment Number 2, etc.

Also, a new test group is added that uses some (more than before) sophisticated segment setup with overlapping segments. A first test produces a DICOM file from three NRRD inputs, which are merged together in one DICOM file. A second test splits them again into 3 NRRD files (where 2 NRRD files will contain more than one segment) and compares them to the original inputs from test 1. A third test compares metadata input used in test 1 to metadata output produced by test 2.

Various small enhancements in ITK to DICOM conversion context:

  • Replaced some deprecated JSON writer code
  • Enhanced documentation
  • Made logging output a bit more consistent
  • Save some memory (don't copy segmentation dataset on return)

michaelonken and others added 8 commits December 7, 2023 11:21
Make sure input data has been  initialized  by user before public
methods are called.

Made logging message more consistent.

Made code more readable.
When converting from ITK to DICOM, allow to use Label IDs as Segment
Numbers so that even if reading from various NRRD inputs, the Segments
in the DICOM file will still use the Label IDs from the NRRD files.

This is particularly helpful for testing since it allows a roundtrip test
from DICOM -> multiple NRRD files using potentially multiple segments
each -> back to DICOM. If no mapping to the Label IDs takes place, the
converter will use ascending Segment Numbers, so the order will depend
on the read order, which again usually depends on the order in the JSON
meta information.

This sorting behavior is disabled by default. It is available as
cli option (--sortByLabelID) which is handed to the library through the
Itk2DicomConverter::itkimage2dcmSegmentation() call.
Added missing test dependency.
Copy link
Member

@fedorov fedorov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent quite some time with this PR, and I have a lot of questions. I am not sure it will work.

Also, note that MultiFrameDimensionModule utilizes ReferencedSegmentNumber in the per-frame FGs, see: https://github.com/QIICR/dcmqi/blob/master/libsrc/Itk2DicomConverter.cpp#L58-L62.

I would think that all of the frames would need to be reordered in the general case if the segments are renumbered.

Let me know if it is easier to have a screen share to go over this!

@@ -64,6 +64,15 @@
<description>Skip empty slices while encoding segmentation image. By default, empty slices will not be encoded, resulting in a smaller output file size.</description>
</boolean>

<boolean>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelonken do we really need this flag? I do not see any downsides in always sorting label IDs and always reusing those if possible in SEG. Since you already implemented this option, why not keep it but set the default to true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this approach for two reasons:

  1. Don't break backward compatibility
  2. Make sure that conversion succeeds also in cases where label IDs cannot be sorted (since then the call will fail). I assume the user does often not know beforehand, whether the labelmaps will allow sorting. So the idea is that you only turn on this functionality if the user knows the prerequisites for this are fulfilled.

If 1. does not matter, one could also enable it by default and fall back to the old behavior if re-sorting does not work. I am not sure whether this can be implemented well, since the "rollback" after half of the re-resorting is probably difficult. And finding this out before actually starting the sorting work costs extra time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! I made this comment in the beginning of my review, and if I revisited it after the review was done, I would most definitely revert it!

@@ -367,16 +372,16 @@ namespace dcmqi {
CHECK_COND(fgder->addDerivationImageItem(CodeSequenceMacro(code_seg.CodeValue,code_seg.CodingSchemeDesignator,
code_seg.CodeMeaning),"",derimgItem));

//cout << "Total of " << siVector.size() << " source image items will be added" << endl;
// cout << "Total of " << siVector.size() << " source image items will be added" << endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you either remove this and the following commented out line, or swap for a logging command? I know this is most likely my code, not yours ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure. There are several places IIRC where I was not sure what to do with old "debug-like" commands. If you say I can do what I find useful, I will over all log messages and try to bring them into good shape.

@@ -285,6 +291,7 @@ namespace dcmqi {

Uint16 segmentNumber;
CHECK_COND(segdoc->addSegment(segment, segmentNumber /* returns logical segment number */));
segNum2Label.insert(make_pair(segmentNumber, label));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, different values of segmentNumber can map to the same value of label, which may not be unique across input segmentation files. Is this ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looking further, this will definitely be a problem, since this map is later used for renumbering segments!

Copy link
Member

@fedorov fedorov Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an easy thing to do is to check if label value is already present in the map prior to inserting it, and if it is indeed present, reset sortByLabelID to false, since sorting becomes impossible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that labels are unique in the input. If this assumption does not hold, then indeed the sorting does not make sense (since the idea to use the original labels as Segment Numbers cannot work then if two labels use the same number)


if (sortByLabelID)
{
sortByLabel(segdocDataset.get(), segNum2Label);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how this could possibly work. I do not see any checks for non-consecutive segment numbering, so if there are gaps in label numbers, the result will violate the conventions for having consecutive values for SegmentNumber. Did I miss something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, this is not addressed. Probably it can easily by done by either checking the labels in in segNum2Label, or iterating over the segments in Segment Sequence once mapping has been performed.

@michaelonken
Copy link
Member Author

michaelonken commented Dec 14, 2023 via email

@michaelonken
Copy link
Member Author

michaelonken commented Dec 15, 2023

Also, note that MultiFrameDimensionModule utilizes ReferencedSegmentNumber in the per-frame FGs, see: https://github.com/QIICR/dcmqi/blob/master/libsrc/Itk2DicomConverter.cpp#L58-L62.

Good catch, i missed the dimension assignment. The MultiFrameDimensionModule can stay like it is (first dimension Referenced Segment Number, second Image Position Patient) but the dimension assignment in the FrameContentMacro has to be updated as well.

After thinking about it, I assume it would be enough to also update the (Referenced) Segment Number used as 1st dimension value in Dimension Index Value in FrameContentMacro. The selection of frames and their order (denoted by second dimension) for the display do not need to change if we just change the Segment Numbers.

Comment 2023-12-18: I gave it some more thought and we could even keep it as is. The number of frames for display must not change at all if we re-number the segments (only). The display would not start any more with the old (unmapped) Segment 1 but with whatever Segment this has been mapped to, but this is not invalid or wrong. Of course one can adapt the order to start with new (mapped) 1 by applying the mapping to the Dimension Index Values as well in a second step.

@fedorov
Copy link
Member

fedorov commented Dec 15, 2023

I assumed that labels are unique in the input. If this assumption does not hold, then indeed the sorting does not make sense (since the idea to use the original labels as Segment Numbers cannot work then if two labels use the same number)

Aha! That's the key. Yes, I do not think we can assume anything about the label uniqueness. So what do we do with this PR?

@@ -64,6 +64,15 @@
<description>Skip empty slices while encoding segmentation image. By default, empty slices will not be encoded, resulting in a smaller output file size.</description>
</boolean>

<boolean>
<name>sortByLabelID</name>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming this as useLabelIDasSegmentNumber?

@michaelonken
Copy link
Member Author

I assumed that labels are unique in the input. If this assumption does not hold, then indeed the sorting does not make sense (since the idea to use the original labels as Segment Numbers cannot work then if two labels use the same number)

Aha! That's the key. Yes, I do not think we can assume anything about the label uniqueness. So what do we do with this PR?

As discussed (offline), I am pretty sure I know how to fix this. But we will make the flag not the default behavior for now.

Renamed feature from sortByLabel to useLabelIDAsSegmentNumber.
Added check for unique, monotonically increasing label IDs before
starting reassignment.
Added missing update of ReferencedFrameNumber.
@fedorov fedorov merged commit a7cb0f7 into QIICR:master Dec 18, 2023
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants