Skip to content

Conversation

@k-dominik
Copy link
Contributor

@k-dominik k-dominik commented Oct 18, 2024

Inputs such as label images from cellpose or stardist are treated a bit poorly in ilastik - connected components is run again (with possibly different connectivity) resulting in different objects in ilastik after loading such label image data. This is super inconvenient for users: Any prior analysis they did will be hard to merge (different object ids, possibly different objects).

This PR introduces a new workflow, "Object Classification [Raw Data, Label Image]" that internally doesn't do CCA anymore but merely relabels the label image to ensure consecutive labels (required by feature extraction in vigra).

see discussion in #2416

fixes #2416

Summary:

  • rewrite of OpRelabelConsecutive into an operator that caches two results: the relabeled image and a dictionary that holds the mapping - both necessary in the workflow (relabeld image as input for feature computation, the mapping to export the original labelids to the table).
  • Serialization logic for this slot that ensures valid saving of both data to the ilp file
  • Added the new column to the export logic
  • Added the new Object Classification from Label images workflow

There are some slightly unrelated changes (first two commits):

  • Binary image layer and object id image have been treated as different sources when displaying, which is not really necessary and can increase cache usage (as two images will be accessed instead of one). This is changed by sourcing the binary layer also from the label image (which has to be computed anyhow for all three flavors of object classification).
  • Removed the Binary Image input completely from the Object Classification op: it was only used for visualization.
  • I renamed all usages of Binary image to Segmentation image as we used binary segmentations as well as instance segmentations as inputs there.
  • Needed to add another pin for pytorch - really need to investigate this: Investigate problems with pytorch on windows for 2.4.0, 2.4.1, 2.5.0 #2922

Changes I didn't include here because changeset is already pretty big:

  • change the order of columns in export - OriginalObjectID column should be easier to find (and fix order to some sorting)
  • Allow for export without an image: Currently ilastik forces exporting an image to export the table. If the Label Image already exists outside, and the table correctly links to it, it is really not necessary anymore to export any image.

Todos (before merge, didn't do it yet for the sake of the review):

  • remove ac9b4d5, has been fixed on main
  • rebase/fixup commits as indicated by commit messages

label image _has_ to be cached/accessed always, binary image not really.
Binary display can be achieved with a two color colormap.

As a consequence this should be friendlier for caching.

Also, display does not rely anymore on a binary image being present,
which could not be the case for label-type inputs.
throughout the code we were referring to the segmentation input as
binary, although both, label images and binary segmentations are
supported.

Also connected Binary Layer Display to Segmentationlayer display with
an optimized colormap
@k-dominik k-dominik changed the title Add oc for label2 Add object classification for label images Oct 18, 2024
@k-dominik k-dominik force-pushed the add-oc-for-label2 branch 2 times, most recently from 91f3f91 to 5f00632 Compare October 18, 2024 16:50
@codecov
Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 77.79851% with 119 lines in your changes missing coverage. Please review.

Project coverage is 56.58%. Comparing base (dfef849) to head (3b8cdc8).
Report is 173 commits behind head on main.

Files with missing lines Patch % Lines
...ik/applets/base/appletSerializer/slotSerializer.py 61.16% 33 Missing and 7 partials ⚠️
...tik/applets/objectExtraction/opObjectExtraction.py 64.47% 24 Missing and 3 partials ⚠️
lazyflow/operators/opRelabelConsecutive.py 91.56% 7 Missing and 14 partials ⚠️
...jectClassification/objectClassificationWorkflow.py 70.37% 8 Missing ⚠️
...ts/objectClassification/objectClassificationGui.py 25.00% 5 Missing and 1 partial ⚠️
...ets/objectExtraction/objectExtractionSerializer.py 66.66% 6 Missing ⚠️
...Classification/blockwiseObjectClassificationGui.py 0.00% 3 Missing ⚠️
...applets/objectExtraction/objectExtractionApplet.py 81.25% 3 Missing ⚠️
...ik/applets/objectExtraction/objectExtractionGui.py 57.14% 2 Missing and 1 partial ⚠️
...tClassification/opBlockwiseObjectClassification.py 93.75% 0 Missing and 1 partial ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2921      +/-   ##
==========================================
+ Coverage   56.47%   56.58%   +0.10%     
==========================================
  Files         535      535              
  Lines       62263    62677     +414     
  Branches     7711     7763      +52     
==========================================
+ Hits        35165    35464     +299     
- Misses      25335    25432      +97     
- Partials     1763     1781      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@k-dominik k-dominik force-pushed the add-oc-for-label2 branch 2 times, most recently from 9fbbdf3 to 5985f92 Compare October 19, 2024 08:46
Rewritten OpRelabelConsecutive with a Cache compatible interface.
OpRelabelConsecutive5D caches the relabeled array and a mapping dict
in sync.

Motivation is to support properly label images in object classification
where we have to relabel consecutively for vigra feature computation,
but also export the original labels in the table.
the `OpRelabelConsecutive` op caches it's output.
in order to _properly_ support label images as inputs in object
classification, we want to treat the original labeling more respectfully
(don't mess with connectivity) and also add the original object id to
Object Features in order to make it available in the table for export.
@k-dominik k-dominik marked this pull request as ready for review October 21, 2024 04:45
@k-dominik k-dominik requested a review from btbest October 21, 2024 04:45
@k-dominik
Copy link
Contributor Author

putting this back to draft - I must have messed something up when rebasing, the new workflow currently doesn't work 🙄

@k-dominik k-dominik marked this pull request as draft October 22, 2024 11:53
Copy link
Contributor

@btbest btbest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously a great feature to have and I can see the pain that the existing interfaces caused expressed in code. I think it's squeezed into a reasonable change set though

Review could have been quicker if the commits documented copy-pasting (bonus points for separating copying commit from subsequent modifications); as far as I could tell the SerialRelabeledDataSlot is a modified SerialBlockSlot and OpRelabelConsecutive5D is obviously mostly a modified copy-paste of its base class OpUnblockedArrayCache

There's a leftover BinaryImage in manualTrackingWorkflow#114 that completely breaks the workflow, the other comments are all just suggestions

SerializationOutput = OutputSlot()

# total dummy slot, there because OpObjectExtraction expects this interface
BypassModeEnabled = InputSlot(value=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is actually necessary - if only there was a test we could run to check... 😁

(OpRelabelConsecutive.BypassModeEnabled is never used, and the workflow ran fine manually after deleting it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least with this particular operator we should also cache in headless mode. So will remove bypassmodeenabled if possible without adding a lot of special checks downstream.

Copy link
Contributor

@btbest btbest Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic dependency tracking here was too hard to understand in what sense "OpObjectExtraction expects this interface"; now I get it.

I think in this context I would keep it on OpRelabelConsecutive to document the dependency (and the fact that OpRelabelConsecutive is actually an "OpLabelVolume", even if they share no base class or naming), but remove it from OpRelabelConsecutive5D so that at least that op can focus on its job

pass


class OpRelabelConsecutive5D(OpUnblockedArrayCache):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really see the value of nesting the cached op here, seems to me one could just put the dtype functionality in the caching op and remove the wrapper? Then the caching op also wouldn't have to double check the dtype on its own side again...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what you mean by nesting here. You mean Having OpRelabelConsecutive5D as an object being used in OpRelabelConsecutive?

The latter operates in terms of input data dimensions, and OpRelabelConsecutive5D implementation is simplified by assuming 5D...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see OpRelabelConsecutive mirrors the internal architecture of OpLabelVolume pretty much 1:1, though it does read a lot cleaner here :)

This implicit relationship has tripped me up a few times during the review... I feel it could be helpful to have a noop OpObjectsABC or sth purely to document the connection, and/or rename OpRelabelConsecutive into OpRelabelVolume. I'm immediately also jumping to "well and extract the reorder-dtype-convert-MakeLabels-reorderToInput" cascade into the base class"... Or maybe OpRelabelConsecutive5D should really be a subclass of OpLabelingABC. But I remember you've mentioned going back and forth between implementations and discarding others, so I'm happy to accept what we have here was the cleanest way to solve this.

Copy link
Contributor Author

@k-dominik k-dominik Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was good you left the comment as the common interface between OpRelabelConsecutive and OpLabelVolume for object classification is pretty minimal as it turns out:

class OpLabelBase(Operator):
    """Minimal interface for Operators that takes an input volume and produces a
    consecutive labeling as an output volume"""

    Input = InputSlot()
    """Volume to (re)label, up to 5D"""

    CachedOutput = OutputSlot()
    """Cached label image - internally always whole time slices are stored
    Axistags and shape are the same as on the Input, dtype is an integer
    datatype.
    This slot extends the ROI to the full xyz volume (c and t are unaffected)
    and computes the labeling for the whole volume. As long as the input does
    not get dirty, subsequent requests to this slot guarantee consistent
    labelings.
    """

    CleanBlocks = OutputSlot()
    """Expose for Cache serialization, slicings for clean blocks stored in the cache"""

will probably do the job (still shifting things around a bit). It's a pity that not more of the interface can be changed, but well...



@patch("vigra.analysis.relabelConsecutive", wraps=vigra.analysis.relabelConsecutive)
def test_time_slice_blocking_dict(relabel_mock: MagicMock, oprelabel, labels_t):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this up to line 174 (just so that the time_slice_blocking tests are together)

def test_startlabel(self):
op = OpRelabelConsecutive(graph=Graph())
op.StartLabel.setValue(10)
freed = cache_op.freeBlock(((5, 5), (6, 6)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this test ensure? I couldn't find a roi here that failed the test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it just ensures that there will be noerrors clearing cache keys. It's a feature all the caches implement and this test documents that it is working as expected.

def test_simple(oprelabel: OpRelabelConsecutive, labels: vigra.VigraArray):
oprelabel.Input.setValue(labels)
expected = labels // 2
relabeled: npt.ArrayLike = oprelabel.CachedOutput[:].wait()
Copy link
Contributor

@btbest btbest Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
relabeled: npt.ArrayLike = oprelabel.CachedOutput[:].wait()
relabeled: npt.NDArray = oprelabel.CachedOutput[:].wait()

don't have the connection to check now but I remember ArrayLike being deceptive; sounds like "something that looks like an array" but specifically means "something that implements __array__"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, total oversight :)

k-dominik and others added 2 commits November 29, 2024 15:26
* removed unused / cleaned-up imports
* removed `_standarize_roi` from `OpRelabelConsecutive5D` - implementation
  identical to base class `OpUnblockedArrayCache`
* Fix Manual tracking: BinaryImage -> SegmentationImage
* removed a redundant return

Co-authored-by: Benedikt Best <[email protected]>
Clash on Win - pyshtools installs openmp, which clobbers dlls needed by pytorch's intel-openmp since pytorch 2.4

See bfab86d (similar problem)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable pre-indexed superpixels as input for object classification

2 participants