Skip to content

Conversation

@k-dominik
Copy link
Contributor

@k-dominik k-dominik commented Feb 16, 2021

In order to be more useful for larger data, this PR adds a roi parameter to the predict method of the pixel classification api.
Right now I went for xarray dict-like specification, which looks very sensible.

Assumption: Our pipelines will always handle halo's internally (given they have access to the raw data).

Example:

import h5py
import xarray
from ilastik.experimental.api import from_project_file

pipeline = from_project_file("my_pretrained_project.ilp")
f = h5py.File("my-big-image.h5", "r")
ds = xarray(f["data"], dims="zyx")
block_prediction = pipeline.predict(ds, {"x": slice(0, 100), "y": slice(0, 200)})  # all "z"

cc @m-novikov @emilmelnikov @Tomaz-Vieira

@k-dominik k-dominik changed the title expoerimental/API added roi parameter to pixel classification pipeline experimental/API added roi parameter to pixel classification pipeline Feb 16, 2021
self._predict_op.LabelsCount.setValue(classifer.label_count)

def predict(self, data):
def predict(self, data, xarray_roi=dict()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def predict(self, data, xarray_roi=None):
    if xarray_roi is None:
        xarray_roi = {}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the objection against the default value?

Copy link
Member

@emilmelnikov emilmelnikov Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because default values for function arguments are evaluated and assigned on function definition, not evaluation. This doesn't play well with mutable objects: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments.

Copy link
Contributor Author

@k-dominik k-dominik Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crap it does...gee, what a mistake...



def xarray_roi_to_slicing(xarray_roi, vigra_axistags):
if any(k not in vigra_axistags for k in xarray_roi.keys()):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if any(k not in vigra_axistags for k in xarray_roi.keys()):
if set(xarray_roi).difference(vigra_axistags):

Comment on lines +75 to +81
out_slicing = []
for axis in vigra_axistags:
if axis in xarray_roi:
out_slicing.append(xarray_roi[axis])
else:
out_slicing.append(slice(None, None, None))
return tuple(out_slicing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
out_slicing = []
for axis in vigra_axistags:
if axis in xarray_roi:
out_slicing.append(xarray_roi[axis])
else:
out_slicing.append(slice(None, None, None))
return tuple(out_slicing)
return tuple(xarray_roi.get(axis, slice(None)) for axis in vigra_axistags)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cheers, I was actually hoping to get this from xarrray and didn't look into their docs yet... :) But I'll use your suggestion if there is nothing to be found.

def xarray_roi_to_slicing(xarray_roi, vigra_axistags):
if any(k not in vigra_axistags for k in xarray_roi.keys()):
raise ValueError(f"xarray_roi contains keys ({xarray_roi.keys()} not in axistags ({vigra_axistags})")
if not xarray_roi:
Copy link
Member

@emilmelnikov emilmelnikov Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this special case? That is, does a[...] == a[:, :, :, :, :] always hold in volumina?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it depends on the data, unfortunately. I wanted to send a clear message for what happens, when no roi is provided.

@Tomaz-Vieira
Copy link
Contributor

Some objections against dictionaries are:

  • the set of keys is not obvious (nor is their case sensitiveness);
  • The meaning of missing keys might also not be obvious;
  • Extra or mistyped keys are not immediately detected (if at all);
  • Static type checking can't catch any errors in the dict;

@m-novikov
Copy link
Contributor

Another idea to return intermediate objects when we call predict and only materialize it on direct query.

block_prediction = pipeline.predict(ds).isel(x=slice(0,10), y=slice(0, 17)).result()
# this kind of block interaction is nicer to read imo
predictor = pipeline.Predictor(ds)
computed = []
for block in tile(ds.shape, 256):
     tile_prediction = predictor.isel(**block).result()
     computed.append(tile_prediction)

it would also play nicer if we have multi-value returns

@k-dominik
Copy link
Contributor Author

Some objections against dictionaries are:

* the set of keys is not obvious (nor is their case sensitiveness);

* The meaning of missing keys might also not be obvious;

* Extra or mistyped keys are not immediately detected (if at all);

* Static type checking can't catch any errors in the dict;

I agree, there are some nice pro's tho:

  • builtin type, anyone knows how to interact with them (I would say this is the strongest argument for dicts, also that xarray implements indexing with dicts is a plus).
  • once you're in a notebook, no static type checking will help you (well, there might be tooling though that I don't know, can you mypy live in notebooks these days?)
  • There is TypedDict (we'd have to upgrade Python though, or add yet another dependency)

Another idea to return intermediate objects when we call predict and only materialize it on direct query.

What I liked about the first iteration of the api is that was a single call to get a prediction. That is staying with the original predict interface. I could imagine something like this as part of a blockwise_predict function? With this isel method, that would return a future...

@Tomaz-Vieira
Copy link
Contributor

Tomaz-Vieira commented Feb 23, 2021

builtin type, anyone knows how to interact with them

this is only half-true, though. dict is indeed a builtin, but we don't just take ANY builtin dict; We take a very particular type of dict, which the user will have to learn to build. And he can do that either by reading English ("roi is a Dict[str, Tuple[in, int]] where the keys are in "xyztc" "), or by reading python:

def __init__(self, x: Tuple[int, int], y: Tuple[int, int], z: Tuple[int, int], c: Tuple[int, int], t: Tuple[int, int])

which comes with the added boni that the second version can have explicit default values and does not silently accept bad keys

once you're in a notebook, no static type checking will help you

True that, but basic python functionality will still work. This produces no errors at the call site:

predict(roi={"X": (100, 200), "y": (300, 400)})

While this:

predict(roi=CustomRoi(X=(100, 200), y=(300,400)))

immediately throws a "TypeError: Got an unexpected keyword argument: X" , right at the call site.

There is TypedDict

I guess that could also work; the syntax is indeed more ergonomic at the cost of runtime validation. But if ever we need to add some validation in the constructor or add some methods to the rois (enlarge, split, get_tiles, etc), we're back to a custom Roi class

@k-dominik
Copy link
Contributor Author

How about we hide that from users and offer mypy compatibility:

RoiDict = TypedDict('RoiDict', {"x": ..., "y", ....}, total=False)

class Classifier:
    ...
    def predict(roi: RoiDict):
        croi = CustomRoi(**roi)

Wouldn't this be way nicer for users?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants