tfx.v1.dsl.components.component

Decorator: creates a component from a typehint-annotated Python function.

This decorator creates a component based on the typehint annotations specified for the arguments and return value of a Python function. The decorator accepts an optional component_annotation parameter, which hints at the system execution type that this Python function-based component belongs to. Function arguments can be annotated with the following types and associated semantics:

  • Parameter[T] where T is int, float, str, or bool: indicates that a primitive type execution parameter, whose value is known at pipeline construction time, will be passed for this argument. These parameters will be recorded in ML Metadata as part of the component's execution record. Can be an optional argument.
  • int, float, str, bytes, bool, Dict, List: indicates that a primitive type value will be passed for this argument. This value is tracked as an Integer, Float, String, Bytes, Boolean or JsonValue artifact (see tfx.types.standard_artifacts) whose value is read and passed into the given Python component function. Can be an optional argument.
  • InputArtifact[ArtifactType]: indicates that an input artifact object of type ArtifactType (deriving from tfx.types.Artifact) will be passed for this argument. This artifact is intended to be consumed as an input by this component (possibly reading from the path specified by its .uri). Can be an optional argument by specifying a default value of None.
  • OutputArtifact[ArtifactType]: indicates that an output artifact object of type ArtifactType (deriving from tfx.types.Artifact) will be passed for this argument. This artifact is intended to be emitted as an output by this component (and written to the path specified by its .uri). Cannot be an optional argument.
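
The annotation semantics listed above can be illustrated with a small standard-library sketch. It is not TFX's implementation: Parameter, InputArtifact, OutputArtifact, Examples and Model below are hypothetical stand-in classes, and classify_arguments is an illustrative helper, all defined here so the example runs without TFX installed. It only shows how a decorator could read typehints to decide how each argument is treated.

```python
import inspect
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")


# Hypothetical stand-ins for the TFX annotation and artifact classes.
class Parameter(Generic[T]): ...
class InputArtifact(Generic[T]): ...
class OutputArtifact(Generic[T]): ...
class Examples: ...
class Model: ...


PRIMITIVE_TYPES = (int, float, str, bytes, bool)
KINDS = {
    Parameter: "execution_parameter",
    InputArtifact: "input_artifact",
    OutputArtifact: "output_artifact",
}


def classify_arguments(func):
    """Maps each argument name to (kind, inner type, has_default)."""
    hints = get_type_hints(func)
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        has_default = param.default is not inspect.Parameter.empty
        origin = get_origin(hint)  # e.g. Parameter for Parameter[int]
        if origin in KINDS:
            kind, inner = KINDS[origin], get_args(hint)[0]
        elif hint in PRIMITIVE_TYPES:
            kind, inner = "primitive_value", hint
        else:
            raise TypeError(f"Unsupported annotation for {name!r}: {hint!r}")
        # Mirrors the rule that output artifacts cannot be optional.
        if kind == "output_artifact" and has_default:
            raise ValueError(f"Output artifact {name!r} cannot be optional")
        result[name] = (kind, inner, has_default)
    return result


def my_trainer(
    training_data: InputArtifact[Examples],
    model: OutputArtifact[Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10,
):
    """Signature-only stand-in for a component function."""


ARGUMENT_KINDS = classify_arguments(my_trainer)
```

In this sketch, num_iterations is the only argument classified as an execution parameter with a default, while dropout_hyperparameter is treated as a primitive value that would arrive from an upstream artifact.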

The return value typehint should be either empty or None, in the case of a component function that has no return values, or a TypedDict of primitive value types (int, float, str, bytes, bool, dict or list; or Optional[T], where T is a primitive type value, in which case None can be returned), to indicate that the return value is a dictionary with specified keys and value types.

Note that output artifacts should not be included in the return value typehint; they should be included as OutputArtifact annotations in the function inputs, as described above.
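
As a rough illustration of the return value typehint described above, the following standard-library sketch declares a TypedDict with one Optional key and checks a returned dictionary against it. The helper matches_output_type and the function evaluate are hypothetical names introduced only for this example; they are not part of TFX, and TFX's own validation may differ.

```python
from typing import Optional, TypedDict, Union, get_args, get_origin, get_type_hints


class MyOutput(TypedDict):
    loss: float
    accuracy: float
    note: Optional[str]  # None is an allowed value for this key


def matches_output_type(result: dict, output_type) -> bool:
    """Hypothetical helper: True if keys and value types match the TypedDict."""
    hints = get_type_hints(output_type)
    if set(result) != set(hints):
        return False
    for key, hint in hints.items():
        # Optional[T] is Union[T, None]; unpack it into (T, NoneType).
        allowed = get_args(hint) if get_origin(hint) is Union else (hint,)
        if not isinstance(result[key], allowed):
            return False
    return True


def evaluate() -> MyOutput:
    """Stand-in component body that returns only primitive values."""
    return {"loss": 0.25, "accuracy": 0.9, "note": None}
```

Output artifacts do not appear in MyOutput; only primitive values are returned through the dictionary.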

The function to which this decorator is applied must be at the top level of its Python module (it may not be defined within nested classes or function closures).
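
One way to see what "top level" means here is to look at a function's __qualname__. The sketch below is an assumption about how such a condition could be detected, not a description of how TFX enforces it; is_module_level, make_nested and Holder are hypothetical names used only for illustration.

```python
import json


def is_module_level(func) -> bool:
    """True if func's qualified name has no enclosing class or function."""
    # Nested functions have names like "outer.<locals>.inner";
    # methods have names like "Holder.method".
    return "." not in func.__qualname__


def make_nested():
    def inner():
        pass
    return inner


class Holder:
    def method(self):
        pass


# json.dumps is defined at the top level of the stdlib json module.
MODULE_LEVEL_OK = is_module_level(json.dumps)
NESTED_OK = is_module_level(make_nested())
METHOD_OK = is_module_level(Holder.method)
```

Under this check, only functions like json.dumps would qualify; closures and methods would need to be moved to module scope before being decorated.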

Example usage of a component definition with this decorator:

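from typing import TypedDict

from tfx import v1 as tfx

# Short aliases for the decorator, annotations and artifact types used below.
component = tfx.dsl.components.component
InputArtifact = tfx.dsl.components.InputArtifact
OutputArtifact = tfx.dsl.components.OutputArtifact
Parameter = tfx.dsl.components.Parameter
Examples = tfx.types.standard_artifacts.Examples
Model = tfx.types.standard_artifacts.Model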

class MyOutput(TypedDict):
  loss: float
  accuracy: float

@component(component_annotation=tfx.dsl.standard_annotations.Train)
def MyTrainerComponent(
    training_data: InputArtifact[Examples],
    model: OutputArtifact[Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10
) -> MyOutput:
  '''My simple trainer component.'''

  records = read_examples(training_data.uri)
  model_obj = train_model(records, num_iterations, dropout_hyperparameter)
  model_obj.write_to(model.uri)

  return {
    'loss': model_obj.loss,
    'accuracy': model_obj.accuracy
  }

Example usage in a pipeline graph definition:

# ...
trainer = MyTrainerComponent(
    training_data=example_gen.outputs['examples'],
    dropout_hyperparameter=other_component.outputs['dropout'],
    num_iterations=1000)
pusher = Pusher(model=trainer.outputs['model'])
# ...

When component_annotation is not supplied, it defaults to None. Another example, with component_annotation left as None:

@component
def MyTrainerComponent(
    training_data: InputArtifact[Examples],
    model: OutputArtifact[Model],
    dropout_hyperparameter: float,
    num_iterations: Parameter[int] = 10
) -> MyOutput:
  '''My simple trainer component.'''

  records = read_examples(training_data.uri)
  model_obj = train_model(records, num_iterations, dropout_hyperparameter)
  model_obj.write_to(model.uri)

  return {
    'loss': model_obj.loss,
    'accuracy': model_obj.accuracy
  }

When the parameter use_beam is True, one of the parameters of the decorated function must be type-annotated with BeamComponentParameter[beam.Pipeline], and its default value can only be None. At run time it is replaced by a Beam pipeline built from the TFX pipeline's beam_pipeline_args, which are shared with other Beam-based components:

import apache_beam as beam

BeamComponentParameter = tfx.dsl.components.BeamComponentParameter

@component(use_beam=True)
def DataProcessingComponent(
    input_examples: InputArtifact[Examples],
    output_examples: OutputArtifact[Examples],
    beam_pipeline: BeamComponentParameter[beam.Pipeline] = None,
) -> None:
  '''My simple data processing component.'''

  records = read_examples(input_examples.uri)
  with beam_pipeline as p:
    ...

Args

func: Typehint-annotated component executor function.
component_annotation: Annotates the Python function-based component. It must be a subclass of SystemExecution from tfx/types/system_executions.py, or None.
use_beam: Whether to create a component that is a subclass of BaseBeamComponent. This allows a beam.Pipeline to be made with the pipeline-wide beam_pipeline_args of the TFX pipeline.

Returns

An object that:
  1. can be called like the initializer of a subclass of base_component.BaseComponent (or base_component.BaseBeamComponent).
  2. has a test_call() member function for unit testing the inner implementation of the component.

Today, the returned object is literally a subclass of BaseComponent, so it can be used as a Type, e.g. in isinstance() checks. Do not rely on this: we reserve the right to return a different kind of object in the future, one that satisfies only criteria (1.) and (2.) above without being a Type itself.

Raises

EnvironmentError: if the current Python interpreter is not Python 3.