Abstract base class of all TFXIO API implementations.
Methods
ArrowSchema
@abc.abstractmethod
ArrowSchema() -> pa.Schema
Returns the schema of the RecordBatches produced by self.BeamSource().
May raise an error if the TFMD schema was not provided at construction time.
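The following stdlib-only sketch gives a rough picture of this contract. It models a hypothetical TFXIO-like class (ToyTFXIOBase / ToyTFXIO, not part of tfx_bsl) whose ArrowSchema() depends on an optional schema passed at construction time. Plain dicts stand in for pa.Schema and the TFMD schema. Raising ValueError is an assumption about how the error might surface.

```python
import abc
from typing import Dict, Optional


class ToyTFXIOBase(abc.ABC):
    """Hypothetical stand-in for the TFXIO interface (not the tfx_bsl class)."""

    @abc.abstractmethod
    def ArrowSchema(self) -> Dict[str, str]:
        """Returns column name -> type, standing in for pa.Schema."""


class ToyTFXIO(ToyTFXIOBase):
    def __init__(self, schema: Optional[Dict[str, str]] = None):
        # `schema` plays the role of the optional TFMD schema.
        self._schema = schema

    def ArrowSchema(self) -> Dict[str, str]:
        if self._schema is None:
            # Mirrors "may raise an error if the TFMD schema was not provided".
            raise ValueError("ArrowSchema() requires a schema at construction time.")
        return dict(self._schema)


io_with_schema = ToyTFXIO({"age": "int64", "name": "binary"})
print(io_with_schema.ArrowSchema())  # {'age': 'int64', 'name': 'binary'}

try:
    ToyTFXIO().ArrowSchema()
except ValueError as err:
    print("error:", err)
```

Because ArrowSchema is declared with abc.abstractmethod, the base class itself cannot be instantiated. Only subclasses that implement it can.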
BeamSource
@abc.abstractmethod
BeamSource(batch_size: Optional[int] = None) -> beam.PTransform
Returns a Beam PTransform that produces PCollection[pa.RecordBatch].
May NOT raise an error if the TFMD schema was not provided at construction time.
If a TFMD schema was provided at construction time, all the pa.RecordBatches in the resulting PCollection must have the same schema as the one returned by self.ArrowSchema(). If a TFMD schema was not provided, the pa.RecordBatches might not all have the same schema (they may contain different numbers of columns).
Args | |
---|---|
batch_size | If not None, the pa.RecordBatches produced will be of the specified size. Otherwise, the size is tuned automatically by Beam. |
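Beam is not used here. This pure-Python sketch only models the two properties described above: batches of a fixed size when batch_size is given, and a uniform set of columns only when a schema is known. The helper names (to_batch, toy_beam_source) are hypothetical. A "record batch" is a dict of column name to list of values, standing in for pa.RecordBatch. The handling of a short final batch is this toy's choice, not a claim about Beam.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]
# Stand-in for pa.RecordBatch: column name -> list of values.
Batch = Dict[str, List[object]]


def to_batch(rows: List[Row], columns: Optional[List[str]]) -> Batch:
    """Builds a columnar batch from rows; missing values become None."""
    if columns is None:
        # No schema: columns are whatever appears in this batch,
        # so different batches may end up with different columns.
        columns = sorted({key for row in rows for key in row})
    return {col: [row.get(col) for row in rows] for col in columns}


def toy_beam_source(rows: Iterable[Row], batch_size: int,
                    columns: Optional[List[str]] = None) -> Iterator[Batch]:
    """Yields batches of `batch_size` rows; in this toy the last may be smaller."""
    buffer: List[Row] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            yield to_batch(buffer, columns)
            buffer = []
    if buffer:
        yield to_batch(buffer, columns)


rows = [{"a": 1}, {"a": 2, "b": 3}, {"a": 4}]
with_schema = list(toy_beam_source(rows, batch_size=2, columns=["a", "b"]))
without_schema = list(toy_beam_source(rows, batch_size=2))
print([sorted(b) for b in with_schema])     # [['a', 'b'], ['a', 'b']]
print([sorted(b) for b in without_schema])  # [['a', 'b'], ['a']]
```

With a schema, every batch carries the same columns, padding with None. Without one, the second batch ends up with fewer columns, which matches the "different numbers of columns" case above.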
Project
Project(tensor_names: List[Text]) -> 'TFXIO'
Projects the dataset represented by this TFXIO.
A projected TFXIO has the following properties:
- Only the columns needed for the given tensor_names are guaranteed to be produced by self.BeamSource().
- self.TensorAdapterConfig() and self.TensorFlowDataset() are trimmed to contain only those tensors.
- It retains a reference to the very original TFXIO, so its TensorAdapter knows about the specs of the tensors that would be produced by the original TensorAdapter. See also TensorAdapter.OriginalTensorSpec().
May raise an error if the TFMD schema was not provided at construction time.
Args | |
---|---|
tensor_names | a set of tensor names. |
Returns | |
---|---|
A TFXIO instance that is the same as self except for the projection properties listed above. |
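The projection semantics can be sketched with a hypothetical in-memory class. ToyTFXIO, TensorNames(), NeededColumns() and Original() are illustrative names, not tfx_bsl APIs. Each tensor maps to the columns it needs. Project() keeps only the requested tensors and remembers the very first, unprojected instance, even across repeated projections.

```python
from typing import Dict, List, Optional


class ToyTFXIO:
    """Hypothetical TFXIO-like object; each tensor maps to the columns it needs."""

    def __init__(self, tensor_columns: Dict[str, List[str]],
                 original: Optional["ToyTFXIO"] = None):
        self._tensor_columns = tensor_columns
        # Always point at the very first (unprojected) instance.
        self._original = original.Original() if original is not None else None

    def Original(self) -> "ToyTFXIO":
        return self._original if self._original is not None else self

    def TensorNames(self) -> List[str]:
        return sorted(self._tensor_columns)

    def NeededColumns(self) -> List[str]:
        # Only these columns are guaranteed to be produced after projection.
        return sorted({c for cols in self._tensor_columns.values() for c in cols})

    def Project(self, tensor_names: List[str]) -> "ToyTFXIO":
        unknown = set(tensor_names) - set(self._tensor_columns)
        if unknown:
            raise ValueError(f"Unknown tensor names: {sorted(unknown)}")
        projected = {name: self._tensor_columns[name] for name in tensor_names}
        return ToyTFXIO(projected, original=self)


io = ToyTFXIO({"age": ["age"], "words": ["text"], "id": ["id"]})
p1 = io.Project(["age", "words"])
p2 = p1.Project(["age"])
print(p2.TensorNames())    # ['age']
print(p2.NeededColumns())  # ['age']
print(p2.Original() is io) # True
```

Keeping a reference to the original, rather than to the immediate parent, loosely mirrors how the real projected TensorAdapter can still report the original tensor specs.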
RecordBatches
@abc.abstractmethod
RecordBatches(options: tfx_bsl.public.tfxio.RecordBatchesOptions) -> Iterator[pa.RecordBatch]
Returns an iterable of record batches.
This can be used outside of Apache Beam or TensorFlow to access data.
Args | |
---|---|
options | An options object for iterating over record batches. See dataset_options.RecordBatchesOptions for more details. |
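To show what iterating "outside of Apache Beam or TensorFlow" can look like, here is a minimal sketch with a hypothetical options class. The ToyRecordBatchesOptions fields (batch_size, drop_final_batch) are assumptions for illustration, so consult dataset_options.RecordBatchesOptions for the real fields. Batches are plain dicts rather than pa.RecordBatch and are yielded lazily from in-memory columns.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List

# Stand-in for pa.RecordBatch: column name -> list of values.
Batch = Dict[str, List[object]]


@dataclass(frozen=True)
class ToyRecordBatchesOptions:
    """Hypothetical options; field names are assumptions, not the real class."""
    batch_size: int
    drop_final_batch: bool = False


def toy_record_batches(columns: Dict[str, List[object]],
                       options: ToyRecordBatchesOptions) -> Iterator[Batch]:
    """Lazily yields columnar batches of up to `options.batch_size` rows."""
    num_rows = len(next(iter(columns.values()), []))
    for start in range(0, num_rows, options.batch_size):
        stop = min(start + options.batch_size, num_rows)
        if options.drop_final_batch and stop - start < options.batch_size:
            break  # Skip a trailing partial batch when asked to.
        yield {name: values[start:stop] for name, values in columns.items()}


data = {"x": [1, 2, 3, 4, 5], "y": ["a", "b", "c", "d", "e"]}
for batch in toy_record_batches(data, ToyRecordBatchesOptions(batch_size=2)):
    print(batch)
# {'x': [1, 2], 'y': ['a', 'b']}
# {'x': [3, 4], 'y': ['c', 'd']}
# {'x': [5], 'y': ['e']}
```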
TensorAdapter
TensorAdapter() -> tfx_bsl.public.tfxio.TensorAdapter
Returns a TensorAdapter that converts pa.RecordBatch to TF inputs.
May raise an error if the TFMD schema was not provided at construction time.
TensorAdapterConfig
TensorAdapterConfig() -> tfx_bsl.public.tfxio.TensorAdapterConfig
Returns the config used to initialize a TensorAdapter.
Returns | |
---|---|
A TensorAdapterConfig identical to the one used to initialize the TensorAdapter returned by self.TensorAdapter(). |
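The relationship between TensorAdapterConfig() and TensorAdapter() can be sketched as follows. The sketch assumes the adapter is built directly from the config, which is how the Returns description above reads. All classes here (ToyTensorAdapterConfig, ToyTensorAdapter, ToyTFXIO) and the convert() method are hypothetical. A tensor representation is reduced to "tensor name -> source column".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyTensorAdapterConfig:
    """Hypothetical config: column types plus tensor name -> source column."""
    arrow_schema: Dict[str, str]
    tensor_representations: Dict[str, str]


class ToyTensorAdapter:
    """Hypothetical adapter that turns a columnar batch into named 'tensors'."""

    def __init__(self, config: ToyTensorAdapterConfig):
        self.config = config

    def convert(self, batch: Dict[str, List[object]]) -> Dict[str, List[object]]:
        # Each tensor is read from the column its representation points to.
        return {name: batch[column]
                for name, column in self.config.tensor_representations.items()}


class ToyTFXIO:
    def __init__(self, arrow_schema: Dict[str, str],
                 tensor_representations: Dict[str, str]):
        self._arrow_schema = arrow_schema
        self._tensor_representations = tensor_representations

    def TensorAdapterConfig(self) -> ToyTensorAdapterConfig:
        return ToyTensorAdapterConfig(dict(self._arrow_schema),
                                      dict(self._tensor_representations))

    def TensorAdapter(self) -> ToyTensorAdapter:
        # Initialized from exactly the config that TensorAdapterConfig() returns.
        return ToyTensorAdapter(self.TensorAdapterConfig())


io = ToyTFXIO({"age": "int64"}, {"age_tensor": "age"})
adapter = io.TensorAdapter()
print(adapter.config == io.TensorAdapterConfig())  # True
print(adapter.convert({"age": [31, 42]}))          # {'age_tensor': [31, 42]}
```

Exposing the config separately lets a caller rebuild an equivalent adapter elsewhere without holding on to the TFXIO object itself.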
TensorFlowDataset
@abc.abstractmethod
TensorFlowDataset(options: tfx_bsl.public.tfxio.TensorFlowDatasetOptions) -> tf.data.Dataset
Returns a tf.data.Dataset of TF inputs.
May raise an error if the TFMD schema was not provided at construction time.
Args | |
---|---|
options | An options object for the tf.data.Dataset. See dataset_options.TensorFlowDatasetOptions for more details. |
TensorRepresentations
@abc.abstractmethod
TensorRepresentations() -> tfx_bsl.public.tfxio.TensorRepresentations
Returns the TensorRepresentations.
These TensorRepresentations describe the tensors or composite tensors produced by the TensorAdapter created from self.TensorAdapter() or by the tf.data.Dataset created from self.TensorFlowDataset().
May raise an error if the TFMD schema was not provided at construction time. May raise an error if the tensor representations are invalid.
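The two failure modes above (no schema, invalid representations) can be illustrated with a small hypothetical validator. Here a representation is a dict with a "kind" and a source "column". The kind names and the ValueError are assumptions of this sketch, loosely echoing dense and varlen-sparse tensors, not the real TensorRepresentation proto.

```python
from typing import Dict, Optional

# Hypothetical: a representation names the kind of tensor and its source column.
Representation = Dict[str, str]

_VALID_KINDS = {"dense", "varlen_sparse"}


class ToyTFXIO:
    def __init__(self, schema: Optional[Dict[str, str]],
                 representations: Dict[str, Representation]):
        self._schema = schema
        self._representations = representations

    def TensorRepresentations(self) -> Dict[str, Representation]:
        if self._schema is None:
            # Failure mode 1: no schema to derive or check representations from.
            raise ValueError("A schema is required to derive tensor representations.")
        for name, rep in self._representations.items():
            # Failure mode 2: a representation that cannot be honored.
            if rep.get("kind") not in _VALID_KINDS:
                raise ValueError(f"{name}: unknown kind {rep.get('kind')!r}")
            if rep.get("column") not in self._schema:
                raise ValueError(f"{name}: column {rep.get('column')!r} not in schema")
        return dict(self._representations)


schema = {"age": "int64", "tags": "binary"}
ok = ToyTFXIO(schema, {"age": {"kind": "dense", "column": "age"}})
print(sorted(ok.TensorRepresentations()))  # ['age']

bad = ToyTFXIO(schema, {"x": {"kind": "dense", "column": "missing"}})
try:
    bad.TensorRepresentations()
except ValueError as err:
    print(err)  # x: column 'missing' not in schema
```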