Data Ingestion

Hollow includes a few ready-made data ingestion mechanisms. Additionally, custom data ingestion mechanisms can be created relatively easily using the Low Level Input API.

HollowObjectMapper

When using a HollowProducer, each call to state.add(obj) is delegated to a HollowObjectMapper. The HollowObjectMapper is used to add POJOs into a HollowWriteStateEngine:

HollowWriteStateEngine writeEngine = /// a state engine
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);

// POJOs are added through the mapper, which writes them into the state engine
for(Movie movie : movies)
    mapper.add(movie);

The HollowObjectMapper can also be used to initialize the data model of a HollowWriteStateEngine without adding any actual data:

HollowWriteStateEngine writeEngine = /// a state engine
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);

mapper.initializeTypeState(Movie.class);
mapper.initializeTypeState(Show.class);

Schemas will be assumed based on the field and type names in the POJOs. Any referenced types will also be traversed and included in the derived data model.
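To make the derivation rule concrete, here is a minimal sketch in plain Java reflection of how field names and field types can be walked to produce a data model, following references to other POJO types along the way. It is a toy model under stated assumptions, not Hollow's implementation: the `SchemaSketch` class, its `derive` method, and the sample `Movie` and `Actor` classes are all hypothetical, collections are not handled, and JDK types such as `String` are simply treated as leaves.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: derives "type name -> field list" from POJO classes via reflection.
final class SchemaSketch {
    // Hypothetical sample POJOs.
    static class Actor { long id; String actorName; }
    static class Movie { long id; int releaseYear; Actor lead; }

    // Returns one entry per discovered type, keyed by simple class name.
    static Map<String, String> derive(Class<?> root) {
        Map<String, String> schemas = new LinkedHashMap<>();
        visit(root, schemas);
        return schemas;
    }

    private static void visit(Class<?> type, Map<String, String> schemas) {
        String typeName = type.getSimpleName();
        if (schemas.containsKey(typeName)) return;  // already traversed (also stops cycles)
        schemas.put(typeName, "");                  // reserve the entry before recursing
        StringBuilder fields = new StringBuilder();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            Class<?> fieldType = f.getType();
            if (fields.length() > 0) fields.append(", ");
            fields.append(f.getName()).append(':').append(fieldType.getSimpleName());
            // Simplification: primitives and JDK classes are leaves, anything else is a referenced POJO.
            if (!fieldType.isPrimitive() && !fieldType.getName().startsWith("java.")) {
                visit(fieldType, schemas);
            }
        }
        schemas.put(typeName, fields.toString());
    }

    public static void main(String[] args) {
        derive(Movie.class).forEach((name, fields) -> System.out.println(name + " { " + fields + " }"));
    }
}
```

Deriving from `Movie` alone also yields an entry for `Actor`, which mirrors the statement above that referenced types are traversed and included in the data model.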

!!! hint "Thread Safety"
    The HollowObjectMapper is thread-safe; multiple threads may add Objects at the same time.

Specifying type names in the HollowObjectMapper

By default, type names are equal to the names of classes added to the HollowObjectMapper. Alternatively, the name of a type may be explicitly defined by using the @HollowTypeName annotation. This annotation can be added at either the class or field level.

The following example Award class will reference a type AwardName, which will contain a single string value:

public class Award {
    long id;

    @HollowTypeName(name="AwardName")
    String name;
}

The following example Category class will be added as the type Genre, where not otherwise specified by referencing fields:

@HollowTypeName(name="Genre")
public class Category {
   ...
}

!!! note "Namespaced fields"
    Using the @HollowTypeName annotation is a convenient way to add appropriate record type namespacing into your data model.
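The naming precedence described in this section (field-level name first, then class-level name, then the class name itself) can be sketched in plain Java. This is an illustrative model only: `TypeName` is a hypothetical stand-in for @HollowTypeName so the example compiles without Hollow on the classpath, and `TypeNameSketch` and `typeNameOf` are made-up names rather than Hollow APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class TypeNameSketch {
    // Hypothetical stand-in for @HollowTypeName, so the sketch has no dependencies.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.FIELD})
    @interface TypeName { String name(); }

    @TypeName(name = "Genre")
    static class Category { String label; }

    static class Award {
        long id;
        @TypeName(name = "AwardName") String name;  // field-level override
        Category category;                          // falls back to the class-level name
    }

    // Type name for a class: class-level annotation if present, otherwise the class name.
    static String typeNameOf(Class<?> type) {
        TypeName onClass = type.getAnnotation(TypeName.class);
        return onClass != null ? onClass.name() : type.getSimpleName();
    }

    // Type name referenced by a field: a field-level annotation takes precedence.
    static String typeNameOf(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            TypeName onField = field.getAnnotation(TypeName.class);
            return onField != null ? onField.name() : typeNameOf(field.getType());
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + fieldName + " on " + owner, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeNameOf(Award.class, "name"));      // AwardName
        System.out.println(typeNameOf(Award.class, "category"));  // Genre
        System.out.println(typeNameOf(Award.class));              // Award
    }
}
```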

Inlining fields in the HollowObjectMapper

You can inline fields in the HollowObjectMapper by annotating them with @HollowInline. The following example Creator class inlines the field creatorName:

public class Creator {
    long id;

    @HollowInline
    String creatorName;
}

The following java.lang.* types can be inlined:

  • String
  • Boolean
  • Integer
  • Long
  • Double
  • Float
  • Short
  • Byte
  • Character

Memoizing POJOs in the HollowObjectMapper

If a long field named __assigned_ordinal is defined in a POJO class, then HollowObjectMapper will use this field to record the assigned ordinal when Objects of this class are added to the state engine.

When the HollowObjectMapper sees this POJO again, it skips writing to the state engine and looking up or assigning an ordinal, and instead returns the previously recorded ordinal. If the same referenced POJO instances can be reused during processing, this effect can greatly speed up adding records to the state engine.

If the __assigned_ordinal field is present, it should be initialized to HollowConstants.ORDINAL_NONE. The field may be (but does not have to be) private and/or final.

The following example Director class uses the __assigned_ordinal optimization:

public class Director {
    long id;
    String directorName;

    private transient final long __assigned_ordinal = HollowConstants.ORDINAL_NONE;
}

!!! warning
    If the __assigned_ordinal optimization is used, POJOs should not be modified after they are added to the state engine. Any modifications after the first time a memoized POJO is added to the state engine will be ignored and any references to these POJOs will always point to the originally added record. Using immutable POJOs can help reduce errors.
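The short-circuit and the resulting caveat about later modifications can be illustrated with a small stand-alone model. This is a sketch under simplifying assumptions, not Hollow's implementation: `MemoSketch` is a hypothetical toy mapper, a concatenated string stands in for the serialized record, and the `assignedOrdinal` field plays the role of __assigned_ordinal.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of ordinal memoization; not Hollow's implementation.
final class MemoSketch {
    static final int ORDINAL_NONE = -1;  // plays the role of HollowConstants.ORDINAL_NONE

    static class Director {
        long id;
        String directorName;
        long assignedOrdinal = ORDINAL_NONE;  // plays the role of __assigned_ordinal

        Director(long id, String directorName) {
            this.id = id;
            this.directorName = directorName;
        }
    }

    private final Map<String, Integer> ordinalsByContent = new HashMap<>();
    int slowPathCalls = 0;  // how often a record was actually serialized and looked up

    int add(Director d) {
        if (d.assignedOrdinal != ORDINAL_NONE) {
            return (int) d.assignedOrdinal;  // memoized: skip serialization and lookup
        }
        slowPathCalls++;
        String content = d.id + "|" + d.directorName;  // stand-in for the serialized record
        Integer known = ordinalsByContent.get(content);
        int ordinal = (known != null) ? known : ordinalsByContent.size();
        ordinalsByContent.put(content, ordinal);
        d.assignedOrdinal = ordinal;  // remember the result on the POJO itself
        return ordinal;
    }

    public static void main(String[] args) {
        MemoSketch mapper = new MemoSketch();
        Director director = new Director(1, "Director A");
        int first = mapper.add(director);
        director.directorName = "Director B";  // modification after the first add
        int second = mapper.add(director);     // still the original ordinal
        System.out.println(first + " " + second + " " + mapper.slowPathCalls);  // 0 0 1
    }
}
```

The second `add` returns the original ordinal even though the name changed, which is the behavior the warning describes: the modification never reaches the state engine.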

JSON to Hollow

The hollow-jsonadapter project contains a component which automatically parses JSON into a HollowWriteStateEngine. The expected format of the JSON is defined by the schemas in the HollowWriteStateEngine, so the data model must be initialized before any records are parsed. See the Schema Parser topic in this document for an easy way to configure the schemas with a text document.

The HollowJsonAdapter class is used to populate records of a single type from a JSON file. Consider the following single record:

{
  "id": 1,
  "releaseYear": 1999,
  "actors": [
     {
        "id": 101,
        "actorName": "Keanu Reeves"
     },
     {
        "id": 102,
        "actorName": "Laurence Fishburne"
     }
  ]
}

This record can be parsed with the following code:

String json = /// the record above

HollowJsonAdapter jsonAdapter = new HollowJsonAdapter(writeEngine, "Movie");

jsonAdapter.processRecord(json);

If a field defined in the schema is not encountered in the JSON data, its value will be null in the corresponding Hollow record. If a field encountered in the JSON data is not defined in the schema, that field is ignored.
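The two mismatch rules above amount to projecting each JSON object onto the schema's field list. The following is a minimal sketch of that idea using plain maps. It does not use or reproduce the adapter's code: `JsonFieldSketch` and `project` are hypothetical names, the input map stands in for an already-parsed JSON object, and the `title` and `rating` fields are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of schema-driven field handling; not the adapter's implementation.
final class JsonFieldSketch {
    // Copies only schema-defined fields; a field missing from the input becomes null.
    static Map<String, Object> project(List<String> schemaFields, Map<String, Object> json) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (String field : schemaFields) {
            record.put(field, json.get(field));  // null when the field is absent
        }
        return record;  // input keys that are not in the schema were never copied
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "releaseYear", "title");
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("id", 1);
        json.put("releaseYear", 1999);
        json.put("rating", "R");  // not in the schema, so it is dropped
        System.out.println(project(schema, json));  // {id=1, releaseYear=1999, title=null}
    }
}
```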

A large number of records in a single file can also be processed:

Reader reader = /// a reader for the json file

HollowJsonAdapter jsonAdapter = new HollowJsonAdapter(writeEngine, "Movie");

jsonAdapter.populate(reader);

When processing an entire file, the file is expected to contain only a single JSON array of records of the expected type. The records will be processed in parallel.

!!! hint "Hollow to JSON"
    Hollow objects can be converted to a JSON string using HollowRecordJsonStringifier. This is useful, for example, when building tools backed by Hollow data.

Zeno to Hollow

The project hollow-zenoadapter has an adapter which can be used with Hollow’s predecessor, Zeno. We used this as part of our migration path from Zeno to Hollow, and it is provided for current users of Zeno who would like to migrate to Hollow as well. Start with the HollowStateEngineCreator.