# Feast Python SDK Quickstart Guide

## Pre-requisites

## Where to get it

Binary installers for the latest released version are available on PyPI:

```sh
pip install feast
```

## Getting started

### Configuring the Feast client

All interaction with a Feast cluster happens via an instance of `feast.sdk.client.Client`, which must be pointed at the correct core and serving URLs of the cluster:

```python
from feast.sdk.client import Client

fs = Client(
    core_url=FEAST_CORE_URL,
    serving_url=FEAST_SERVING_URL,
    verbose=True)
```

If you are running a local Feast cluster, `FEAST_CORE_URL` and `FEAST_SERVING_URL` might look like this:

```python
FEAST_CORE_URL="localhost:8080"
FEAST_SERVING_URL="localhost:8081"
```

### Registering an entity and features

Entities and features can be registered explicitly beforehand, or implicitly during data ingestion.

Register your first entity:

```python
from feast.sdk.resources.entity import Entity

entity = Entity(
    name='word',
    description='word found in shakespearean works'
)

fs.apply(entity)
```

And a feature that belongs to this entity:

```python
from feast.sdk.resources.feature import Feature, Datastore, ValueType

word_count_feature = Feature(
    entity='word',
    name='count',
    value_type=ValueType.INT32,
    description='number of times the word appears',
    tags=['tag1', 'tag2'],
    owner='[email protected]',
    uri='https://github.com/bob/example',
    warehouse_store=Datastore(id='WAREHOUSE'),
    serving_store=Datastore(id='SERVING')
)

fs.apply(word_count_feature)
```

Read more on entity and feature fields in the Entity Spec.

### Ingest data for your feature

Let's create a simple pandas DataFrame:

```python
import pandas as pd

words_df = pd.DataFrame({
    'word': ['the', 'and', 'i', 'to', 'of', 'a', 'you', 'my', 'in', 'that', 'is', 'not', 'with', 'me', 'it'],
    'count': [28944, 27317, 21120, 20136, 17181, 14945, 13989, 12949, 11513, 11488, 9545, 8855, 8293, 8043, 8003]
})
```
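Before ingesting, it can help to sanity-check that the id column and the feature column match what was registered. This is a minimal sketch using plain pandas (not part of the Feast SDK), run here against a trimmed copy of the DataFrame above:

```python
import pandas as pd

# Trimmed copy of the words_df defined above.
words_df = pd.DataFrame({
    'word': ['the', 'and', 'i'],
    'count': [28944, 27317, 21120],
})

# The id column ('word') should be unique, and the feature column should
# match the registered value type (INT32 here, so an integer dtype).
assert words_df['word'].is_unique
assert pd.api.types.is_integer_dtype(words_df['count'])
```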

And import it into the Feast store:

```python
from datetime import datetime
from feast.sdk.importer import Importer

STAGING_LOCATION = 'gs://your-bucket'

importer = Importer.from_df(words_df,
                            entity='word',
                            owner='[email protected]',
                            id_column='word',
                            timestamp_value=datetime(2018, 1, 1),
                            staging_location=STAGING_LOCATION,
                            serving_store=Datastore(id='SERVING'),
                            warehouse_store=Datastore(id='WAREHOUSE'))

fs.run(importer)
```

This will start an import job and ingest the data into both the warehouse and serving Feast stores. You can also import data from a CSV file (`Importer.from_csv(...)`) or a BigQuery table (`Importer.from_bq(...)`).
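For the CSV path, the file needs the same id and feature columns as the DataFrame variant. A minimal sketch that writes the data above out to a CSV — note that the exact keyword arguments of `Importer.from_csv` are an assumption here (mirroring `from_df`); check the SDK for your version:

```python
import pandas as pd

# Trimmed copy of words_df from above.
words_df = pd.DataFrame({
    'word': ['the', 'and', 'i'],
    'count': [28944, 27317, 21120],
})

# Write it out so Importer.from_csv can pick it up.
words_df.to_csv('words.csv', index=False)

# Hypothetical call, assumed to mirror the from_df arguments shown earlier:
# importer = Importer.from_csv('words.csv',
#                              entity='word',
#                              owner='[email protected]',
#                              id_column='word',
#                              timestamp_value=datetime(2018, 1, 1),
#                              staging_location=STAGING_LOCATION)
# fs.run(importer)

roundtrip = pd.read_csv('words.csv')
```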

### Query feature data for training your models

Now that you have some data in the Feast store, you may want to retrieve a dataset and train your model. Creating a training dataset isolates the data that goes into the model training step, which makes training reproducible and traceable.

It is also possible to retrieve only those features required for training, by specifying them in a `FeatureSet`:

```python
from feast.sdk.resources.feature_set import FeatureSet

training_fs = FeatureSet(entity="word", features=["word.count"])

dataset_info = fs.create_dataset(training_fs,
                                 start_date="2010-01-01",
                                 end_date="2018-01-01")

train_df = fs.download_dataset_to_df(dataset_info, staging_location=STAGING_LOCATION)

# train your model
# ...
```
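Since a training dataset exists partly for reproducibility, one cheap pattern is to fingerprint the downloaded frame so a later run can verify it trained on the same data. This is a sketch with stand-in data (the real `train_df` comes from `fs.download_dataset_to_df`, so its exact columns may differ):

```python
import hashlib
import pandas as pd

# Stand-in for the train_df downloaded above.
train_df = pd.DataFrame({
    'word': ['the', 'and', 'i', 'to'],
    'count': [28944, 27317, 21120, 20136],
})

# Canonicalise row order, then hash the CSV bytes to fingerprint the dataset.
canonical = train_df.sort_values('word').reset_index(drop=True)
fingerprint = hashlib.sha256(canonical.to_csv(index=False).encode()).hexdigest()
# Log this fingerprint alongside the trained model for traceability.
```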

### Query feature data for serving your models

Feast provides a means of accessing stored features in a serving environment, at low latency and under high load.

You provide the keys of the entities whose features you want served:

```python
serving_fs = FeatureSet(entity="word", features=["word.count"])

serving_df = fs.get_serving_data(serving_fs, entity_keys=["you", "and", "i"])
```

This will return the resulting `serving_df` as follows:

```
  word  count
0  and  27317
1  you  13989
2    i  21120
```
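At inference time you typically need per-entity lookups rather than a whole frame. A minimal sketch that turns the result above into a dict, using a stand-in `serving_df` matching the output shown (the real one comes from `fs.get_serving_data`):

```python
import pandas as pd

# Stand-in matching the serving_df output shown above.
serving_df = pd.DataFrame({
    'word': ['and', 'you', 'i'],
    'count': [27317, 13989, 21120],
})

# Index by entity key for constant-time feature lookups at inference time.
counts = serving_df.set_index('word')['count'].to_dict()
print(counts['you'])  # 13989
```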