Commit 8cc095e

woop authored and gitbook-bot committed
GitBook: [master] 5 pages modified
1 parent 7dff49a commit 8cc095e

5 files changed

Lines changed: 184 additions & 8 deletions

File tree

docs/SUMMARY.md

Lines changed: 8 additions & 0 deletions

```diff
@@ -13,6 +13,10 @@
 * [Roadmap](roadmap.md)
 * [Changelog](https://github.com/feast-dev/feast/blob/master/CHANGELOG.md)
 
+## Tutorials
+
+* [Driver ranking](tutorials/driver-ranking-with-feast.md)
+
 ## Concepts
 
 * [Overview](concepts/overview.md)
@@ -23,6 +27,10 @@
 * [Provider](concepts/provider.md)
 * [Architecture](concepts/architecture-and-components.md)
 
+## How-to Guides
+
+* [Running Feast in production](how-to-guides/untitled.md)
+
 ## Reference
 
 * [Data sources](reference/data-sources/README.md)
```

docs/getting-started/build-a-training-dataset.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -2,15 +2,15 @@
 Feast allows users to build a training dataset from time-series feature data that already exists in an offline store. Users are expected to provide a list of features to retrieve (which may span multiple feature views), and a dataframe to join the resulting features onto. Feast will then execute a point-in-time join of multiple feature views onto the provided dataframe, and return the full resulting dataframe.
 
-### Retrieving historical features
+## Retrieving historical features
 
-#### 1. Register your feature views
+### 1. Register your feature views
 
 Please ensure that you have created a feature repository and that you have registered (applied) your feature views with Feast.
 
 {% page-ref page="deploy-a-feature-store.md" %}
 
-#### 2. Define feature references
+### 2. Define feature references
 
 Start by defining the feature references (e.g., `driver_trips:average_daily_rides`) for the features that you would like to retrieve from the offline store. These features can come from multiple feature tables. The only requirement is that the feature tables that make up the feature references have the same entity (or composite entity), and that they are located in the same offline store.
@@ -69,5 +69,5 @@ training_df = fs.get_historical_features(
 ).to_df()
 
 Once the feature references and an entity dataframe are defined, it is possible to call `get_historical_features()`. This method launches a job that executes a point-in-time join of features from the offline store onto the entity dataframe. Once completed, a job reference is returned, which can be converted to a Pandas dataframe by calling `to_df()`.
```
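The point-in-time join described above can be illustrated without Feast: for each entity row, attach the most recent feature value whose timestamp does not exceed that row's event timestamp. A minimal sketch in plain Python (illustrative only, not Feast's implementation; the entity key and feature names are taken from the driver example):

```python
from datetime import datetime

def point_in_time_join(entity_rows, feature_rows):
    """For each entity row, attach the most recent feature value
    at or before the row's event timestamp (None if none exists)."""
    joined = []
    for row in entity_rows:
        candidates = [
            f for f in feature_rows
            if f["driver_id"] == row["driver_id"]
            and f["event_timestamp"] <= row["event_timestamp"]
        ]
        latest = max(candidates, key=lambda f: f["event_timestamp"], default=None)
        joined.append({**row, "conv_rate": latest["conv_rate"] if latest else None})
    return joined

entity_rows = [{"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 10)}]
feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 8), "conv_rate": 0.5},
    {"driver_id": 1001, "event_timestamp": datetime(2021, 4, 12, 11), "conv_rate": 0.9},
]
training_rows = point_in_time_join(entity_rows, feature_rows)
```

Note that the 11:00 feature value is ignored for the 10:00 entity row: a point-in-time join never leaks data from the future into training rows.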

docs/getting-started/read-features-from-the-online-store.md

Lines changed: 4 additions & 4 deletions

```diff
@@ -6,15 +6,15 @@ The Feast Python SDK allows users to retrieve feature values from an online stor
 Online stores only maintain the current state of features, i.e., the latest feature values. No historical data is stored or served.
 {% endhint %}
 
-### Retrieving online features
+## Retrieving online features
 
-#### 1. Ensure that feature values have been loaded into the online store
+### 1. Ensure that feature values have been loaded into the online store
 
 Please ensure that you have materialized (loaded) your feature values into the online store before starting.
 
 {% page-ref page="load-data-into-the-online-store.md" %}
 
-#### 2. Define feature references
+### 2. Define feature references
 
 Create a list of features that you would like to retrieve. This list typically comes from the model training step and should accompany the model binary.
@@ -25,7 +25,7 @@ features = [
 ]
 
-#### 3. Read online features
+### 3. Read online features
 
 Next, we will create a feature store object and call `get_online_features()`, which reads the relevant feature values directly from the online store.
```
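A feature reference such as `driver_hourly_stats:conv_rate` is simply a string of the form `<feature_view>:<feature>`. Splitting one apart can be sketched in plain Python (illustrative only, not part of the Feast API):

```python
def parse_feature_ref(ref: str) -> tuple[str, str]:
    """Split a feature reference like 'driver_hourly_stats:conv_rate'
    into its feature view name and feature name."""
    view, _, feature = ref.partition(":")
    if not feature:
        raise ValueError(f"not a valid feature reference: {ref!r}")
    return view, feature

view, feature = parse_feature_ref("driver_hourly_stats:conv_rate")
```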

docs/how-to-guides/untitled.md (new file)

Lines changed: 147 additions & 0 deletions

# Running Feast in production

### Overview

In this guide we will show you:

1. How to deploy your feature store and keep your infrastructure in sync with your feature repository
2. How to keep the data in your online store up to date
3. How to use Feast for model training and serving

### 1. Automatically deploying changes to your feature definitions

The first step to setting up a deployment of Feast is to create a Git repository that contains your feature definitions. The recommended way to version and track your feature definitions is by committing them to a repository and tracking changes through commits.

Most teams will need to have a feature store deployed to more than one environment. We have created an example repository \([Feast Repository Example](https://github.com/feast-dev/feast-ci-repo-example)\) which contains two Feast projects, one per environment.

The contents of this repository are shown below:

```bash
├── .github
│   └── workflows
│       ├── production.yml
│       └── staging.yml
├── staging
│   ├── driver_repo.py
│   └── feature_store.yaml
└── production
    ├── driver_repo.py
    └── feature_store.yaml
```

The repository contains three sub-folders:

* `staging/`: This folder contains the staging `feature_store.yaml` and Feast objects. Users who want to make changes to the Feast deployment in the staging environment commit changes to this directory.
* `production/`: This folder contains the production `feature_store.yaml` and Feast objects. Typically users first test changes in staging, then copy the feature definitions into the production folder and commit the changes.
* `.github`: This folder is an example of a CI system that applies the changes in either the `staging` or `production` folders using `feast apply`. This operation saves your feature definitions to a shared registry \(for example, on GCS\) and configures your infrastructure for serving features.

The `feature_store.yaml` contains the following:

```text
project: staging
registry: gs://feast-ci-demo-registry/staging/registry.db
provider: gcp
```

Notice how the registry has been configured to use a Google Cloud Storage bucket. All changes made to infrastructure using `feast apply` are tracked in `registry.db`. This registry will later be accessed by the Feast SDK in your training pipelines or model serving services in order to read features.

{% hint style="success" %}
It is important to note that the CI system above must have access to create, modify, or remove infrastructure in your production environment. This is unlike clients of the feature store, which only have read access.
{% endhint %}

In summary, once you have set up a Git-based repository with CI that runs `feast apply` on changes, your infrastructure \(offline store, online store, and cloud environment\) will automatically be updated to support the loading of data into the feature store or the retrieval of data.
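The workflow files themselves are not shown in this commit. As a rough sketch only (the trigger paths, the Feast installation step, and the omitted cloud authentication are all assumptions, not the example repository's actual contents), `staging.yml` could look like:

```yaml
name: staging
on:
  push:
    branches: [master]
    paths:
      - staging/**
jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - run: pip install feast
      # Authentication to GCP (needed to write the GCS registry) is omitted here.
      - run: cd staging && feast apply
```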
### 2. How to keep the data in your online store up to date

In order to keep your online store up to date, you need to run a job that loads feature data from your feature view sources into your online store. In Feast, this loading operation is called materialization.

The simplest way to schedule materialization is to run an **incremental** materialization using the Feast CLI:

```text
feast materialize-incremental 2022-01-01T00:00:00
```

The above command will load all feature values from all feature view sources into the online store up to the time `2022-01-01T00:00:00`.

A timestamp is required to set the end date for materialization. If your source is fully up to date, the end date would be the current time. However, if you are querying a source where data is not yet available, you should instead use a timestamp up to which data is available. The next time `materialize-incremental` is run, Feast will load data starting from the previous end date, so it is important that the materialization interval does not overlap with time periods for which data has not yet been made available. This is commonly the case when your source is an ETL pipeline that runs on a daily schedule.
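The interval bookkeeping performed by `materialize-incremental` can be sketched as follows (illustrative only, not Feast's internal code): each run loads the window from the previous end date (or, on the first run, from some starting point such as the feature view's creation time) up to the new end date.

```python
from datetime import datetime

def next_interval(last_end, new_end, created):
    """Interval (start, end] that an incremental materialization run loads:
    it resumes from the previous run's end date, or starts from `created`
    on the first run."""
    start = last_end if last_end is not None else created
    return start, new_end

created = datetime(2021, 12, 1)
first = next_interval(None, datetime(2022, 1, 1), created)
second = next_interval(first[1], datetime(2022, 1, 2), created)
```

Because each run resumes exactly where the previous one stopped, passing an end date beyond which source data exists would permanently skip the missing window.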
An alternative to incremental materialization \(where Feast tracks the intervals of data that need to be ingested\) is to call Feast directly from your scheduler, such as Airflow. In this case the scheduler is the system that tracks which intervals have been ingested:

```text
feast materialize -v driver_hourly_stats 2020-01-01T00:00:00 2020-01-02T00:00:00
```

In the above example we are materializing the source data from the `driver_hourly_stats` feature view over one day. This command can be scheduled as the final operation in your Airflow ETL, which runs after you have computed your features and stored them in the source location. Feast will then load your feature data into your online store.

The timestamps above should match the interval of data that has been computed by the data transformation system.
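If you schedule the explicit-interval command with plain cron rather than Airflow, the entry might look like the following sketch (the repository path, the 02:00 schedule, and the `<start>`/`<end>` placeholders are hypothetical; substitute timestamps that match your ETL's output):

```text
# Hypothetical crontab entry: after the nightly ETL finishes, load the
# previous day's features for the driver_hourly_stats view.
0 2 * * * cd /path/to/feature/repo && feast materialize -v driver_hourly_stats <start> <end>
```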
### 3. How to use Feast for model training and serving

Now that you have deployed a registry, provisioned your feature store, and loaded your data into your online store, your clients can start to consume features for training and inference.

For both model training and inference, your clients will use the Feast Python SDK to retrieve features. In both cases it is necessary to create a `FeatureStore` object.

One way to ensure your production clients have access to the feature store is to provide a copy of the `feature_store.yaml` to those pipelines. This `feature_store.yaml` file will have a reference to the feature store registry, which allows clients to retrieve features from offline or online stores.

```python
prod_fs = FeatureStore(repo_path="production_feature_store.yaml")
```

Then, training data can be retrieved as follows:

```python
feature_refs = [
    'driver_hourly_stats:conv_rate',
    'driver_hourly_stats:acc_rate',
    'driver_hourly_stats:avg_daily_trips'
]

training_df = prod_fs.get_historical_features(
    entity_df=entity_df,
    feature_refs=feature_refs,
).to_df()

model = ml.fit(training_df)
```
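The `entity_df` used above is not defined in this snippet; it must contain the entity keys and event timestamps to join features onto. A minimal sketch with pandas (the column names follow the driver example; the timestamps are made up):

```python
import pandas as pd
from datetime import datetime

# Entity dataframe: one row per (entity key, event timestamp) pair for which
# features should be retrieved as of that point in time.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59),
            datetime(2021, 4, 12, 8, 12),
        ],
    }
)
```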
The most common way to productionize ML models is by storing and versioning models in a "model store", and then deploying these models into production. When using Feast, it is recommended that the list of feature references also be saved alongside the model. This ensures that models and the features they are trained on are paired together when being shipped into production:

```python
import json

# Save model
model.save('my_model.bin')

# Save features
with open('feature_refs.json', 'w') as f:
    json.dump(feature_refs, f)
```
At inference time, you can then create a `FeatureStore` object, fetch the features, and make a prediction:

```python
import json

# Load model
model = ml.load('my_model.bin')

# Load feature references
with open('feature_refs.json', 'r') as f:
    feature_refs = json.load(f)

# Create feature store object
prod_fs = FeatureStore(repo_path="production_feature_store.yaml")

# Read online features
feature_vector = prod_fs.get_online_features(
    feature_refs=feature_refs,
    entity_rows=[{"driver_id": 1001}]
).to_dict()

# Make a prediction
prediction = model.predict(feature_vector)
```

{% hint style="success" %}
It is important to note that both the training pipeline and the model serving service only need read access to the feature registry and associated infrastructure. This prevents clients from accidentally making changes to the feature store.
{% endhint %}
docs/tutorials/driver-ranking-with-feast.md (new file)

Lines changed: 21 additions & 0 deletions

---
description: >-
  Making a prediction using a linear regression model is a common use case in
  ML. This model predicts if a driver will complete a trip based on features
  ingested into Feast.
---

# Driver ranking

In this example you'll learn how to use some of the key functionality in Feast. The tutorial runs both in local mode and on the Google Cloud Platform \(GCP\). For GCP, you must already have access to a GCP project, including read and write permissions to BigQuery.

### [Driver Ranking Example](https://github.com/feast-dev/feast-driver-ranking-tutorial)

This tutorial guides you through using Feast with [scikit-learn](https://scikit-learn.org/stable/). You will learn how to:

1. Train a model locally \(on your laptop\) using data from [BigQuery](https://cloud.google.com/bigquery/)
2. Test the model for online inference using [SQLite](https://www.sqlite.org/index.html) \(for fast iteration\)
3. Test the model for online inference using [Firestore](https://firebase.google.com/products/firestore) \(for production use\)

Try it and let us know what you think!
