
Commit 786dde7

Author: Tsotne Tabidze
Allow Feast apply to import files recursively (and add .feastignore) (feast-dev#1482)
* feast apply should import files recursively, add .feastignore
  Signed-off-by: Tsotne Tabidze <[email protected]>
* Add assertpy to ci dependencies
  Signed-off-by: Tsotne Tabidze <[email protected]>
* Simplify reading using Path.read_text() and update the test file after merging with master
  Signed-off-by: Tsotne Tabidze <[email protected]>
* Add documentation
  Signed-off-by: Tsotne Tabidze <[email protected]>
1 parent 4d7aada commit 786dde7

7 files changed

Lines changed: 287 additions & 35 deletions

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@
 ## Reference

 * [feature\_store.yaml](reference/feature-store-yaml.md)
+* [.feastignore](reference/feast-ignore.md)
 * [Python API reference](http://rtd.feast.dev/)

 ## Feast on Kubernetes

docs/concepts/feature-repository.md

Lines changed: 31 additions & 8 deletions
@@ -10,6 +10,7 @@ A feature repository consists of:

 * A collection of Python files containing feature declarations.
 * A `feature_store.yaml` file containing infrastructural configuration.
+* A `.feastignore` file containing paths in the feature repository to ignore.

 {% hint style="info" %}
 Typically, users store their feature repositories in a Git repository, especially when working in teams. However, using Git is not a requirement.
@@ -19,27 +20,28 @@ Typically, users store their feature repositories in a Git repository, especiall

 The structure of a feature repository is as follows:

-* The root of the repository should contain a `feature_store.yaml` file.
+* The root of the repository should contain a `feature_store.yaml` file and may contain a `.feastignore` file.
 * The repository should contain Python files that contain feature definitions.
 * The repository can contain other files as well, including documentation and potentially data files.

 An example structure of a feature repository is shown below:

 ```text
-$ tree
+$ tree -a
 .
 ├── data
 │   └── driver_stats.parquet
 ├── driver_features.py
-└── feature_store.yaml
+├── feature_store.yaml
+└── .feastignore

-1 directory, 3 files
+1 directory, 4 files
 ```

 A couple of things to note about the feature repository:

-* Feast does not currently read through subdirectories of the feature repository when commands. All feature definition files must reside at the root of the repository.
-* Feast reads _all_ Python files when `feast apply` is ran, even if they don't contain feature definitions. It's recommended to store imperative scripts in a different location than inside the feature registry for this purpose.
+* Feast reads _all_ Python files recursively when `feast apply` is run, including subdirectories, even if they don't contain feature definitions.
+* It's recommended to add a `.feastignore` file listing the paths of any imperative scripts that you need to store inside the feature repository.

 ## The feature\_store.yaml configuration file

@@ -57,6 +59,28 @@ online_store:

 The `feature_store.yaml` file configures how the feature store should run. See [feature\_store.yaml](../reference/feature-store-yaml.md) for more details.

+## The .feastignore file
+
+This file contains paths that should be ignored when running `feast apply`. An example `.feastignore` is shown below:
+
+{% code title=".feastignore" %}
+```
+# Ignore virtual environment
+venv
+
+# Ignore a specific Python file
+scripts/foo.py
+
+# Ignore all Python files directly under scripts directory
+scripts/*.py
+
+# Ignore all "foo.py" anywhere under scripts directory
+scripts/**/foo.py
+```
+{% endcode %}
+
+See [.feastignore](../reference/feast-ignore.md) for more details.
+
 ## Feature definitions

 A feature repository can also contain one or more Python files that contain feature definitions. An example feature definition file is shown below:
@@ -97,5 +121,4 @@ To declare new feature definitions, just add code to the feature repository, eit
 ### Next steps

 * See [Create a feature repository](../how-to-guides/create-a-feature-repository.md) to get started with an example feature repository.
-* See [feature\_store.yaml](../reference/feature-store-yaml.md) or [Feature Views](feature-views.md) for more information on the configuration files that live in a feature registry.
-
+* See [feature\_store.yaml](../reference/feature-store-yaml.md), [.feastignore](../reference/feast-ignore.md) or [Feature Views](feature-views.md) for more information on the configuration files that live in a feature registry.
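
The recursive discovery described in this documentation change can be illustrated with a small, self-contained sketch. It is not Feast code: `discover` is a hypothetical, simplified stand-in that takes ignore patterns as a list instead of reading a `.feastignore` file, and the directory layout is invented for the example. The actual implementation is `get_repo_files` in `sdk/python/feast/repo_operations.py` in this commit.

```python
import tempfile
from pathlib import Path
from typing import List


def discover(repo_root: Path, ignore_patterns: List[str]) -> List[str]:
    """Hypothetical helper: list .py files recursively, minus ignored ones."""
    ignored = set()
    for pattern in ignore_patterns:
        for match in repo_root.glob(pattern):
            if match.is_file():
                ignored.add(match)
            else:
                # A matched directory ignores every Python file beneath it
                ignored.update(match.glob("**/*.py"))
    found = {p for p in repo_root.glob("**/*.py") if p.is_file()}
    return sorted(p.relative_to(repo_root).as_posix() for p in found - ignored)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Invented layout: one definition file at the root, one in a subdirectory,
    # plus a script and a virtual environment that should not be imported
    for rel in ["driver_features.py", "views/trips.py",
                "scripts/backfill.py", "venv/lib/site.py"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    without_ignore = discover(root, [])
    with_ignore = discover(root, ["venv", "scripts/*.py"])

print(without_ignore)
print(with_ignore)
```

With no ignore patterns, all four files are found, including the ones under `venv` and `scripts`. With the two patterns, only `driver_features.py` and `views/trips.py` remain.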

docs/reference/feast-ignore.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
# .feastignore
2+
3+
## Overview
4+
5+
`.feastignore` is a file that is placed at the root of the [Feature Repository](../concepts/feature-repository.md). This file contains paths that should be ignored when running `feast apply`. An example `.feastignore` is shown below:
6+
7+
{% code title=".feastignore" %}
8+
```
9+
# Ignore virtual environment
10+
venv
11+
12+
# Ignore a specific Python file
13+
scripts/foo.py
14+
15+
# Ignore all Python files directly under scripts directory
16+
scripts/*.py
17+
18+
# Ignore all "foo.py" anywhere under scripts directory
19+
scripts/**/foo.py
20+
```
21+
{% endcode %}
22+
23+
`.feastignore` file is optional. If the file can not be found, every Python in the feature repo directory will be parsed by `feast apply`.
24+
25+
## Feast Ignore Patterns
26+
27+
| Pattern | Example matches | Explanation |
28+
| ----------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
29+
| venv | venv/foo.py<br>venv/a/foo.py | You can specify a path to a specific directory. Everything in that directory will be ignored. |
30+
| scripts/foo.py | scripts/foo.py | You can specify a path to a specific file. Only that file will be ignored. |
31+
| scripts/*.py | scripts/foo.py<br>scripts/bar.py | You can specify asterisk (*) anywhere in the expression. An asterisk matches zero or more characters, except "/". |
32+
| scripts/**/foo.py | scripts/foo.py<br>scripts/a/foo.py<br>scripts/a/b/foo.py | You can specify double asterisk (**) anywhere in the expression. A double asterisk matches zero or more directories. |
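
The rows of the pattern table can be checked against `pathlib.Path.glob`, which is what `repo_operations.py` in this commit uses for matching. The following is a minimal sketch rather than Feast code: the `matches` helper and the temporary directory layout are hypothetical and only mirror the "Example matches" column.

```python
import tempfile
from pathlib import Path


def matches(root: Path, pattern: str) -> list:
    """Return sorted POSIX-style relative paths under root that match pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout mirroring the "Example matches" column
    for rel in ["venv/foo.py", "venv/a/foo.py", "scripts/foo.py",
                "scripts/bar.py", "scripts/a/foo.py", "scripts/a/b/foo.py"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    single_star = matches(root, "scripts/*.py")       # "*" does not cross "/"
    double_star = matches(root, "scripts/**/foo.py")  # "**" spans zero or more directories
    directory = matches(root, "venv")                 # a plain path matches itself

print(single_star)
print(double_star)
print(directory)
```

Note that globbing `venv` yields only the directory itself. The "everything in that directory" behaviour in the first table row comes from a second step in the commit's `get_ignore_files`, which expands each matched directory with `**/*.py`.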

sdk/python/feast/repo_operations.py

Lines changed: 55 additions & 6 deletions
@@ -5,7 +5,7 @@
 from datetime import timedelta
 from importlib.abc import Loader
 from pathlib import Path
-from typing import List, NamedTuple, Union
+from typing import List, NamedTuple, Set, Union

 import click

@@ -31,15 +31,64 @@ class ParsedRepo(NamedTuple):
     entities: List[Entity]


+def read_feastignore(repo_root: Path) -> List[str]:
+    """Read .feastignore in the repo root directory (if it exists) and return the list of user-defined ignore paths"""
+    feast_ignore = repo_root / ".feastignore"
+    if not feast_ignore.is_file():
+        return []
+    lines = feast_ignore.read_text().strip().split("\n")
+    ignore_paths = []
+    for line in lines:
+        # Remove everything after the first occurrence of "#" symbol (comments)
+        if line.find("#") >= 0:
+            line = line[: line.find("#")]
+        # Strip leading or ending whitespaces
+        line = line.strip()
+        # Add this processed line to ignore_paths if it's not empty
+        if len(line) > 0:
+            ignore_paths.append(line)
+    return ignore_paths
+
+
+def get_ignore_files(repo_root: Path, ignore_paths: List[str]) -> Set[Path]:
+    """Get all ignore files that match any of the user-defined ignore paths"""
+    ignore_files = set()
+    for ignore_path in ignore_paths:
+        # ignore_path may contain matchers (* or **). Use glob() to match user-defined path to actual paths
+        for matched_path in repo_root.glob(ignore_path):
+            if matched_path.is_file():
+                # If the matched path is a file, add that to ignore_files set
+                ignore_files.add(matched_path.resolve())
+            else:
+                # Otherwise, list all Python files in that directory and add all of them to ignore_files set
+                ignore_files |= {
+                    sub_path.resolve()
+                    for sub_path in matched_path.glob("**/*.py")
+                    if sub_path.is_file()
+                }
+    return ignore_files
+
+
+def get_repo_files(repo_root: Path) -> List[Path]:
+    """Get the list of all repo files, ignoring undesired files & directories specified in .feastignore"""
+    # Read ignore paths from .feastignore and create a set of all files that match any of these paths
+    ignore_paths = read_feastignore(repo_root)
+    ignore_files = get_ignore_files(repo_root, ignore_paths)
+
+    # List all Python files in the root directory (recursively)
+    repo_files = {p.resolve() for p in repo_root.glob("**/*.py") if p.is_file()}
+    # Ignore all files that match any of the ignore paths in .feastignore
+    repo_files -= ignore_files
+
+    # Sort repo_files to read them in the same order every time
+    return sorted(repo_files)
+
+
 def parse_repo(repo_root: Path) -> ParsedRepo:
     """ Collect feature table definitions from feature repo """
     res = ParsedRepo(feature_tables=[], entities=[], feature_views=[])

-    # FIXME: process subdirs but exclude hidden ones
-    repo_files = [p.resolve() for p in repo_root.glob("*.py")]
-
-    for repo_file in repo_files:
-
+    for repo_file in get_repo_files(repo_root):
         module_path = py_path_to_module(repo_file, repo_root)
         module = importlib.import_module(module_path)
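
The line handling in `read_feastignore` can be tried in isolation with a string-based sketch. `parse_ignore_text` below is a hypothetical helper written for illustration, not part of Feast. It is intended to follow the same steps as the function in this hunk (cut at the first `#`, strip whitespace, drop empty lines) but takes text instead of a repository path.

```python
from typing import List


def parse_ignore_text(text: str) -> List[str]:
    """Hypothetical string-based mirror of read_feastignore's line handling."""
    ignore_paths = []
    for line in text.strip().split("\n"):
        # Drop everything from the first "#" onward (comment), then trim whitespace
        line = line.split("#", 1)[0].strip()
        if line:
            ignore_paths.append(line)
    return ignore_paths


sample = """
# Ignore virtual environment
venv

scripts/foo.py  # inline comment
  scripts/*.py
"""
parsed = parse_ignore_text(sample)
empty = parse_ignore_text("")

print(parsed)
print(empty)
```

One consequence of cutting at the first `#` is that inline comments are supported, but a path that itself contains `#` cannot be expressed, since there is no escaping.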
sdk/python/setup.py

Lines changed: 2 additions & 1 deletion
@@ -88,7 +88,8 @@
     "adlfs==0.5.9",
     "firebase-admin==4.5.2",
     "google-cloud-datastore==2.1.0",
-    "pre-commit"
+    "pre-commit",
+    "assertpy==1.1",
 ]

 # README file from Feast repo root directory

sdk/python/tests/test_cli_local.py

Lines changed: 26 additions & 20 deletions
@@ -3,6 +3,8 @@
 from pathlib import Path
 from textwrap import dedent

+import assertpy
+
 from feast.feature_store import FeatureStore
 from tests.cli_utils import CliRunner
 from tests.online_read_write_test import basic_rw_test
@@ -39,39 +41,39 @@ def test_workflow() -> None:
         )

         result = runner.run(["apply"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)

         # entity & feature view list commands should succeed
         result = runner.run(["entities", "list"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)
         result = runner.run(["feature-views", "list"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)

         # entity & feature view describe commands should succeed when objects exist
         result = runner.run(["entities", "describe", "driver"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)
         result = runner.run(
             ["feature-views", "describe", "driver_locations"], cwd=repo_path
         )
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)

         # entity & feature view describe commands should fail when objects don't exist
         result = runner.run(["entities", "describe", "foo"], cwd=repo_path)
-        assert result.returncode == 1
+        assertpy.assert_that(result.returncode).is_equal_to(1)
         result = runner.run(["feature-views", "describe", "foo"], cwd=repo_path)
-        assert result.returncode == 1
+        assertpy.assert_that(result.returncode).is_equal_to(1)

         # Doing another apply should be a no op, and should not cause errors
         result = runner.run(["apply"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)

         basic_rw_test(
             FeatureStore(repo_path=str(repo_path), config=None),
             view_name="driver_locations",
         )

         result = runner.run(["teardown"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)


 def test_non_local_feature_repo() -> None:
def test_non_local_feature_repo() -> None:
@@ -104,13 +106,13 @@ def test_non_local_feature_repo() -> None:
         )

         result = runner.run(["apply"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)

         fs = FeatureStore(repo_path=str(repo_path))
-        assert len(fs.list_feature_views()) == 3
+        assertpy.assert_that(fs.list_feature_views()).is_length(3)

         result = runner.run(["teardown"], cwd=repo_path)
-        assert result.returncode == 0
+        assertpy.assert_that(result.returncode).is_equal_to(0)


 @contextmanager
@@ -150,19 +152,23 @@ def test_3rd_party_providers() -> None:
     # Check with incorrect built-in provider name (no dots)
     with setup_third_party_provider_repo("feast123") as repo_path:
         return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
-        assert return_code == 1
-        assert b"Provider 'feast123' is not implemented" in output
+        assertpy.assert_that(return_code).is_equal_to(1)
+        assertpy.assert_that(output).contains(b"Provider 'feast123' is not implemented")
     # Check with incorrect third-party provider name (with dots)
     with setup_third_party_provider_repo("feast_foo.provider") as repo_path:
         return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
-        assert return_code == 1
-        assert b"Could not import provider module 'feast_foo'" in output
+        assertpy.assert_that(return_code).is_equal_to(1)
+        assertpy.assert_that(output).contains(
+            b"Could not import provider module 'feast_foo'"
+        )
     # Check with incorrect third-party provider name (with dots)
-    with setup_third_party_provider_repo("foo.provider") as repo_path:
+    with setup_third_party_provider_repo("foo.FooProvider") as repo_path:
         return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
-        assert return_code == 1
-        assert b"Could not import provider 'provider' from module 'foo'" in output
+        assertpy.assert_that(return_code).is_equal_to(1)
+        assertpy.assert_that(output).contains(
+            b"Could not import provider 'FooProvider' from module 'foo'"
+        )
     # Check with correct third-party provider name
     with setup_third_party_provider_repo("foo.provider.FooProvider") as repo_path:
         return_code, output = runner.run_with_output(["apply"], cwd=repo_path)
-        assert return_code == 0
+        assertpy.assert_that(return_code).is_equal_to(0)
