Skip to content

feat: Added optional model args for custom training#129

Merged
ivanmkc merged 20 commits into
googleapis:devfrom
ivanmkc:imkc--enrich-custom-training-job-args
Dec 21, 2020
Merged

feat: Added optional model args for custom training#129
ivanmkc merged 20 commits into
googleapis:devfrom
ivanmkc:imkc--enrich-custom-training-job-args

Conversation

@ivanmkc

@ivanmkc ivanmkc commented Dec 11, 2020

Copy link
Copy Markdown
Contributor

Added optional args to CustomTrainingJob.init:

        model_serving_container_command: Optional[Sequence[str]] = None,
        model_serving_container_args: Optional[Sequence[str]] = None,
        model_serving_container_environment_variables: Optional[Dict[str, str]] = None,
        model_serving_container_ports: Optional[Sequence[int]] = None,
        model_description: Optional[str] = None,
        model_instance_schema_uri: Optional[str] = None,
        model_parameters_schema_uri: Optional[str] = None,
        model_prediction_schema_uri: Optional[str] = None,

Fixes https://b.corp.google.com/issues/172365796

TODO

  • Added unit test

@google-cla google-cla Bot added the cla: yes This human has signed the Contributor License Agreement. label Dec 11, 2020
@ivanmkc ivanmkc requested a review from sasha-gitg December 11, 2020 16:32
@ivanmkc ivanmkc changed the title feat: Added optional model args feat: Added optional model args for custom training Dec 11, 2020
@ivanmkc ivanmkc force-pushed the imkc--enrich-custom-training-job-args branch from b17fd65 to bd271e2 Compare December 14, 2020 15:25

@sasha-gitg sasha-gitg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one main item on encapsulating the model args.

send health check requests to the container, and which must be supported
by it. If not specified a standard HTTP path will be used by AI
Platform.
model_serving_container_command: Optional[Sequence[str]]=None,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting on the type signatures on these arguments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what do you mean?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:
model_serving_container_command: Optional[Sequence[str]]=None,

should be:

model_serving_container_command (Sequence[str]):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, I didn't see that it was the comments you were commenting on.

model_serving_container_health_route
)

self._model_serving_container_command = model_serving_container_command

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we take this opportunity to envelope all the Model arguments into an object. Perhaps we can create the instance of gca_model.Model here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model needs model_display_name, which is passed in the run function and not the init function, so we can't create it here.

We COULD create the gca_model.ModelContainerSpec at CustomTrainingJob.init though. I'll go ahead and do that.

@sasha-gitg sasha-gitg Dec 14, 2020

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we create the model up front and set the display_name afterwards? ie.

self._managed_model = gca_model.Model(...)
self._managed_model.display_name = display_name

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know you could do that with protos. Let me try it.

Comment thread google/cloud/aiplatform/training_jobs.py Outdated
@ivanmkc

ivanmkc commented Dec 14, 2020

Copy link
Copy Markdown
Contributor Author

Still need to fix end-to-end tests

@ivanmkc ivanmkc force-pushed the imkc--enrich-custom-training-job-args branch from 8b0f881 to fc09010 Compare December 15, 2020 06:14
@ivanmkc

ivanmkc commented Dec 15, 2020

Copy link
Copy Markdown
Contributor Author

Still need to fix end-to-end tests

Done

@ivanmkc

ivanmkc commented Dec 15, 2020

Copy link
Copy Markdown
Contributor Author

@sasha-gitg Ready for another look, thanks!

@ivanmkc

ivanmkc commented Dec 15, 2020

Copy link
Copy Markdown
Contributor Author

I think the Kokoro CI step is broken right?

It's blocking 3 of my PRs from merging.

@sasha-gitg

Copy link
Copy Markdown
Member

I think the Kokoro CI step is broken right?

It's blocking 3 of my PRs from merging.

Update the PRs with the latest dev branch. That should fix the builds.

@ivanmkc ivanmkc force-pushed the imkc--enrich-custom-training-job-args branch from e392932 to 5839b7b Compare December 15, 2020 18:59
@ivanmkc

ivanmkc commented Dec 15, 2020

Copy link
Copy Markdown
Contributor Author

@sasha-gitg still need a stamp if ready.

Comment thread google/cloud/aiplatform/training_jobs.py Outdated
Comment thread google/cloud/aiplatform/training_jobs.py
@ivanmkc

ivanmkc commented Dec 17, 2020

Copy link
Copy Markdown
Contributor Author

@sasha-gitg test fixed

)

managed_model = self._managed_model
if model_display_name:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

managed_model should be None in the case where model_display_name is not passed in to support not exporting a model from the training job.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a bug to capture that as a unit test case?

@sasha-gitg sasha-gitg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just needs one more change to support not exporting a managed model.

@ivanmkc ivanmkc force-pushed the imkc--enrich-custom-training-job-args branch from a2fed09 to 34ebcde Compare December 21, 2020 15:18
@ivanmkc ivanmkc merged commit e45ebfc into googleapis:dev Dec 21, 2020
dizcology pushed a commit to dizcology/python-aiplatform that referenced this pull request Dec 22, 2020
* Added optional model args

* fix: Removed etag

* fix: Added predict schemata and fixed type error

* fix: Added description and fixed predict_schemata

* Added _model_serving_container_command, _model_serving_container_args,                 env=self._model_serving_container_environment_variables and _model_serving_container_ports

* fix: Ran linter

* fix: Added tests for model_instance_schema_uri, model_parameters_schema_uri and model_prediction_schema_uri

* fix: Fixed env and ports and added tests

* fix: Removed model_labels

* fix: Moved container spec creation into init function

* fix: Fixed docstrings

* fix: Moved import to be alphabetical

* fix: Moved model creation to init function

* fix: Fixed predict_schemata

* fix: simplified predict schemata

* fix: added linter

* fix: Fixed trailing comma

* fix: Removed CustomTrainingJob private fields

* fix: Fixed model tests

* fix: Set managed_model to None

Co-authored-by: Ivan Cheung <[email protected]>
sasha-gitg added a commit that referenced this pull request Jan 19, 2021
* fix: unblock builds (#132)

* chore: Update README with Experimental verbiage (#131)

* fix: Fixed comments (#116)

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Implements a wrapped client that instantiates the client at every API invocation (#139)

* feat: Added optional model args for custom training (#129)

* Added optional model args

* fix: Removed etag

* fix: Added predict schemata and fixed type error

* fix: Added description and fixed predict_schemata

* Added _model_serving_container_command, _model_serving_container_args,                 env=self._model_serving_container_environment_variables and _model_serving_container_ports

* fix: Ran linter

* fix: Added tests for model_instance_schema_uri, model_parameters_schema_uri and model_prediction_schema_uri

* fix: Fixed env and ports and added tests

* fix: Removed model_labels

* fix: Moved container spec creation into init function

* fix: Fixed docstrings

* fix: Moved import to be alphabetical

* fix: Moved model creation to init function

* fix: Fixed predict_schemata

* fix: simplified predict schemata

* fix: added linter

* fix: Fixed trailing comma

* fix: Removed CustomTrainingJob private fields

* fix: Fixed model tests

* fix: Set managed_model to None

Co-authored-by: Ivan Cheung <[email protected]>

* Fix: refactor class constructor for retrieving resource (#125)

* Added property and abstract method _getter_method and _resource_noun, implemented method _get_gca_resource to class AiPlatformResourceNoun; Added _resource_noun, _getter_method, to Dataset, Model, Endpoint, subclasses of _Job, _TrainingJob, refactored (_)get_* and utils.full_resource_name in class constructor to self._get_gca_resource to Dataset, Model, Endpoint, _Job

* Added return value in _get_gca_resource, added method _sync_gca_resource in AiPlatformResourceNoun class; removed job_type, updated status method with _sync_gca_resource in _Job class

* fix: added return type and lint issues

* fix: merge conflict issue with models.py

* fix: F401 'abc' imported but unused

* chore: merge main into dev (#154)

* test: Dataset integration tests (#126)

* Add dataset.metadata.text to schemas

* Add first integation tests, Dataset class

* Make teardown work if test fails, update asserts

* Change test folder name, enable system tests

* Hide test_base, test_end_to_end for Kokoro CI bug

* Add GCP Project env var to Kokoro presubmit cfg

* Restore presubmit cfg, drop --quiet in unit tests

* Restore test_base, test_end_to_end to find timeout

* Skip tests depending on persistent resources

* Use auth default creds for system tests

* Drop unused import os

* feat: specialized dataset classes, fix: datasets refactor (#153)

* feat: Refactored Dataset by removing intermediate layers

* Added image_dataset and tabular_dataset subclass

* Moved metadata_schema_uri responsibility to subclass to enable forecasting

* Moved validation logic for tabular into Dataset._create_tabular

* Added validation in image_dataset and fixed bounding_box schema error

* Removed import_config

* Fixed metadata_schema_uri

* Fixed import and subclasses

* Added EmptyNontabularDatasource

* change import_metadata to ioformat

* added datasources.py

* added support of multiple gcs_sources

* fix: default (empty) dataset_metadata need to be set to {}, not None

* 1) imported datasources 2) added _support_metadata_schema_uris and _support_import_schema_classes 3) added getter and setter/validation for resource_metadata_schema_uri, metadata_schema_uri, and import_schema_uri 4) fixed request_metadata, data_item_labels 5) encapsulated dataset_metadata, and import_data_configs 6) added datasource configuration logic

* added image_dataset.py and tabular_dataset.py

* fix: refactor - create datasets modeule

* fix: cleanup __init__.py

* fix: data_item_labels

* fix: docstring

* fix:
- changed NonTabularDatasource.dataset_metadata default to None
- updated NonTabularDatasource docstring
- changed gcs_source type hint with Union
- changed _create_and_import to _create_encapsulated with datasource
- removed subclass.__init__ and irrelevant parameters in create

* fix: import the module instead of the classes for datasources

* fix: removed all validation for import_schema_uri

* fix: set parameter default to immutable

* fix: replaced Datasource / DatasourceImportable abstract class instead of a concrete type

* fix: added examples for gcs_source

* fix:
- remove Sequence from utils.py
- refactor datasources.py to _datasources.py
- change docstring format to arg_name (arg_type): convention
- change and include the type signature _supported_metadata_schema_uris
- change _validate_metadata_schema_uri
- refactor _create_encapsulated to _create_and_import
- refactor to module level imports
- add tests for ImageDataset and TabularDataset

* fix: remove all labels

* fix: remove Optional in docstring, add example for bq_source

* test: add import_data raise for tabular dataset test

* fix: refactor datasource creation with create_datasource

* fix: lint

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Add AutoML Image Training Job class (#152)

* Add AutoMLImageTrainingJob, tests, constants

* Address reviewer comments

* feat: Add custom container support (#164)

* chore: merge main into dev (#162)

* fix: suppress no project id warning (#160)

* fix: suppress no project id warning

* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials

* fix: move default credentials config to credentials property

* fix: add property setter for credentials to avoid everytime reset

* fix: Fixed wrong key value for multilabel (#168)

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Add delete methods, add list_models and undeploy_all for Endpoint class (#165)

* Endpoint list_models, delete, undeploy_all WIP

* Finish delete + undeploy methods, tests

* Add global pool teardowns for test timeout issue

* Address reviewer comments, add async support

* fix: Fixed bug causing training failure for object detection (#171)

Co-authored-by: Ivan Cheung <[email protected]>

* fix: Support intermediary BQ Table for Custom Training (#166)

* chore: add AutoMLImageTrainingJob to aiplatform namespace (#173)

* fix: Unblock build (#174)

* fix: default credentials config related test failures (#167)

* fix: suppress no project id warning

* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials

* fix: move default credentials config to credentials property

* fix: add property setter for credentials to avoid everytime reset

* fix: tests for set credentials to default when default not provided

* fix: change credentials with initializer default when not provided in AiPlatformResourceNoun

* fix: use credential mock in tests

* fix: lint

Co-authored-by: sasha-gitg <[email protected]>

Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Morgan Du <[email protected]>
Co-authored-by: Vinny Senthil <[email protected]>
sasha-gitg added a commit that referenced this pull request Jan 23, 2021
* fix: unblock builds (#132)

* chore: Update README with Experimental verbiage (#131)

* fix: Fixed comments (#116)

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Implements a wrapped client that instantiates the client at every API invocation (#139)

* feat: Added optional model args for custom training (#129)

* Added optional model args

* fix: Removed etag

* fix: Added predict schemata and fixed type error

* fix: Added description and fixed predict_schemata

* Added _model_serving_container_command, _model_serving_container_args,                 env=self._model_serving_container_environment_variables and _model_serving_container_ports

* fix: Ran linter

* fix: Added tests for model_instance_schema_uri, model_parameters_schema_uri and model_prediction_schema_uri

* fix: Fixed env and ports and added tests

* fix: Removed model_labels

* fix: Moved container spec creation into init function

* fix: Fixed docstrings

* fix: Moved import to be alphabetical

* fix: Moved model creation to init function

* fix: Fixed predict_schemata

* fix: simplified predict schemata

* fix: added linter

* fix: Fixed trailing comma

* fix: Removed CustomTrainingJob private fields

* fix: Fixed model tests

* fix: Set managed_model to None

Co-authored-by: Ivan Cheung <[email protected]>

* Fix: refactor class constructor for retrieving resource (#125)

* Added property and abstract method _getter_method and _resource_noun, implemented method _get_gca_resource to class AiPlatformResourceNoun; Added _resource_noun, _getter_method, to Dataset, Model, Endpoint, subclasses of _Job, _TrainingJob, refactored (_)get_* and utils.full_resource_name in class constructor to self._get_gca_resource to Dataset, Model, Endpoint, _Job

* Added return value in _get_gca_resource, added method _sync_gca_resource in AiPlatformResourceNoun class; removed job_type, updated status method with _sync_gca_resource in _Job class

* fix: added return type and lint issues

* fix: merge conflict issue with models.py

* fix: F401 'abc' imported but unused

* chore: merge main into dev (#154)

* test: Dataset integration tests (#126)

* Add dataset.metadata.text to schemas

* Add first integation tests, Dataset class

* Make teardown work if test fails, update asserts

* Change test folder name, enable system tests

* Hide test_base, test_end_to_end for Kokoro CI bug

* Add GCP Project env var to Kokoro presubmit cfg

* Restore presubmit cfg, drop --quiet in unit tests

* Restore test_base, test_end_to_end to find timeout

* Skip tests depending on persistent resources

* Use auth default creds for system tests

* Drop unused import os

* feat: specialized dataset classes, fix: datasets refactor (#153)

* feat: Refactored Dataset by removing intermediate layers

* Added image_dataset and tabular_dataset subclass

* Moved metadata_schema_uri responsibility to subclass to enable forecasting

* Moved validation logic for tabular into Dataset._create_tabular

* Added validation in image_dataset and fixed bounding_box schema error

* Removed import_config

* Fixed metadata_schema_uri

* Fixed import and subclasses

* Added EmptyNontabularDatasource

* change import_metadata to ioformat

* added datasources.py

* added support of multiple gcs_sources

* fix: default (empty) dataset_metadata need to be set to {}, not None

* 1) imported datasources 2) added _support_metadata_schema_uris and _support_import_schema_classes 3) added getter and setter/validation for resource_metadata_schema_uri, metadata_schema_uri, and import_schema_uri 4) fixed request_metadata, data_item_labels 5) encapsulated dataset_metadata, and import_data_configs 6) added datasource configuration logic

* added image_dataset.py and tabular_dataset.py

* fix: refactor - create datasets modeule

* fix: cleanup __init__.py

* fix: data_item_labels

* fix: docstring

* fix:
- changed NonTabularDatasource.dataset_metadata default to None
- updated NonTabularDatasource docstring
- changed gcs_source type hint with Union
- changed _create_and_import to _create_encapsulated with datasource
- removed subclass.__init__ and irrelevant parameters in create

* fix: import the module instead of the classes for datasources

* fix: removed all validation for import_schema_uri

* fix: set parameter default to immutable

* fix: replaced Datasource / DatasourceImportable abstract class instead of a concrete type

* fix: added examples for gcs_source

* fix:
- remove Sequence from utils.py
- refactor datasources.py to _datasources.py
- change docstring format to arg_name (arg_type): convention
- change and include the type signature _supported_metadata_schema_uris
- change _validate_metadata_schema_uri
- refactor _create_encapsulated to _create_and_import
- refactor to module level imports
- add tests for ImageDataset and TabularDataset

* fix: remove all labels

* fix: remove Optional in docstring, add example for bq_source

* test: add import_data raise for tabular dataset test

* fix: refactor datasource creation with create_datasource

* fix: lint

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Add AutoML Image Training Job class (#152)

* Add AutoMLImageTrainingJob, tests, constants

* Address reviewer comments

* feat: Add custom container support (#164)

* chore: merge main into dev (#162)

* fix: suppress no project id warning (#160)

* fix: suppress no project id warning

* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials

* fix: move default credentials config to credentials property

* fix: add property setter for credentials to avoid everytime reset

* fix: Fixed wrong key value for multilabel (#168)

Co-authored-by: Ivan Cheung <[email protected]>

* feat: Add delete methods, add list_models and undeploy_all for Endpoint class (#165)

* Endpoint list_models, delete, undeploy_all WIP

* Finish delete + undeploy methods, tests

* Add global pool teardowns for test timeout issue

* Address reviewer comments, add async support

* fix: Fixed bug causing training failure for object detection (#171)

Co-authored-by: Ivan Cheung <[email protected]>

* fix: Support intermediary BQ Table for Custom Training (#166)

* chore: add AutoMLImageTrainingJob to aiplatform namespace (#173)

* fix: Unblock build (#174)

* fix: default credentials config related test failures (#167)

* fix: suppress no project id warning

* fix: temporary suppress logging.WARNING and set credentials as google.auth.default credentials

* fix: move default credentials config to credentials property

* fix: add property setter for credentials to avoid everytime reset

* fix: tests for set credentials to default when default not provided

* fix: change credentials with initializer default when not provided in AiPlatformResourceNoun

* fix: use credential mock in tests

* fix: lint

Co-authored-by: sasha-gitg <[email protected]>

* Fix: pass bq_destination to input data config when using training script (#181)

* fix: pass bigquery destination

* fix: add tests and formatting

Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Ivan Cheung <[email protected]>
Co-authored-by: Morgan Du <[email protected]>
Co-authored-by: Vinny Senthil <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants