-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ML Model Backend Implementation #1896
feat: ML Model Backend Implementation #1896
Conversation
Update to master
Update To Master
Merge Master
Update to LI Master
gms/client/src/main/java/com/linkedin/common/client/MLModelsClient.java
Outdated
Show resolved
Hide resolved
gms/impl/src/main/java/com/linkedin/metadata/resources/ml/MLModels.java
Outdated
Show resolved
Hide resolved
gms/impl/src/main/java/com/linkedin/metadata/resources/ml/OwnershipResource.java
Outdated
Show resolved
Hide resolved
gms/factories/src/main/java/com/linkedin/ml/factory/MLModelDAOFactory.java
Outdated
Show resolved
Hide resolved
Update to master
Build is failing due to a 502 bad gateway when reaching out to the gradle maven repo for ElasticSearch's rest client. Should pass now after the fix from a typo I made (if it can reach all the dependencies). Not sure how to force a rebuild without making more changes |
Where was the typo? Did you already commit it? We ran the build jobs for your last few commits (see the Xs or check mars on them). |
Yeah the typo fix was the last commit. The previous failure was due to the restspec not having EvaluationData endpoints because I typoed the text in the Rest.li annotation for EvaluationDataResource which is now fixed. The current build failure is due to this error which does not seem to be code related, it hit a gateway error when reaching out to the Gradle repository to get a dependency:
|
Sorry but just noticed that all the ML-related models are placed under |
@mars-lan Not at all, go for it. I'll update as needed 😄 |
Actually nvm. Seems like the models have been referenced at multiple places, changing the namespace will quickly become a backward incompatible change. Let's live with them for now. |
final List<EvaluatedOn> evaluationDataList = evaluationData.getEvaluationData() | ||
.stream() | ||
.filter(BaseData::hasDataset) | ||
.filter(baseData -> DatasetUrn.ENTITY_TYPE.equals(baseData.getDataset().getEntityType())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? since dataset
field is of type DatasetUrn
, shouldn't the entity type be the same as that of DatasetUrn
?
public <URN extends Urn> List<GraphBuilder.RelationshipUpdates> buildRelationships(@Nonnull URN urn, @Nonnull EvaluationData evaluationData) { | ||
final List<EvaluatedOn> evaluationDataList = evaluationData.getEvaluationData() | ||
.stream() | ||
.filter(BaseData::hasDataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataset
seems to be a required field in BaseData
model. Do we really need the check hasDataset
?
/** | ||
* How was the data preprocessed for evaluation (e.g., tokenization of sentences, cropping of images, any filtering such as dropping images without faces)? | ||
*/ | ||
preProcessing: optional string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you share a usecase where storing this property in the edge will be useful? I wouldn't recommend adding this field as an edge property.
cc @keremsahin1 @camelliazhang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RyanHolstien Thanks for your contribution! Could you please help me to understand this relationship better?
- Could you please provide a few examples?
- How do the queries look like when this relationship is involved? Are you trying to do a filter based on "preProcessing" property? How many possible values are we talking about? In general, I won't recommend to add an arbitrary string property on an edge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed it. My thought was essentially that it would be useful information to return as a result of searching for the relationship or perhaps filtering on certain types of preprocessing that had occurred similar to how OwnedBy has OwnershipType. If the issue is that it's a arbitrary string I don't currently have a list of reasonable preprocessing examples for an enum so we can leave it off and add it later if it's worthwhile.
/** | ||
* How was the data preprocessed for evaluation (e.g., tokenization of sentences, cropping of images, any filtering such as dropping images without faces)? | ||
*/ | ||
preProcessing: optional string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as before. Don't think we need to store this property in the edge.
|
||
import static com.linkedin.metadata.dao.internal.BaseGraphWriterDAO.RemovalOption.REMOVE_ALL_EDGES_FROM_SOURCE; | ||
|
||
public class TrainedOnBuilderFromTrainingData extends BaseRelationshipBuilder<TrainingData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comments as provided in EvaluatedOnBuilderFromEvaluationData
.
private MLModelDocument getDocumentToUpdateFromAspect(MLModelUrn urn, EvaluationData evaluationData) { | ||
final MLModelDocument doc = new MLModelDocument(); | ||
|
||
if (evaluationData.hasEvaluationData()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more of a generic comment. hasX()
doesn't guarantee that getX()
will not be null. I suggest we replace this with (evaluationData.getEvaluationData() != null)
. If you can make similar changes in other parts of the code that will be great! Related discussion #1950 (comment)
metadata-builders/src/main/java/com/linkedin/metadata/builders/search/MLModelIndexBuilder.java
Show resolved
Hide resolved
…o align with updates
Sorry about the delay, made the requested changes and updated the code to reflect changes to master over time. Let me know if there's anything else needed to get this merged in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RyanHolstien Thanks for taking time addressing all the review comments!
/** | ||
* Data model for a ML Model entity | ||
*/ | ||
record MLModelEntity includes BaseEntity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the entity model for graph. Could you please share some context on this? Such as a list of main graph queries to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our current graph use cases for ML Models include querying models that have been trained/evaluated on certain datasets. I created the entity model based on the other current entity models which are rather minimalistic with just the URN and URN components. The main value is the edge relationships between datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your information @RyanHolstien
I do see the search APIs (via ESSearchDAO) are implemented but not Neo4jQueryDAO. I assume you will have a follow up for building some APIs that leverages query DAO (for graph)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a separate API & service layer diverging from Rest.li in our fork, so changes from there aren't directly transferrable. If no one else picks it up before I have time I can do the follow up mapping it over to the equivalent Rest.li based resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, SGTM.
"platform^0.055", | ||
"type^0.01" | ||
], | ||
"default_operator": "OR" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me now. My suggestion is to use one operator as default operator query string, either OR or AND, to reduce the possible confusions from users. With query_string, user can always specify and overwrite the default operator. Full syntax can be found here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @RyanHolstien for your contributions and persistence! Great work here!
Thanks for taking the time to review @camelliazhang ! I'm not sure what the build error is about, the failing test passes locally. The error is saying that the number of arguments doesn't line up with the constructor call in my new search sanity test, but it is structured the same way as the other tests. I don't have access to run it again without pushing another commit. I can try pushing a harmless commit like adding a local variable for one of the Strings in that test class to see if it passes the build so this can be merged if that's the only way to trigger another build. |
@jplaisted Could you please take a look for the failure of new search sanity test? Thanks |
Yeah I added a new parameter, which I acknowledges breaks code, but at the time i thought one no one was using it besides me. It is weird to me it passes locally but not on the CI. I'd think it'd fail on both. I don't think CI tries to rebase on master before running changes? In any case, make sure you've rebased on master, and run Check out this: #2067. Basically there's some new sanity tests to test your config too, just pass the super constructor the search config. |
@RyanHolstien It would be great if the GMS sample API calls section in the README is updated for |
Update to master
Update to master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Thoughts on making the L lower case? MlModel
?
https://google.github.io/styleguide/javaguide.html#s5.3-camel-case
We don't use google's style guide, but still a useful reference.
Since the ML here is a short-form for Machine Learning ... maybe MLModel is more appropriate. |
Looks like Jyoti's concerns are addressed now.
Thanks @RyanHolstien for the lift and @camelliazhang, @jywadhwani for detailed reviews! |
Fixes #1877
Checklist