Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: MLmodel Graphql Query #2166

Merged
merged 11 commits into from
Mar 13, 2021

Conversation

arunvasudevan
Copy link
Collaborator

This PR includes changes in datahub-graphql-core to include GraphQL Query functionality changes for MLModel Entity.

Includes fixes in GMS based on Tests for MLModel
Also organized mappers into packages by entity.

Checklist

  • [ X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • [ X] Tests for the changes have been added/updated (if applicable)
  • [ X] Docs related to the changes have been added/updated (if applicable)

@shirshanka
Copy link
Contributor

@jjoyce0510 : could you take a look at this?

@arunvasudevan arunvasudevan force-pushed the mlmodel_graphql_query branch from 5fc4f9b to 7a3cc71 Compare March 5, 2021 20:32

```
{
mlmodel(urn: "urn:li:mlModel:(urn:li:dataPlatform:science,scienceModel,PROD)") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we name this Query "mlModel"?

}
}
}
mlModelProperties {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we rename this to "properties"?

primaryUsers
outOfScopeUses
}
mlModelFactorPrompts {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename this to factorPrompts?

@@ -91,6 +93,17 @@ public static DataPlatforms getDataPlatformsClient() {
return _dataPlatforms;
}

public static MLModels getMlModelsClient() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: getMLModelsClient

@@ -91,6 +93,17 @@ public static DataPlatforms getDataPlatformsClient() {
return _dataPlatforms;
}

public static MLModels getMlModelsClient() {
if (_mlmodels == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit _mlModels

} else if (env.getObject() instanceof FloatBox) {
return env.getSchema().getObjectType(FLOAT_BOX);
} else {
throw new RuntimeException("Unrecognized object type provided to type resolver");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include the env.getObject() string in the error message for help debugging


@Override
public GraphQLObjectType getType(TypeResolutionEnvironment env) {
if (env.getObject() instanceof StringBox) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other types?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

@Override
public Cost apply(com.linkedin.common.Cost cost) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cost common? If not, we can create a new ML models-specific directory to contain these mappers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Cost is part of common. Internally we added Cost as an Aspect to Dataset too.


@Override
public CostValue apply(com.linkedin.common.CostValue costValue) {
CostValue result = new CostValue();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: "final"

}

if (datasetUpdateInput.getDeprecation() != null) {
final DatasetDeprecation deprecation = new DatasetDeprecation();
deprecation.setDeprecated(datasetUpdateInput.getDeprecation().getDeprecated());
deprecation.setDecommissionTime(datasetUpdateInput.getDeprecation().getDecommissionTime());
if (datasetUpdateInput.getDeprecation().getDecommissionTime() != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this null check required? If decommission time = null, then we will just set the value of DatasetDeprecation.decomissionTime to null also

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Case of Mutations if the user does not provide decommisionTime in the input then trying to read it would throw an NPE.
It would work in case of queries but for mutations null checks needs to be made for Optional fields.

* Maps Pegasus {@link RecordTemplate} objects to objects conforming to the GQL schema.
*
*/
public class MlModelMapper implements ModelMapper<com.linkedin.ml.MLModel, MLModel> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit: Let's name "MLModelMapper" to keep the capitalization consistent across model & code


@Override
public BrowseResults browse(@Nonnull List<String> path,
@Nullable List<FacetFilterInput> filters,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same :)

@Nullable List<FacetFilterInput> filters,
int start,
int count,
@Nonnull final QueryContext context) throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to implement browse in the client within this pr? If not I'd suggest making this a SearchableEntityType only, until that functionality gets in


import com.linkedin.common.urn.MLModelUrn;

public class MlModelUtils {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"MLModelUtils" to be consistent

public CaveatDetails apply(@NonNull final com.linkedin.ml.metadata.CaveatDetails input) {
final CaveatDetails result = new CaveatDetails();
if (input.getCaveatDescription() != null) {
result.setCaveatDescription(input.getCaveatDescription());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these null checks required? These can be skipped if there's no risk of a NullPointerException being thrown

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(& if the destination field is optional in the GQL schema)


@Override
public EthicalConsiderations apply(@NonNull final com.linkedin.ml.metadata.EthicalConsiderations ethicalConsiderations) {
EthicalConsiderations result = new EthicalConsiderations();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: final

@Override
public MLModelProperties apply(@NonNull final com.linkedin.ml.metadata.MLModelProperties mlModelProperties) {
final MLModelProperties result = new MLModelProperties();
if (mlModelProperties.getDate() != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of these null checks required?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it for Date, Description and Type for other Objects throws NPE


@Override
public MLModel apply(@Nonnull final com.linkedin.ml.MLModel mlModel) {
MLModel result = new MLModel();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: final

result.setType(EntityType.MLMODEL);
result.setName(mlModel.getName());
result.setDescription(mlModel.getDescription());
result.setOrigin(Enum.valueOf(FabricType.class, mlModel.getOrigin().name()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use something like:

FabricType.valueOf(ml.getOrigin().getName())

here

result.setMetrics(MetricsMapper.map(mlModel.getMetrics()));
}
if (mlModel.getEvaluationData() != null) {
result.setEvaluationData(mlModel.getEvaluationData().getEvaluationData().stream().map(BaseDataMapper::map).collect(Collectors.toList()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both getEvaluationData() and getEvaluationData().getEvaluationData() will not return null, right? Otherwise risk of NPE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getEvaluationData().getEvaluationData() does not return null it is a mandatory field - https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/ml/metadata/EvaluationData.pdl#L13

result.setEvaluationData(mlModel.getEvaluationData().getEvaluationData().stream().map(BaseDataMapper::map).collect(Collectors.toList()));
}
if (mlModel.getTrainingData() != null) {
result.setTrainingData(mlModel.getTrainingData().getTrainingData().stream().map(BaseDataMapper::map).collect(Collectors.toList()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Details of Data Proprocessing
"""
preProcessing: [String]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[String!]

Measures of MLModel performance
"""
performanceMeasures: [String]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[String!]

Decision Thresholds used (if any)?
"""
decisionThreshold: [String]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[String!]

Primary Use cases for the model.
"""
primaryUses: [String]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[String!]

Out of scope uses of the MLModel
"""
outOfScopeUses: [String]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[String!]

Primary Intended Users
"""
primaryUsers: [IntendedUserType]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[IntendedUserType!]


enum IntendedUserType {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any descriptions available for these?

Source Code along with types
"""
sourceCode: [SourceCodeUrl]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SourceCodeUrl!] - --- Should this field be optional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, its not. Added !

}

type SourceCodeUrl {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think these should be optional. Use !


enum SourceCodeUrlType {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any descriptions available?

}

type Cost {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should be optional. Use !

}

type CostValue {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CostValue is either an ID or a Code. So this is optional

type Deprecation {
"""
Whether the dataset has been deprecated by owner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dataset? We should replace this with "Whether the entity has been deprecated by owner"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for comments below :)

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks really great. Mostly modeling comments, with some other minor nitpicks.

Thanks for making our codebase cleaner along the way!

@arunvasudevan arunvasudevan force-pushed the mlmodel_graphql_query branch from 7a3cc71 to 796f761 Compare March 12, 2021 16:12
@arunvasudevan
Copy link
Collaborator Author

Thanks for taking time to review @jjoyce0510.
Let me know what do you think.

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shirshanka shirshanka merged commit 7750c61 into datahub-project:master Mar 13, 2021
@arunvasudevan arunvasudevan deleted the mlmodel_graphql_query branch March 13, 2021 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants