Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests(datahub-client): new tests for the AvroSchemaConverter #12087

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sgomezvillamor
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Dec 10, 2024
assertSchemaField(
schema.getFields().get(1),
"[version=2.0].[type=MapType].[type=map].[type=ComplexType].mapOfComplexType",
"ComplexType",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should native type be map<string,ComplexType>?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense

Comment on lines +186 to +202
assertSchemaField(
schema.getFields().get(4),
"[version=2.0].[type=MapType].[type=map].[type=union].mapOfNullableString",
"union",
false,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new MapType().setKeyType("string").setValueType("union"))));
assertSchemaField(
schema.getFields().get(5),
"[version=2.0].[type=MapType].[type=map].[type=union].[type=string].mapOfNullableString",
"string",
false,
false,
new SchemaFieldDataType().setType(SchemaFieldDataType.Type.create(new StringType())));
Copy link
Contributor Author

@sgomezvillamor sgomezvillamor Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avro schema

    {
      "name": "mapOfNullableString",
      "type": {
        "type": "map",
        "values": ["null", "string"]
      }
    },

the native type is union, no reference to map
expectedNullable for the value is false instead of true

it's a union but only one tested field 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be nullable, and native type maybe should be union<null, string>

Comment on lines +234 to +243
assertSchemaField(
schema.getFields().get(10),
"[version=2.0].[type=MapType].[type=map].[type=array].mapOfArray",
"array(string)",
false,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new ArrayType().setNestedType(new StringArray("string")))));
Copy link
Contributor Author

@sgomezvillamor sgomezvillamor Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avro schema

    {
      "name": "mapOfArray",
      "type": {
        "type": "map",
        "values": {
          "type": "array",
          "items": "string"
        }
      }
    },

native type should be map of string array

and the schemafielddatatype is also missing references to MapType

        new SchemaFieldDataType()
            .setType(
                SchemaFieldDataType.Type.create(
                    new MapType().setKeyType("string").setValueType(<XXXX>))));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I think you are right

Comment on lines +244 to +253
assertSchemaField(
schema.getFields().get(11),
"[version=2.0].[type=MapType].[type=map].[type=map].mapOfMap",
"map<string,int>",
false,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new MapType().setKeyType("string").setValueType("int"))));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avro schema

    {
      "name": "mapOfMap",
      "type": {
        "type": "map",
        "values": {
          "type": "map",
          "values": "int"
        }
      }
    },

map of <string, int> !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right

Comment on lines +504 to +513
assertSchemaField(
schema.getFields().get(9),
"[version=2.0].[type=StructType].[type=ComplexStruct].structField.[type=union].fieldUnion",
"union",
true,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new UnionType().setNestedTypes(new StringArray("union")))));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering this comes from

          {
            "name": "fieldUnion",
            "type": [
              "null",
              "string",
              "int"
            ]
          },

I would expect something like

Suggested change
assertSchemaField(
schema.getFields().get(9),
"[version=2.0].[type=StructType].[type=ComplexStruct].structField.[type=union].fieldUnion",
"union",
true,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new UnionType().setNestedTypes(new StringArray("union")))));
assertSchemaField(
schema.getFields().get(9),
"[version=2.0].[type=StructType].[type=ComplexStruct].structField.[type=union].fieldUnion",
"union<string, int>",
true,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new UnionType().setNestedTypes(new StringArray("string", "int")))));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be union<null, string, int> as native type as it is nullable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 508 is about nullable flag
but yes, union<null, string, int> would also look good to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I feel like in Avro this is the actual native type

Comment on lines +886 to +892
assertSchemaField(
schema.getFields().get(18),
"[version=2.0].[type=users_record].[type=map].[type=int].props",
"int",
true,
false,
new SchemaFieldDataType().setType(SchemaFieldDataType.Type.create(new NumberType())));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avro schema

    {
      "name": "props",
      "type": [
        "null",
        {
          "type": "map",
          "values": [
            "null",
            "int"
          ]
        }
      ],
      "default": null
    },

so this should be

Suggested change
assertSchemaField(
schema.getFields().get(18),
"[version=2.0].[type=users_record].[type=map].[type=int].props",
"int",
true,
false,
new SchemaFieldDataType().setType(SchemaFieldDataType.Type.create(new NumberType())));
assertSchemaField(
schema.getFields().get(18),
"[version=2.0].[type=users_record].[type=map].[type=int].props",
"map<string, int>",
true,
false,
new SchemaFieldDataType()
.setType(
SchemaFieldDataType.Type.create(
new MapType().setKeyType("string").setValueType("int"))));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, correctly it should be your suggested change

@sgomezvillamor sgomezvillamor marked this pull request as ready for review December 12, 2024 10:15
@datahub-cyborg datahub-cyborg bot added the pending-submitter-response Issue/request has been reviewed but requires a response from the submitter label Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-response Issue/request has been reviewed but requires a response from the submitter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants