Skip to content

Conversation

@zhangfengcdt
Copy link
Member

@zhangfengcdt zhangfengcdt commented Jul 26, 2024

This PR implements the geo types that are introduced in the new parquet specification.
apache/parquet-format@94b9d63

Jira

Tests

  • My PR adds the following unit tests: TestGeometryTypeRoundTrip

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

@zhangfengcdt zhangfengcdt marked this pull request as draft July 26, 2024 15:39
@zhangfengcdt
Copy link
Member Author

CC: @jiayuasu @Kontinuation @wgtmac

@zhangfengcdt zhangfengcdt marked this pull request as ready for review August 12, 2024 15:57
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! I have left some comments. I think we are reaching the finish line!

}

@Test
public void testEPSG4326BasicReadWriteGeometryValue() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these tests!

I think we are missing tests in following cases:

  • verify geometry type metadata is well preserved.
  • verify all kinds of geometry stats are preserved, including bbox, covering and geometry types.
  • verify geo stats in the column index have been generated.

I can do these later.

@wgtmac
Copy link
Member

wgtmac commented Apr 30, 2025

Thanks! I will take a look after back from vacation.

- add valid flag to BoundingBox impl
- check geostatistics validity when writing to parquet
- add tests for mix of valid and invalid geometry update
Tested the following cases after the changes:
- Merging valid stats with null stats results in invalid stats with null components
- Merging null stats with valid stats also results in invalid stats with null components
- Merging valid stats with partially null stats (null bounding box) results in the expected behavior where only the bounding box becomes null
@jiayuasu
Copy link
Member

@wgtmac can you review again? Thank you!

@zhangfengcdt
Copy link
Member Author

@wgtmac Thank you for the very helpful review comments! I believe I’ve addressed them all and have also added and updated some tests. Could you please take another look and let me know if I missed anything or if further changes are needed? Thanks again!

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick update! Now it generally looks great.

@zhangfengcdt
Copy link
Member Author

Thanks for the quick update! Now it generally looks great.

Thanks for the review! The fixes should be all in now.

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Thanks!

@wgtmac wgtmac changed the title PARQUET-2417: Add support for geometry logical type PARQUET-2417: Add statistics support to geometry logical type May 16, 2025
@wgtmac wgtmac merged commit 22a23d0 into apache:master May 18, 2025
7 checks passed
@wgtmac
Copy link
Member

wgtmac commented May 18, 2025

I just merged it. Thanks @zhangfengcdt for working on this and everyone for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants