-
Notifications
You must be signed in to change notification settings - Fork 1.5k
PARQUET-2417: Add statistics support to geometry logical type #2971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2417: Add statistics support to geometry logical type #2971
Conversation
This PR is copied form this place: #1379
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
...uet-column/src/main/java/org/apache/parquet/column/statistics/geometry/EnvelopeCovering.java
Outdated
Show resolved
Hide resolved
...uet-column/src/main/java/org/apache/parquet/column/statistics/geometry/EnvelopeCovering.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/BinaryStatistics.java
Outdated
Show resolved
Hide resolved
…e spherical edge is specified.
…apache-parquet-2417-geospatial
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I have left some comments. I think we are reaching the finish line!
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeometryUtils.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/Covering.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/test/java/org/apache/parquet/statistics/TestGeometryTypeRoundTrip.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testEPSG4326BasicReadWriteGeometryValue() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests!
I think we are missing tests in following cases:
- verify geometry type metadata is well preserved.
- verify all kinds of geometry stats are preserved, including bbox, covering and geometry types.
- verify geo stats in the column index have been generated.
I can do these later.
|
Thanks! I will take a look after back from vacation. |
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
- add valid flag to BoundingBox impl - check geostatistics validity when writing to parquet - add tests for mix of valid and invalid geometry update
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
Tested the following cases after the changes: - Merging valid stats with null stats results in invalid stats with null components - Merging null stats with valid stats also results in invalid stats with null components - Merging valid stats with partially null stats (null bounding box) results in the expected behavior where only the bounding box becomes null
|
@wgtmac can you review again? Thank you! |
parquet-column/src/main/java/org/apache/parquet/column/page/PageWriter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/page/PageWriter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
…ry, omit it from stats"
|
@wgtmac Thank you for the very helpful review comments! I believe I’ve addressed them all and have also added and updated some tests. Could you please take another look and let me know if I missed anything or if further changes are needed? Thanks again! |
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick update! Now it generally looks great.
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
...column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-column/src/test/java/org/apache/parquet/column/statistics/geometry/TestBoundingBox.java
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/BoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/test/java/org/apache/parquet/column/statistics/geometry/TestBoundingBox.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialTypes.java
Outdated
Show resolved
Hide resolved
parquet-column/src/main/java/org/apache/parquet/column/statistics/geometry/GeospatialTypes.java
Outdated
Show resolved
Hide resolved
Thanks for the review! The fixes should be all in now. |
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Thanks!
...lumn/src/main/java/org/apache/parquet/column/statistics/geospatial/GeospatialStatistics.java
Outdated
Show resolved
Hide resolved
parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
Outdated
Show resolved
Hide resolved
|
I just merged it. Thanks @zhangfengcdt for working on this and everyone for the review! |
This PR implements the geo types that are introduced in the new parquet specification.
apache/parquet-format@94b9d63
Jira
Tests
Commits
Documentation