feat(glue): support partition index on tables (#17998)
This PR adds support for creating partition indexes on tables via custom resources. It offers two different ways to create indexes:

```ts
// via table definition
const table = new glue.Table(this, 'Table', {
  database,
  bucket,
  tableName: 'table',
  columns,
  partitionKeys,
  partitionIndexes: [{
    indexName: 'my-index',
    keyNames: ['month'],
  }],
  dataFormat: glue.DataFormat.CSV,
});
```

```ts
// or as a function
table.addPartitionIndex({
  indexName: 'my-other-index',
  keyNames: ['month', 'year'],
});
```

I also refactored the format of some tests, which accounts for the large diff in `test.table.ts`.

Motivation:

Creating partition indexes on a table is something you can do via the console, but it is not an exposed property in CloudFormation. In this case, I think it makes sense to support this feature via custom resources, as it will significantly reduce the customer pain of either provisioning a custom resource with the correct permissions or manually going into the console after resource creation. Supporting this feature also allows for synth-time checks and dependency chaining for multiple indexes (reason detailed in the FAQ), which removes a rather sharp edge for users provisioning custom resource indexes themselves.

FAQ:

Why do we need to chain dependencies between different partition index custom resources?
- Because Glue only allows one index per table to be created or deleted at a time. Without dependencies, the resources will try to create partition indexes simultaneously and the second SDK call will be dropped.

Why is it called `partitionIndexes`? Is that really how you pluralize index?
- [Yesish](https://www.nasdaq.com/articles/indexes-or-indices-whats-the-deal-2016-05-12). If you hate it, it can be `partitionIndices`.

Why is `keyNames` of type `string[]` and not `Column[]`? `partitionKeys` is of type `Column[]`, and partition indexes must be a subset of partition keys...
- This could be a debate.
But my argument is that the pattern I see for defining a Table is to define partition keys inline rather than declaring each one as a variable. Taking `Column[]` would be pretty clunky from a UX perspective:

```ts
const key1 = { name: 'mykey', type: glue.Schema.STRING };
const key2 = { name: 'mykey2', type: glue.Schema.STRING };
const key3 = { name: 'mykey3', type: glue.Schema.STRING };
new glue.Table(this, 'table', {
  database,
  bucket,
  tableName: 'table',
  columns,
  partitionKeys: [key1, key2, key3],
  // hypothetical alternative where keyNames takes Column[] instead of string[]
  partitionIndexes: [{
    keyNames: [key1, key2],
  }],
  dataFormat: glue.DataFormat.CSV,
});
```

Why are there 2 different checks for having > 3 partition indexes?
- It's possible someone defines 3 indexes in the table definition and then tries to add another with `table.addPartitionIndex()`. That would be a nasty deploy-time error; it's better if it fails at synth time. It's also possible someone defines 4 indexes in the definition. It's better to fail fast there, before we create 3 custom resources.

What if I deploy a table, manually add 3 partition indexes, and then call `table.addPartitionIndex()` and update the stack? Will that still be a synth-time failure?
- Sorry, no. The synth-time checks only know about indexes defined in the CDK app.

Why do we need to generate names?
- We don't. I just thought it would be helpful.

Why is `grantToUnderlyingResources` public?
- I thought it would be helpful. Some permissions need to be added to the table, the database, and the catalog.

Closes #17589.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
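The synth-time checks, optional generated names, and dependency chaining described in this PR can be modeled without CDK at all. The following is a minimal, dependency-free TypeScript sketch, assuming a hypothetical `PartitionIndexPlanner` class and `IndexResource` shape; it is not the construct's actual implementation, and the generated-name format and error messages are illustrative assumptions. It shows the two limit checks (one on the table definition, one on later additions), the subset check against partition keys, and each index being chained to the previous one.

```ts
// Minimal model of the partition index checks described in this PR.
// PartitionIndexPlanner, IndexResource and the name format are hypothetical,
// not part of the aws-cdk Glue module.

interface PartitionIndex {
  readonly indexName?: string; // optional: a name is generated when omitted
  readonly keyNames: string[];
}

interface IndexResource {
  readonly indexName: string;
  readonly keyNames: string[];
  // Index this one must wait for; undefined for the first index.
  readonly dependsOn: string | undefined;
}

// Glue allows at most 3 partition indexes per table.
const MAX_PARTITION_INDEXES = 3;

class PartitionIndexPlanner {
  private readonly partitionKeys: string[];
  private readonly resources: IndexResource[] = [];

  constructor(partitionKeys: string[], initial: PartitionIndex[] = []) {
    this.partitionKeys = partitionKeys;
    // Check 1: fail fast on the table definition before planning any resource.
    if (initial.length > MAX_PARTITION_INDEXES) {
      throw new Error(`Maximum number of partition indexes allowed is ${MAX_PARTITION_INDEXES}`);
    }
    for (const index of initial) {
      this.addPartitionIndex(index);
    }
  }

  addPartitionIndex(index: PartitionIndex): IndexResource {
    // Check 2: catches indexes added after the table was defined.
    if (this.resources.length >= MAX_PARTITION_INDEXES) {
      throw new Error(`Maximum number of partition indexes allowed is ${MAX_PARTITION_INDEXES}`);
    }
    // Index keys must be a subset of the table's partition keys.
    const unknown = index.keyNames.filter((key) => !this.partitionKeys.includes(key));
    if (unknown.length > 0) {
      throw new Error(`Keys not found in partition keys: ${unknown.join(', ')}`);
    }
    const indexName = index.indexName ?? this.generateName(index.keyNames);
    // Chain to the previous index so only one create/delete runs at a time.
    const previous: IndexResource | undefined = this.resources[this.resources.length - 1];
    const resource: IndexResource = {
      indexName,
      keyNames: [...index.keyNames],
      dependsOn: previous?.indexName,
    };
    this.resources.push(resource);
    return resource;
  }

  get indexes(): readonly IndexResource[] {
    return this.resources;
  }

  // Hypothetical naming scheme, used only when indexName is omitted.
  private generateName(keyNames: string[]): string {
    return `${keyNames.join('-')}-index`;
  }
}

// Usage: mirrors the two ways of declaring indexes shown at the top of the PR.
const planner = new PartitionIndexPlanner(['year', 'month', 'day'], [
  { indexName: 'my-index', keyNames: ['month'] },
]);
const other = planner.addPartitionIndex({ keyNames: ['month', 'year'] });
console.log(other.indexName, other.dependsOn); // month-year-index my-index
```

Chaining each new index to the previous one is what serializes the custom resources, which is the reason the FAQ gives for the dependencies. Because the planner only sees indexes declared in code, an index added manually in the console is invisible to it, matching the "Sorry, no." answer above.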