-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DynamoDB documentClient support #1223
Comments
It's a duplicate, but the other issue was closed so I'm assuming this is the best place to follow progress of this? @AllanFly120 is there somewhere else tracking this? |
Hey folks, for those struggling with this I've created an npm package to convert JSON to DynamoDB AttributeValue type schema JSON and back again. I'm using it today in a small production project. I've created two packages, one a CommonJS module, and the other a ES module. You can find them, with docs here. It's a fork and update from the DocumentClient converter.js code. CommonJS module - ES module - Source code here: https://github.com/senzodev/dynamodb-type-marshall |
Thanks @antstanley for creating a package for marshalling and unmarshalling. We've implemented marshaller (convertToAttr) in #1531, and aiming to push it in |
The support for marshall and unmarshall operations was released in gamma.10 release, and verified with the code below: Versions tested on:
Codeconst { DynamoDB } = require("@aws-sdk/client-dynamodb");
const { marshall, unmarshall } = require("@aws-sdk/util-dynamodb");
const { deepStrictEqual } = require("assert");
(async () => {
const region = "us-west-2";
// Table with Partition key named `HashKey`
const TableName = `test-util-dynamodb`;
const HashKey = "hashKey";
const input = {
HashKey,
BinaryAttribute: Uint8Array.from("12345"),
BoolAttribute: true,
BoolSetAttribute: new Set([Uint8Array.from("1"), Uint8Array.from("2")]),
ListAttribute: [1, "two", false],
MapAttribute: { foo: "bar" },
NumAttribute: 1,
NumSetAttribute: new Set([1, 2, 3, 4, 5]),
NullAttribute: null,
StrAttribute: "one",
StrSetAttribute: new Set(["one", "two", "three"]),
};
const Item = marshall(input);
console.log({ putItem: input });
const client = new DynamoDB({
region,
// logger: { ...console, debug: undefined },
});
await client.putItem({ TableName, Item });
const response = await client.getItem({
TableName,
Key: marshall({ HashKey }),
});
console.log({ getItem: unmarshall(response.Item) });
deepStrictEqual(input, unmarshall(response.Item));
})(); Output{
putItem: {
HashKey: 'hashKey',
BinaryAttribute: Uint8Array(5) [ 1, 2, 3, 4, 5 ],
BoolAttribute: true,
BoolSetAttribute: Set { [Uint8Array], [Uint8Array] },
ListAttribute: [ 1, 'two', false ],
MapAttribute: { foo: 'bar' },
NumAttribute: 1,
NumSetAttribute: Set { 1, 2, 3, 4, 5 },
NullAttribute: null,
StrAttribute: 'one',
StrSetAttribute: Set { 'one', 'two', 'three' }
}
}
{
getItem: {
ListAttribute: [ 1, 'two', false ],
MapAttribute: { foo: 'bar' },
BinaryAttribute: Uint8Array(5) [ 1, 2, 3, 4, 5 ],
StrAttribute: 'one',
BoolSetAttribute: Set { [Uint8Array], [Uint8Array] },
NumSetAttribute: Set { 5, 4, 3, 2, 1 },
BoolAttribute: true,
StrSetAttribute: Set { 'one', 'three', 'two' },
HashKey: 'hashKey',
NumAttribute: 1,
NullAttribute: null
}
} If you come across any bugs, do report them by creating an issue: https://github.com/aws/aws-sdk-js-v3/issues/new?assignees=&labels=bug+report&template=---bug-report.md&title= |
Awesome! Will give it a go today. |
Closing as the core operations, i.e. marshall/unmarshall, were released in gamma.10 release. |
@trivikr will you be providing guidance on how to migrate from DocumentClient? Would be good to see how easy it is to go from DocumentClient to marshall/unmarshall |
The utility operations marshall and unmarshall are exposed in const { DynamoDB } = require("@aws-sdk/client-dynamodb");
const { marshall } = require("@aws-sdk/util-dynamodb");
const client = new DynamoDB(clientParams);
const putParams = {
TableName: "Table",
Item: marshall({
HashKey: "hashKey",
NumAttribute: 1,
BoolAttribute: true,
ListAttribute: [1, "two", false],
MapAttribute: { foo: "bar" },
NullAttribute: null,
}),
};
await client.putItem(putParams);
const getParams = {
TableName: "Table",
Key: marshall({
HashKey: "hashKey",
}),
};
const { Item } = await client.getItem(getParams);
unmarshall(Item); The examples are also provided in README at: https://github.com/aws/aws-sdk-js-v3/tree/v1.0.0-rc.3/packages/util-dynamodb |
Does it not make sense for a DynamoDB JavaScript library to provide the DocumentClient functionality? I mean, under what circumstances in JavaScript/TypeScript would NOT use native JS objects and would want to use the marshalled DynamoDB types? I'd hazard a guess to say almost never. So, not providing a DocumentClient seems to be a straight annoyance for most SDK users, because, chances are, I've started to port our existing TS code from aws sdk v2 to v3 where we use DocumentClient. Fortunately, we have an abstraction layer, so we are able to marshall/unmarshall there without affecting calling code. Unfortunately, we have to intercept and manually sprinkle in marshall/unmarshall everywhere now, which I suspect many developers porting would have to do as well. I'm still going through the motions of doing the marshalling/unmarshalling - but my 2 cents is that it's an oversight and disservice by not providing a DocumentClient. If the .NET/Java library required developers to manually marshall/unmarshall, I'm quite sure everyone's heads will explode 😉. |
Reopening the issue to explore addition of DocumentClient. |
+1 |
I don't really know how safe this is, but I'm now using this to auto-marshall: client.middlewareStack.add(
(next) => async (args) =>
next({
...args,
input: Object.entries(args.input).reduce(
(acc, [key, value]) => ({
...acc,
[key]: ['ExpressionAttributeValues', 'Key', 'Item'].includes(key)
? marshall(value, { convertEmptyValues: true })
: value
}),
{}
)
}),
{
name: 'autoMarshall',
priority: 'high',
step: 'initialize'
}
);
client.middlewareStack.add(
(next) => async (args) => {
const result = await next(args);
return {
...result,
output: {
...result.output,
Item: result.output.Item && unmarshall(result.output.Item),
Items: result.output.Items && result.output.Items.map((i) => unmarshall(i))
}
};
},
{
name: 'autoUnmarshall',
priority: 'high',
step: 'deserialize'
}
); where |
Glad to have the marshall/unmarshall option. However, the V2 DocumentClient was easier to work with. So I would also love to see it migrated to V3. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
any updates on this?? |
Updates in
We're not planning to support a separate DocumentClient in v3 as of now. |
@trivikr The downside of having to manually marshall/unmarshall is that the user now has to keep track of which params need to be marshalled/unmarshalled. IMO, that's too much internal architecture knowledge to require from consumers. If there's to be no DocumentClient in v3, how about adding a first-party middleware for auto-marshall/unmarshall similar to what @eryon proposed? |
A separate DocumentClient was not written for v3 mainly because of the difference in types: We also evaluated automatically detecting whether customer code has sent an attribute value or native JavaScript types based on the actual data. But it had multiple edge cases. Another option we evaluated is allowing customer to configure whether DynamoDB records or native JavaScript values are passed in an operation. But that option also had some edge cases, and added overhead for customer to understand the configuration. I can provide specific details if required. After a week long design discussion, we decided that the utility functions in @cadam11 Do you have a suggestion for a middleware |
I guess it depends on how strict you are about
At the very least, I'd expect that for a middleware to work there would need to be some (non-breaking) additions to the existing types. E.g., Wish I had more time to dig into this interesting problem. In any case, I'm still a fan of v3 sdk. |
@trivikr That doesn't make a whole lot of sense. Your first paragraph lays out the difficulty in doing this work, and your penultimate paragraph explains how you decided that its better that every consumer of your library do that work instead. That's what a library is for: to do the difficult work that everyone needs once. You recognize that it's awkward for the consumer of the unmarshalled data to work with the underlying types. That's the reason that you should do it! |
The first paragraph explains difficulty in doing this work when it comes to being an SDK owner. This involves writing and maintaining a separate DocumentClient and emitting mostly redundant type definitions. The penultimate paragraph explains how easy it for customer to just use marshall/unmarshall functions which takes native JavaScript types as inputs, and return DynamoDB records as output. This way:
Can you create a separate feature request on what specific difficulties you're facing when moving from DynamoDB DocumentClient in v2 to utilities in v3? I'm writing a blog post on DynamoDB utilities, and aiming to publish in a month. I'll include some scripts to ease migration based on the problems faced. |
The difficulty is writing the same marhsall/unmarshall code and maintaining those "mostly redundant types" either in every app that uses the aws sdk or in our own library that provides the same. This is a JavaScript library that doesn't work with JavaScript objects! What JavaScript app wants to interact with those DynamoDB marshalled objects? I'm guessing next to none. Why not abstract away the transport format in the library that interacts with the transport layer?! |
Just FYI: Previously I said it would be really helpful to have DocumentClient supported for use in our OneTable library. However, after digging deeper, we were able to support the V3 SDK pretty easily without this and we just released an update of OneTable with V3 support. The marshall / unmarshall library did most of the work and we were able to maintain 100% API compatibility for OneTable users after initialization of the library. Users have 2 code lines of difference in using V2 or V3 of the SDK. All smiles here. So while DocumentClient support doesn't matter for us now, it do believe that for many developers, having a compatibility shim so they don't need to rewrite their DynamoDB apps that use DocumentClient would be worthwhile. |
If it helps, here's a type-based import { AttributeValue, DynamoDB } from '@aws-sdk/client-dynamodb';
type GenericSMember<S extends string> = AttributeValue.SMember & { S: S };
type MarshallProperty<T> = T extends string
? GenericSMember<T>
: T extends number
? AttributeValue.NMember
: T extends boolean
? AttributeValue.BOOLMember
: T extends Uint8Array
? AttributeValue.BMember
: T extends Set<string>
? AttributeValue.SSMember
: T extends Set<number>
? AttributeValue.NSMember
: T extends Set<Uint8Array>
? AttributeValue.BSMember
: T extends null
? AttributeValue.NULLMember
: T extends Array<infer TItems>
? MarshallProperty<TItems>
: T extends Record<infer TKeys, unknown>
? MarshallProperty<TKeys>
: never;
type Marshall<T extends object> = { [K in keyof T]: MarshallProperty<T[K]> }; Example usage: interface PullRequestOutcome {
pk: 'PullRequest';
sk: `${string}${string}#${string}${string}`;
host: 'github' | 'bitbucket';
conclusion: 'merged' | 'declined';
openedAt: string;
closedAt: string;
}
const Item: Marshall<PullRequestOutcome> = {
pk: { S: '' },
sk: { S: 'my-project#12' },
host: { S: 'github' },
conclusion: { S: 'declined' },
openedAt: { S: '' },
closedAt: { S: '' }
}; It should work fine for all things except for tuples (as they can't be preserved properly, so they'll just become I've also got an type UnmarshallProperty<T> = T extends GenericSMember<infer S>
? S
: T extends AttributeValue.NMember
? number
: T extends AttributeValue.BOOLMember
? boolean
: T extends AttributeValue.BMember
? Uint8Array
: T extends AttributeValue.SSMember
? Set<string>
: T extends AttributeValue.NSMember
? Set<number>
: T extends AttributeValue.NULLMember
? null
: T extends Record<string, unknown>
? UnmarshallProperty<T>
: never;
type Unmarshall<T extends object> = {
[K in keyof T]: UnmarshallProperty<T[K]>;
};
declare const unmarshall: <T extends { [key: string]: AttributeValue }>(
item: T
) => Unmarshall<T>;
const item = unmarshall({
pk: { S: 'PullRequest' as const },
sk: { S: 'my-project#12' },
host: { S: 'github' },
conclusion: { S: 'declined' },
openedAt: { S: '' },
closedAt: { S: '' },
}); |
Update: After the design discussion within the team, a PoC for DocumentClient is posted at #2062 This is a manually written PoC. In the final PR, the operations are going to be code generated and not manually written. The modular DynamoDBDocumentClient can be created as follows: const client = new DynamoDBClient({});
const ddbDocClient = DynamoDBDocumentClient.from(client);
await ddbDocClient.send(
new PutCommand({
TableName,
Item: { id: "1", content: "content from DynamoDBDocumentClient" },
})
); The v2 backward-compatible DocumentClient can be created as follows: const client = new DynamoDBClient({});
const ddbDoc = DynamoDBDocument.from(client);
await ddbDoc.put({
TableName,
Item: { id: "2", content: "content from DynamoDBDocument" },
}); |
Update: The existing marshalling/unmarshalling code for BatchGet/BatchWrite is pretty complex. The code would be black box to users, but they may go through it while debugging. You can view the current implementation for posting suggestions/comments at https://github.com/trivikr/aws-sdk-js-v3/blob/058c2078a078a8f1e3d27a34b10955f4a27752df/lib/lib-dynamodb/src/commands/BatchWriteCommand.ts The PoC PR can be viewed at #2062 Tagging @kyeotic as they had attempted writing marshall/unmarshall for batchWrite earlier in #1223 (comment) |
@trivikr That implementation looks pretty heavy, memory and computation wise. Using an object spread in a reduce, like this one UnprocessedItems: Object.entries(data.output.UnprocessedItems).reduce(
(acc: any, [key, value]: [string, any]) => ({
...acc,
[key]: value.map((writeItem: any) => ({
...writeItem,
...(writeItem.PutRequest && {
PutRequest: unmarshall(
writeItem.PutRequest,
configuration.translateConfiguration?.unmarshallOptions
),
}),
...(writeItem.DeleteRequest && {
DeleteRequest: unmarshall(
writeItem.DeleteRequest,
configuration.translateConfiguration?.unmarshallOptions
),
}),
})),
}),
{}
),
}), Results in a copy of the param object for every PutRequest/DeleteRequest, each of which is simply re-spread and discarded except for the last. I prefer the response.UnprocessedItems &&
Object.fromEntries(
Object.entries(response.UnprocessedItems).map(
([table, requests]: [string, WriteRequest[]]) => [
table,
requests.map((request) => ({
PutRequest: request.PutRequest &&
request.PutRequest.Item && {
Item: this.unmarshall(request.PutRequest.Item),
},
DeleteRequest: request.DeleteRequest &&
request.DeleteRequest.Key && {
Key: this.unmarshall(request.DeleteRequest.Key),
},
})),
]
)
) As far as complexity, I think that's all the more reason to do it. This code was very hard to get right, at least in Typescript, and it's fully generic. I'd rather that happen here in SDK, with 1000 eyes on it, than ad hoc in 1000 invisible apps. |
I agree. The
Yup, the AWS SDK for JavaScript team is committed to providing support for DocumentClient. I'm posting updates on this issue to get feedback while DocumentClient gets implemented. |
That's fair, though its easy to write a basic polyfill for |
It's true that reduce+spread is wasteful, but we'll continue to use it because of the following reasons:
Benchmark used https://www.measurethat.net/Benchmarks/Show/11806/0/reduce-vs-map-fromentries-reduce-spread |
Echoing @kyeotic's concerns here – we've seen this pattern have a noticeable negative effect in Node.js 12.x Lambda environment. It's used by some GraphQL libraries and was adding hundreds of milliseconds to our cold starts. We had to fork and rewrite those components to use a more imperative approach – it's really not much more code and it had a noticeable effect. When you say "once per service call" do you mean for every DynamoDB call? If there are thousands of these a second, won't that have an impact? |
Yes. The hot path of the SDK.
It would have an impact, as marshalling/unmarshalling would then be in the hot path of the application.
Is this issue documented somewhere? I agree that it's not much code, as seen in this experimental commit trivikr@1ad4dff |
Just digging up the code now – don't think we created an issue for it. Here are some of the pieces we had to change: We just rewrote it to use for loops, settings keys on a single object. I believe the main issue was the GC cutting in all the time to clean up short-lived objects. We have a fairly big GraphQL schema, so it may be that others haven't seen the same impact we have. |
Was a while ago (like, 9-12 mths ish 😊), but I think we used flamegraphs generated by https://github.com/davidmarkclements/0x to track this code down as one of the main offenders |
Uh, what? How is "once-per-service-call" not in the hot path? What could possibly be more hot than "every single call to Dynamo"?
Improved, but it is still measurably worse than using a map followed by a single object construction.
Map also creates new objects, and does not mutate the original. I don't this this argument holds water. |
I was referring to hot path of the SDK, comparing to calling once per service call to several times per service call.
This comparison was between reduce without spread vs reduce with spread. From the output of the benchmark shared, we'll have to use reduce without spread over fromEntries because of the following reasons:
|
Update: The PR posted on 3/2 at #2097 removes reduce and updates object keys. |
The Pull Request for modular version of DynamoDB Document Client is ready for review at #2097. There have been extensive design discussions and proof of concepts shared for Document client implementation over the last month. We expect to merge this code and release Document Client with Tagging active commenters from this issue for notifying: @kyeotic @cadam11 @mdesousa @stang-tgs @fboudra @mattiLeBlanc @rati-dzidziguri @mobsense @russell-dot-js @mhart @antstanley |
@trivikr, after looking into the PR itself, I can only say it is clean, easy to understand, and follows the overall vision of AWS SDK V3. So thank you and the team for all this work, and I hope to see this out soon. |
@trivikr - awesome that this solution is in place, and elegantly so. That is, having the code programmatically generated and this DocumentClient being even better than the first (better native types support, with Set, BigInt, etc). The whole process of this back and forth and you and the Amazon team being receptive has been great and proves that Open Source just helps makes things better, faster. Cheers! |
@trivikr the road has been long, a lot of bumps, but at the end, it reached expectations. |
Thank you, this helped A LOT. Here's an example
|
BTW how would you guys suggest validating incoming data before storing/updating it in the DB? |
I've ended up with creating a lib for the workaround, |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread. |
Is your feature request related to a problem? Please describe.
Migrate existing code using documentClient to v3
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html
I'm also using dynamodb-toolbox on some projects (it relies on documentClient ):
https://github.com/jeremydaly/dynamodb-toolbox
Describe the solution you'd like
ability to use the convenient documentClient API with v3
Describe alternatives you've considered
The text was updated successfully, but these errors were encountered: