Optimize JSON ops: minimize allocs by decoding v8::Value directly #9540
Comments
If successful this could simplify a lot of internal ops code (e.g: remove minimal ops, since they would be similar to the default).
@AaronO It's an interesting idea, but we are skeptical it can be made faster than the status quo. It will also be quite a lot of work just to be able to perf test it. If you decide to play around with it and get some results, we will be eager to see them, but this is not something the core team will pursue.
@ry Completely understandable, I wasn't expecting anyone on the core team to put in time on it, especially without a POC. I simply shared the idea to get broad feedback and ensure that I wasn't missing some rationale backing the JSON approach.

I hacked together a very quick POC (modifying …). Here are the benchmarks run locally on my M1 Air:

Master
…

Optimized Hack
…
@AaronO Do you have some source code you can share? Would be interested to look at this.
@lucacasonato The POC I just put together is extremely hackish and not worth sharing (the goal was to test the perf hypothesis), but I have a relatively clear idea of how to implement `serde_v8`. I might take a first stab at it this weekend.
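(For a concrete sense of what such a deserializer involves, here is a minimal sketch — an illustration only, not the actual `serde_v8` code — that decodes a few primitive types straight off a `v8::Value` using rusty_v8's standard value accessors:)

```rust
use rusty_v8 as v8;
use serde::de::{self, Visitor};

// Illustrative only: a serde Deserializer that reads v8 handles directly,
// skipping the JSON-string round-trip entirely.
struct V8Deserializer<'a, 's> {
    scope: &'a mut v8::HandleScope<'s>,
    value: v8::Local<'s, v8::Value>,
}

impl<'de, 'a, 's> de::Deserializer<'de> for V8Deserializer<'a, 's> {
    type Error = de::value::Error;

    fn deserialize_any<V: Visitor<'de>>(self, visitor: V) -> Result<V::Value, Self::Error> {
        if self.value.is_uint32() {
            // Integers come straight off the handle: no utf8, no JSON, no allocs.
            visitor.visit_u32(self.value.uint32_value(self.scope).unwrap())
        } else if self.value.is_number() {
            visitor.visit_f64(self.value.number_value(self.scope).unwrap())
        } else if self.value.is_string() {
            // Strings are the one place an allocation is unavoidable.
            visitor.visit_string(self.value.to_rust_string_lossy(self.scope))
        } else {
            // A real implementation would walk objects/arrays here.
            Err(de::Error::custom("type not covered by this sketch"))
        }
    }

    serde::forward_to_deserialize_any! {
        bool i8 i16 i32 i64 u8 u16 u32 u64 f32 f64 char str string bytes
        byte_buf option unit unit_struct newtype_struct seq tuple tuple_struct
        map struct enum identifier ignored_any
    }
}
```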
A quick update to let you know I've started implementing `serde_v8`. I hope to share a first implementation this weekend; the benchmarks should closely model the perf improvements of changing the op-ABI. Once/if the perf improvements are demonstrated, I'll open a PR implementing my suggested op-ABI.

(Note: we could in theory design the op-ABI to be encoding-independent, so JSON/v8 arg-encodings could be easily switched with minimal changes.)

Here's a quick example running locally to give you a preview:
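Output

(Assuming the example runs as written, the `println!` calls below should print:)

```
x32 = 32
mop = MathOp { a: 1, b: 3, operator: None }
arr = [1, 2, 3, 4, 5]
hi = ["hello", "world"]
x = 12345
```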
Example code

```rust
use rusty_v8 as v8;
use serde_v8;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct MathOp {
    pub a: u64,
    pub b: u64,
    pub operator: Option<String>,
}

fn main() {
    let platform = v8::new_default_platform().unwrap();
    v8::V8::initialize_platform(platform);
    v8::V8::initialize();

    {
        let isolate = &mut v8::Isolate::new(v8::CreateParams::default());
        let handle_scope = &mut v8::HandleScope::new(isolate);
        let context = v8::Context::new(handle_scope);
        let scope = &mut v8::ContextScope::new(handle_scope, context);

        fn exec<'s>(scope: &mut v8::HandleScope<'s>, src: &str) -> v8::Local<'s, v8::Value> {
            let code = v8::String::new(scope, src).unwrap();
            let script = v8::Script::compile(scope, code, None).unwrap();
            script.run(scope).unwrap()
        }

        let v = exec(scope, "32");
        let x32: u64 = serde_v8::from_v8(scope, v).unwrap();
        println!("x32 = {}", x32);

        let v = exec(scope, "({a: 1, b: 3, c: 'ignored'})");
        let mop: MathOp = serde_v8::from_v8(scope, v).unwrap();
        println!("mop = {:?}", mop);

        let v = exec(scope, "[1,2,3,4,5]");
        let arr: Vec<u64> = serde_v8::from_v8(scope, v).unwrap();
        println!("arr = {:?}", arr);

        let v = exec(scope, "['hello', 'world']");
        let hi: Vec<String> = serde_v8::from_v8(scope, v).unwrap();
        println!("hi = {:?}", hi);

        let v: v8::Local<v8::Value> = v8::Number::new(scope, 12345.0).into();
        let x: f64 = serde_v8::from_v8(scope, v).unwrap();
        println!("x = {}", x);
    }

    unsafe {
        v8::V8::dispose();
    }
    v8::V8::shutdown_platform();
}
```
TLDR: early benchmarks on a new ABI (with `serde_v8`) are below.

Here's a first iteration of benchmarks comparing simplified versions (only sync ops, ...) of Deno's current op-ABI and a new one using `serde_v8`:

…
(Note: each …)

Op functions

I implemented 3 basic ops, called by each ABI, to represent a subset of payload types:

```rust
pub fn sum(args: Vec<u64>) -> u64 {
    args.into_iter().sum()
}

#[derive(Deserialize)]
pub struct AddArgs {
    a: u32,
    b: u32,
}

pub fn add(args: AddArgs) -> u32 {
    args.a + args.b
}

#[derive(Deserialize, Serialize)]
pub struct Person {
    first_name: String,
    last_name: String,
    age: u8,
}

pub fn promote(args: Person) -> Person {
    Person {
        first_name: args.first_name.to_uppercase(),
        last_name: args.last_name.to_ascii_uppercase(),
        age: args.age + 1,
    }
}
```

Detailed decoding benchmarks

Here are more granular benchmarks, purely on decoding:
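(For a concrete sense of what a pure-decoding micro-benchmark of this shape can look like, here is a hypothetical harness — not the repo's actual benches — timing the `serde_json` half of the comparison on the `AddArgs` payload above:)

```rust
use serde::Deserialize;
use std::time::Instant;

// Same shape as the AddArgs op above (repeated here so the sketch is self-contained).
#[derive(Deserialize)]
struct AddArgs {
    a: u32,
    b: u32,
}

fn main() {
    let json = r#"{"a": 1, "b": 3}"#;
    let iters: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..iters {
        // Each iteration pays the full utf8 + JSON parse that serde_v8 avoids.
        let args: AddArgs = serde_json::from_str(json).unwrap();
        std::hint::black_box(args.a + args.b);
    }
    println!("serde_json/AddArgs: {:?} per decode", start.elapsed() / iters);
    // The serde_v8 half would build a v8::Local<v8::Value> once inside a scope
    // (as in the example earlier) and call serde_v8::from_v8 in the same loop.
}
```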
(Note: …) The main takeaway here is that … I plan to implement a … I suspect with …

Note: it would be hard to optimize …

@lucacasonato I'll share the repo soon. It would be great to have your review to ensure that these benchmarks are fair and representative of the improvement. (I also haven't used rust a ton, so feedback on style & best-practices is welcome too.)
@lucacasonato Voilà, you can check out a first version here: https://github.com/AaronO/serde_v8

Main files of interest: …
If the core team is open to this new ABI, I could probably submit a PR later this week or next weekend.
A quick update to share a decent performance improvement. Going by the …

Key highlights (of benchmarks improved):

Before:
…

After:
…

The main improvement is in …

New benches
…
Full benches

These benches are now fully representative of op-encoding overhead in the API: they measure decoding inputs and encoding outputs/returns.

Note: …

Conclusions

The …
TLDR: implement a `serde` deserializer (& serializer) for `v8::Value` to minimize the allocations/overhead of encoding to JSON.

Problem & context
Deno ops are kind of like syscalls: all Deno programs must use them to do I/O and access runtime features.
Besides the v8 runtime, it's the only part of the stack that's not "user-optimizable" (users can easily optimize their own JS/TS code and JS/TS libraries, but touching runtime code will often be a stretch for many users).
As such I think it's a good principle to minimize syscall/op overhead, so the runtime is never "the bottleneck".
The current JSON op-ABI has some avoidable overhead, specifically heap-allocations and utf8 encoding.
(Note: I think most real-world programs will likely be I/O bound, or user-code CPU bound before being bottlenecked by the current op-ABI, but it's a good principle nonetheless and helps on microbenchmarks ...)
Current implementation

The current op-ABI passes JSON args as follows:

1. Stringify args to JSON (in JS)
2. Encode the JSON string into a buffer (in JS)
3. Decode & parse the buffer into a `serde_json::Value` (in Rust)
4. Decode the `serde_json::Value` into the op's typed args (in Rust)

Steps 1, 2 and 3 make avoidable heap-allocations, and CPU-time is "wasted" on JSON parsing/stringification.
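(A minimal sketch of what the Rust side of this path boils down to — a hypothetical helper, not Deno's actual code — showing where the utf8 decode and the heap-allocating parse happen:)

```rust
use serde::de::DeserializeOwned;

// Hypothetical helper (not Deno's actual code): every JSON op call pays
// a utf8 validation pass plus a heap-allocating JSON parse.
fn decode_json_args<T: DeserializeOwned>(buf: &[u8]) -> Result<T, Box<dyn std::error::Error>> {
    let s = std::str::from_utf8(buf)?;      // step 3a: utf8 decode
    let args: T = serde_json::from_str(s)?; // steps 3b/4: JSON parse + arg decode
    Ok(args)
}
```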
Alternative implementation

Pass args (`v8::Value`) as-is: basically, pass JS objects untouched, then decode them using `serde_v8` (to be implemented).

Benefits

- Skips JSON parsing/stringification and utf8 encoding (allocations only happen on `serde_v8` serialization)

Notes
- Op returns could similarly skip the `decodeJson()` step (utf8 decode + json parse) by serializing directly to a `v8::Value` (see the sketch after these notes)
- `bin_ops` will probably still be faster than this improved `json_ops`, since using shared memory with no decoding is hard to beat, but this should help close the gap whilst preserving the benefits & convenience of structured args
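(A sketch of that return path, assuming `serde_v8` also grows the serializer half mentioned in the TLDR — the `to_v8` name here is a placeholder for it:)

```rust
use rusty_v8 as v8;
use serde::Serialize;

// Same shape as the Person op example above.
#[derive(Serialize)]
struct Person {
    first_name: String,
    last_name: String,
    age: u8,
}

// Sketch: serialize an op's return value straight to a v8::Value, skipping
// JSON stringify + utf8 encode + decodeJson() entirely. Assumes a to_v8
// counterpart to from_v8 (the "(& serializer)" half named in the TLDR).
fn return_person<'s>(
    scope: &mut v8::HandleScope<'s>,
    person: &Person,
) -> v8::Local<'s, v8::Value> {
    serde_v8::to_v8(scope, person).unwrap()
}
```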
Further changes

Implementing this optimization would require either changing the op-ABI or implementing an alternative opt-in "fast-path" (e.g: `Deno.core.sendFast` or `sendJSON`).

Currently JSON ops are a special case of "buffer ops". I think this would be a good opportunity to "reverse" that and consolidate all ops under a common explicit signature (rather than having implicit signatures packed through buffers):
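(The notes below reference `promise_id`, `args`, `buffer` and `Result`; a sketch of the shape such a signature could take — the names and placeholder types here are illustrative assumptions, not the issue's exact proposal:)

```rust
use serde::{de::DeserializeOwned, Serialize};

// Placeholder types standing in for deno_core's (assumptions for this sketch):
struct OpState;
struct ZeroCopyBuf;
type AnyError = Box<dyn std::error::Error>;

// A common explicit op signature: structured args in, structured result out,
// with the promise id and zero-copy buffer as explicit optional parameters
// instead of being packed through buffers implicitly.
trait Op {
    type Args: DeserializeOwned;
    type Result: Serialize;

    fn call(
        &self,
        state: &mut OpState,
        promise_id: Option<u64>,
        args: Self::Args,
        buffer: Option<ZeroCopyBuf>,
    ) -> Result<Self::Result, AnyError>;
}
```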
(`args` and `Result` could be any `serde`/`serde_v8` Serializable types)
(`promise_id`, `args` and `buffer` should probably all be `Option<>`)
Next steps

- Implement `serde_v8`
- …

Questions
- Do ops ever need more than two buffers (`BufVec`)? Searching through the code, the pattern seems to be 2 buffers max: the first is promise+json, the second is a buffer to read/write. If so, we can remove `BufVec` from the new ABI (as suggested above).