Startup time of Quarto #11916
Have you tried starting with --no-check? This disables type checking, which takes up the majority of startup time.
Yes, we have tried with --no-check.
I ran and got the following:
The bundle output is 3.73MB. The bundler is not sophisticated enough to do deep tree shaking of namespace imports, unlike some more advanced bundlers which will try to determine which parts of the namespace are used and elide the rest. Also, I notice there are four top-level awaits in the output bundle. These effectively block execution and would appear to be counted as start-up time.
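For illustration (not from the discussion above), a hedged sketch of the namespace-import issue, using the std fmt module as an arbitrary example:

```ts
// Namespace import: a bundler without deep tree shaking must keep everything
// colors.ts exports, since it cannot prove which members are actually used.
import * as colors from "https://deno.land/[email protected]/fmt/colors.ts";

// Named import: unused exports are candidates for elision by a bundler that
// does deep tree shaking.
import { red } from "https://deno.land/[email protected]/fmt/colors.ts";

console.log(colors.red("via namespace import"), red("via named import"));
```

Either form works identically at runtime; the difference only matters for how much of the imported module a simple bundler has to keep.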
My opinion is the following:
Opened #11918 for dealing with loading web assembly.
I'm gleaning from this discussion that v8 snapshots might not in fact be helpful here? Is that correct, or could a combination of improving WASM loading and v8 snapshotting bring the startup time closer to negligible?
Hard to say for sure. We have yet to get ESM working with snapshots, and then layer on the complexity of TLA; there are technical challenges to supporting them in general-purpose user code. My supposition is that TLA is the critical path here, because it effectively blocks module evaluation until the top-level promises resolve and would directly impact the user-perceived startup time. Snapshots wouldn't make that go away (and might even be a barrier to using snapshots at all).
If we could eliminate our usages of TLA (haven't looked closely at the feasibility of this yet), would that unblock better parallelism for the WASM loading? (i.e. would it allow user code to execute immediately and WASM only be loaded on demand?) Apologies if the answer to this is already above; I am not conversant enough w/ the architecture to follow all the implications.
@jjallaire there are 4 instances of top-level await in your bundle, though none of them come directly from your own code. One of them is bootstrapping; the other 3 are all loading WASM.
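To make the trade-off concrete, here is a hedged sketch (hypothetical module and file names, not deno_dom's actual code) of the difference between top-level-await WASM initialization and deferring it behind a function:

```ts
// With top-level await, every importer of this module waits for the WASM
// fetch + compile before any later module in the graph evaluates:
//
//   const bytes = await Deno.readFile(new URL("./parser.wasm", import.meta.url));
//   export const wasm = (await WebAssembly.instantiate(bytes)).instance;
//
// Deferring initialization removes the top-level await, so module evaluation
// is cheap and the WASM cost is only paid on first use:
let wasmPromise: Promise<WebAssembly.Instance> | undefined;

export function getWasm(): Promise<WebAssembly.Instance> {
  wasmPromise ??= (async () => {
    const bytes = await Deno.readFile(new URL("./parser.wasm", import.meta.url));
    const { instance } = await WebAssembly.instantiate(bytes);
    return instance;
  })();
  return wasmPromise;
}
```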
Here is our import map (https://github.com/quarto-dev/quarto-cli/blob/main/src/import_map.json):

{
"imports": {
"async/": "https://deno.land/[email protected]/async/",
"fmt/": "https://deno.land/[email protected]/fmt/",
"flags/": "https://deno.land/[email protected]/flags/",
"log/": "https://deno.land/[email protected]/log/",
"path/": "https://deno.land/[email protected]/path/",
"fs/": "https://deno.land/[email protected]/fs/",
"hash/": "https://deno.land/[email protected]/hash/",
"io/": "https://deno.land/[email protected]/io/",
"encoding/": "https://deno.land/[email protected]/encoding/",
"uuid/": "https://deno.land/[email protected]/uuid/",
"testing/": "https://deno.land/[email protected]/testing/",
"http/": "https://deno.land/[email protected]/http/",
"signal/": "https://deno.land/[email protected]/signal/",
"ws/": "https://deno.land/[email protected]/ws/",
"textproto/": "https://deno.land/[email protected]/textproto/",
"datetime/": "https://deno.land/[email protected]/datetime/",
"xmlp/": "https://deno.land/x/[email protected]/",
"cliffy/": "https://deno.land/x/[email protected]/",
"lodash/": "https://deno.land/x/[email protected]/",
"fuse/": "https://deno.land/x/[email protected]/",
"deno_dom/": "https://deno.land/x/[email protected]/",
"port/": "https://deno.land/x/[email protected]/",
"puppeteer/": "https://deno.land/x/[email protected]/",
"media_types/": "https://deno.land/x/[email protected]/",
"observablehq/parser": "https://cdn.skypack.dev/@observablehq/[email protected]",
"acorn/walk": "https://cdn.skypack.dev/[email protected]",
"acorn/acorn": "https://cdn.skypack.dev/[email protected]"
}
}

As you can see, we are attempting to pin to 0.97.0 of the standard library across the board. However, it appears that several of our dependencies are doing their own pinning, which results in the duplication. Is there any way around this, or is this sort of duplication inevitable?

So it seems like there would still be 700ms of startup time (on a very fast laptop) after we eliminate the WASM problem. It still feels like there is a significant problem to solve here -- we deal with a bunch of interpreted languages in our project (mostly Python and R), and in spite of the runtimes being extremely slow, the interpreters have close to zero startup cost (so chaining a bunch of calls to them is mostly free). If Deno CLI applications have a non-trivial startup cost that is linear with bundle size (that's of course speculation at this point), I fear that it will rule out using Deno for CLI tools (except for fairly small ones). Hopefully this could be resolved by using a v8 snapshot of the bundled JS file -- if you think there is some chance of that working, we could try using it.
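For illustration of how the duplication described above can arise even with a pinned import map, a hedged sketch (the dependency shown is hypothetical; only the std URLs are taken from the thread):

```ts
// Application code resolves "path/" through the import map, so it gets the
// pinned 0.97.0 copy of std:
import { join } from "path/mod.ts"; // -> https://deno.land/[email protected]/path/mod.ts

// A third-party dependency that imports std by absolute URL with its own pin
// is not rewritten unless the import map has a matching entry, so the bundle
// ends up carrying a second copy of std, e.g. (inside the dependency's source):
//
//   import { createHash } from "https://deno.land/[email protected]/hash/mod.ts";

console.log(join("a", "b"));
```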
Attempting to determine the source of the additional std versions. Update: I discovered the source of v0.91.0/hash by experimentation. Still interested in whether there is a more automated approach, and whether or not multiple versions of the standard library are inevitable if libraries pin their own versions.
Further update: the following three remediations reduce startup time from 1.2s to 480ms:
Bundle size is now 2.2mb and startup time is 480ms. This is obviously a huge improvement, but possibly still of significant concern for CLIs that need to run at interactive speed, especially if startup time grows linearly with bundle size.
@jjallaire you can use
@balupton Wow fantastic! Thank you :-)
Not sure why I was
Regarding the overriding of other imports, yes, it can be done easily. If you are convinced globally that the version of the import can be pinned to another version, you would do something like this in the import map:

{
"imports": {
"https://deno.land/[email protected]/": "https://deno.land/[email protected]/"
}
}

If you find that for some reason another dependency needs a specific version of std, then you can use the

As @bartlomieju mentioned, the way I knew all the dependencies I mentioned in my first comment was using
@kitsonk Thanks for the pointer on overriding imports. We will try that next. We are currently at 330ms to run 2.8mb of bundled JS (750k of which is the deno_dom WASM) that executes no user code (instrumented so that the entry point returns immediately once user code gets control).
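For reference, a minimal sketch of that kind of instrumentation (hypothetical flag name, not Quarto's actual code): the entry module exits as soon as user code gets control, so the wall-clock time reported by time is almost entirely the cost of loading and evaluating the bundle:

```ts
// quarto.ts (hypothetical): bail out immediately when probing startup cost.
if (Deno.args.includes("--startup-probe")) {
  // Reaching this line means the whole bundle has been parsed and evaluated;
  // exiting here isolates that overhead from real application work.
  Deno.exit(0);
}

// ... normal CLI entry point continues here ...
```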
@jjallaire is that with the
Yes, that is with
I just did a quick experiment to change our
Since that is a much larger proportional increase in execution time than the increase in code size, it seems like something in the execution of the JS bundle is occupying that time. Is there a straightforward way to profile this?
Note that the WASM theory has definitely proved out to be correct, as we have gone from 1230ms to 330ms by eliminating WASM loading (plus eliminating 900k of puppeteer that was only needed for tests). The remaining 330ms does seem puzzling though, as I wouldn't expect parsing & executing a 2.8mb JS bundle to take nearly that long.
More findings:
So it seems like code size has nothing to do with the problem and there is some non-trivial execution time being spent when loading the JS bundle (which is ~ 50% worse in v0.105.0 vs. 0.97.0).
You should be able to repro this w/ the following:

git clone https://github.com/quarto-dev/quarto-cli.git
cd quarto-cli
./configure-macos.sh  # or ./configure-linux.sh
cd package/src
./quarto-bld prepare-dist
time ../dist/bin/quarto

Invoking with no args is effectively a no-op (I've confirmed that once user code gets control, total execution time is 6ms). Bundled source code is at ../dist/bin/quarto.js.
Attempting to profile: first using the Chrome profiler, which didn't work, then using the
Anyway, both the raw log file and the processed version are attached. Not sure whether they include anything useful.
There were some clues in the profile data that acorn parser initialization might be in play. I speculatively removed all the code depending on acorn, plus the dependency itself, and that saved another 80ms. It seems likely that the other parts of the 330ms are of a similar nature (or things that are completely unavoidable).

I'm happy to close this issue now, assuming you don't want to dig further into what might be going on based on the profiler data. It does seem like if we are really spending 330ms reading the JS file and doing various expensive initializations therein (e.g. acorn data structures), then v8 snapshotting would indeed be of some help -- does that make sense?
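The same deferral idea sketched earlier for WASM applies to a heavy JS dependency like acorn; a hedged sketch (the import specifier matches the import map above, but this is not Quarto's actual code):

```ts
// Only pay for acorn's module evaluation (and its parser setup) when a
// command actually needs to parse JavaScript, instead of at startup.
let acornPromise: Promise<typeof import("acorn/acorn")> | undefined;

export async function parseJs(src: string) {
  acornPromise ??= import("acorn/acorn");
  const acorn = await acornPromise;
  return acorn.parse(src, { ecmaVersion: 2020 });
}
```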
@jjallaire is this problem still occurring with latest v1.17.1?
Yes, we do still see the same overhead for loading the JS in v1.17.1. We have a ~ 3mb JS bundle and it takes about 420ms to execute a "do nothing" command where the time that our typescript has control is < 10ms.
Is this still an issue? FYI I'm tracking startup time optimizations in #15945.
The improvements in JS parsing in Deno 1.25 took us from ~ 900ms of startup time to ~ 150ms. This is acceptable, but we'd certainly love to see an improvement on the remaining 150ms!
@jjallaire could you give any update on this issue? We've made multiple improvements to startup time in the past two months, so I'm wondering if this issue is still relevant.
@bartlomieju Thanks for following up! We are just completing our v1.3 release which is pinned to v1.28.2 from November. As we begin v1.4 development we'll go to the latest Deno and report back here on improvements we observe in pure overhead during startup. cc @cscheid
Hi @bartlomieju we are now on Deno v1.33.1 so I have some metrics to share. With Deno v1.28.2 our "cold start" time for a do-nothing invocation was ~ 145ms. With Deno v1.33.1 it is ~ 110ms, which is a very nice improvement indeed!
@bartlomieju unfortunately I'm observing some regression in cold startup performance in builds created with
Here are some stats, using the fastest result of three consecutive invocations:
Note that between collie 0.12 and 0.15.3 we did also change a bunch of code internal to the tool and upgraded cliffy, the CLI framework that we use. So I can't quite rule out that that's partly to blame for the first drop, but between Deno 1.25 and 1.34 we are using identical code here.
I'm going to close this one because it's old, seems resolved, and is quite broad. Please open specific issues for any remaining startup-time concerns. Note that in the past year there have been additional significant improvements. Also, in Deno 2.1 (being released tomorrow) startup time on Windows has improved.
Writing to see if there is something we can do about startup time for Deno CLI applications. We have a moderately sized CLI application (25k lines of TypeScript code) that nevertheless takes 1.2s to start up and print its help (i.e. not executing any of its main codepaths), compared to deno help taking 0.01s on the same system (a 2019 MacBook Pro w/ 2.4GHz i9).

Our application is Quarto (https://quarto.org), which is a Pandoc-based static site generator focused on scientific and technical writing.
We have a few concerns about startup time given that quarto commands are intended to be run at "interactive speed":
- I'd imagine that on less capable systems this time could be upwards of 2 or 3 seconds;
- As our LOC grows (likely by 2 or 3x) it seems like the latency will grow linearly (we've noted that a simple 'hello.js' script can be run by deno at approximately the speed of deno help);
- Our tool (and I'd surmise many other developer-oriented CLIs) is often called from IDEs (e.g. in save hooks) where latency can be a problem;
- CLI tool invocations are often stacked together (e.g. think of bash scripts that orchestrate git workflows), so the per-invocation overhead can really add up.
As an aside, I think deno has huge potential for the creation of dependency-free, developer-oriented CLIs. The only other viable options are Rust (which is not as broadly accessible as JS) and Go (which must be learned afresh by many devs and ultimately won't be able to keep up w/ the JS library ecosystem). I don't know if this is a major priority for the deno team (as server applications are likely to predominate over CLI applications), but if it is, I think this issue would be a show-stopper for many applications.
I can think of a few places to go from here:
1. We are somehow doing something wrong (e.g. not availing ourselves of tree shaking or otherwise bundling in a non-optimal fashion). We currently just use deno bundle --unstable --import-map <xxx> src/quarto.ts (where quarto.ts is our main entry point).
2. A future deno feature will address this. My understanding is that deno overcame its own startup latency issues by using v8 snapshots. It looks like there has been discussion of loading custom snapshots (Discussion: export and load custom snapshot #1877), but there are issues with ES module loading that prevent this. There has also been discussion of creating v8 snapshots during compile (can deno compile into bytecode? #8820). Is there any prospect of either of these features being implemented in the near term?
3. To overcome this we should be building a custom rust binary that uses deno_core and our own snapshot (as discussed here: Allow loading non-static snapshots #4402). If this is the only practical solution available in the near term, any pointers you have on examples or documentation related to this would be very much appreciated.

We have been incredibly happy with Deno as a development platform and want to be all-in on using it for the lifetime of our project. Given this, we'd be more than happy to fund work on any of the above items if that is helpful.