Experiment with bogus-error approach for no-overhead bound-check-like behavior #1686

The simdjson library is highly optimized: through clever design choices, it avoids most bound checks.

This comes with a few limitations. For example, we require a few bytes of padding at the end of the input (https://github.com/simdjson/simdjson/issues/174). We also refuse to parse a single JSON document that exceeds 4 GB (https://github.com/simdjson/simdjson/issues/128).
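To illustrate why padding lets the code skip bound checks, here is a minimal sketch (not simdjson's code). It assumes 64-byte blocks and 64 bytes of zero padding; `kBlock`, `kPadding`, `make_padded`, `count_quotes` and `fits_32bit_index` are hypothetical names (the library exposes its own padding value as `SIMDJSON_PADDING`). The block loop reads whole blocks even when the last one runs past the input, which is safe only because the padding is readable and contains no quote bytes. The last helper assumes the 4 GB limit comes from storing structural positions as 32-bit offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed sizes for this sketch only.
constexpr std::size_t kBlock = 64;    // bytes processed per "SIMD" step
constexpr std::size_t kPadding = 64;  // must be >= kBlock - 1

// Copy the input into a buffer followed by kPadding zero bytes.
std::vector<uint8_t> make_padded(const char* data, std::size_t len) {
  std::vector<uint8_t> buf(len + kPadding, 0);
  std::memcpy(buf.data(), data, len);
  return buf;
}

// Count '"' bytes block by block. The inner loop never asks whether the
// block straddles the end of the input: the padding guarantees the bytes
// exist, and zero bytes are never counted as quotes.
std::size_t count_quotes(const uint8_t* buf, std::size_t len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; i += kBlock) {
    for (std::size_t j = 0; j < kBlock; j++) {  // no per-byte bound check
      count += (buf[i + j] == '"');
    }
  }
  return count;
}

// Assumption: positions are stored as uint32_t, hence the 4 GB cap.
bool fits_32bit_index(std::size_t len) { return len <= UINT32_MAX; }
```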

To get around these limitations, we have an outstanding PR (https://github.com/simdjson/simdjson/pull/1665) that undoes these clever optimizations and adds regular bound checking. This lowers performance somewhat, but it also allows you to lift the padding requirement.

A more daring approach would be not to go back to conventional bound checking and, instead, to push forward with our clever bound-free approach. Instead of doing bound checks all over the place, you examine the document when you get started and adjust the structural index so that, at a strategic location, you get a bogus error. This bogus error brings you into a distinct mode where you finish the processing with more careful code. Then you'd get the no-padding behavior for free (given a large enough input).
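A minimal sketch of the bogus-error idea, under loudly stated assumptions: the "parser" only checks bracket balance (bracket kinds are not matched), the structural index holds the offsets of bracket bytes, and the input is followed by one zero padding byte. `step`, `balanced` and `kSlack` are hypothetical names, not simdjson API. One index slot near the end is overwritten with the offset of the padding byte; the fast loop has no `i < n` test and stops through the error branch it already has. If that failure happened at the planted slot, it was a bogus error: the saved entry is restored and the remaining slots are finished by bound-checked code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed: this many trailing index slots are left to the slow path.
constexpr std::size_t kSlack = 2;

// Shared per-structural logic. Any non-bracket byte is an error, which is
// exactly where the planted trap (a zero byte) will land.
static inline bool step(uint8_t c, int& depth) {
  switch (c) {
    case '{': case '[': depth++; return true;
    case '}': case ']': return --depth >= 0;
    default: return false;
  }
}

// buf: input of length len followed by a zero padding byte (buf[len] == 0).
// pos: structural index (offsets of bracket bytes), taken by value so the
// trap can be planted in a private copy.
bool balanced(const uint8_t* buf, std::size_t len, std::vector<uint32_t> pos) {
  const std::size_t n = pos.size();
  if (n == 0) return true;
  // Plant the trap: this slot now points at the zero padding byte.
  const std::size_t trap = n > kSlack ? n - kSlack : 0;
  const uint32_t saved = pos[trap];
  pos[trap] = static_cast<uint32_t>(len);

  int depth = 0;
  std::size_t i = 0;
  // Fast path: no `i < n` test; the loop ends only through its error branch.
  while (step(buf[pos[i]], depth)) i++;
  if (i != trap) return false;  // a genuine error before the trap

  // Bogus error: restore the real entry and finish with bound-checked code.
  pos[trap] = saved;
  for (; i < n; i++) {
    if (pos[i] >= len || !step(buf[pos[i]], depth)) return false;
  }
  return depth == 0;
}
```

The fast loop pays nothing extra for termination, since the planted entry reuses the existing error branch. In a real parser the slack would be chosen so that no look-ahead in the fast path can run past the input before the trap is hit.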

This "bogus error" approach is also how I would try to handle running stage 1 in chunks. You give me a 6 GB JSON document. I index it in chunks of 1 MB. I change each chunk's index so that, somewhere before the end of the chunk, I encounter a bogus error. Then I know to load a new index.
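The chunked variant can be sketched the same way. This is a hypothetical illustration, not simdjson's stage 1: `Chunk`, `index_chunk` and `max_depth` are made-up names, only bracket bytes are indexed (a real stage 1 would also have to carry in-string and escape state across chunk boundaries), and each chunk is copied into a buffer with one zero byte purely to keep the example short. Offsets are chunk-relative, so they fit in 32 bits even when the whole document would not. The last index entry of every chunk points at the zero byte; hitting it sends the consumer down its error branch, which it recognizes as "load the next chunk".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Chunk {
  std::vector<uint8_t> bytes;   // chunk copy plus one zero padding byte
  std::vector<uint32_t> index;  // chunk-relative offsets of brackets + trap
};

Chunk index_chunk(const char* doc, std::size_t begin, std::size_t end) {
  Chunk ch;
  ch.bytes.assign(doc + begin, doc + end);
  ch.bytes.push_back(0);  // padding byte, never a bracket
  for (std::size_t p = 0; p + 1 < ch.bytes.size(); p++) {
    uint8_t c = ch.bytes[p];
    if (c == '{' || c == '}' || c == '[' || c == ']')
      ch.index.push_back(static_cast<uint32_t>(p));
  }
  // Trap: the last entry points at the zero byte, forcing a bogus error.
  ch.index.push_back(static_cast<uint32_t>(ch.bytes.size() - 1));
  return ch;
}

// Returns the maximum nesting depth, or -1 on a genuine error.
int max_depth(const char* doc, std::size_t len, std::size_t chunk_size) {
  int depth = 0, best = 0;
  for (std::size_t begin = 0; begin < len; begin += chunk_size) {
    std::size_t end = std::min(len, begin + chunk_size);
    Chunk ch = index_chunk(doc, begin, end);
    std::size_t i = 0;
    for (;; i++) {  // no bound check on i
      uint8_t c = ch.bytes[ch.index[i]];
      if (c == '{' || c == '[') { if (++depth > best) best = depth; }
      else if (c == '}' || c == ']') { if (--depth < 0) return -1; }
      else break;  // error branch: only the trap reaches it in this sketch
    }
    // Unreachable in this bracket-only sketch; shows where real errors go.
    if (i != ch.index.size() - 1) return -1;
    // Bogus error at the trap: this chunk is done, load the next index.
  }
  return depth == 0 ? best : -1;
}
```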

This would be a bit challenging, for sure. It would also require that we maintain a slow path with bound checking for the code that runs after a bogus error. The latter could maybe be achieved with templates.
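One way to keep a single body for both paths is a template flag, sketched below under the assumption of C++17 (`if constexpr`). `skip_digits` is a hypothetical helper, not part of simdjson. In the `Checked=false` instantiation the bound test is discarded at compile time, so the fast path relies on padding (or a planted trap) to stop; the `Checked=true` instantiation is the careful slow path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advance pos past a run of ASCII digits and return the new position.
template <bool Checked>
std::size_t skip_digits(const uint8_t* buf, std::size_t pos, std::size_t len) {
  (void)len;  // unused when Checked == false
  while (true) {
    if constexpr (Checked) {
      if (pos >= len) return pos;  // slow path: explicit bound check
    }
    uint8_t c = buf[pos];  // fast path: a non-digit byte must follow
    if (c < '0' || c > '9') return pos;
    pos++;
  }
}
```

A driver would call the `<false>` instantiations until the bogus error fires, then switch to the `<true>` ones for the rest of the input, so both paths share the same logic.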

cc @jkeiser 