The skip_docs parameter is passed to index_stream, and index_stream then skips that many lines. However, the dumpreader has already spent time loading (parsing) the JSON, even for the lines that end up being skipped.
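A minimal Python sketch of the fix idea: discard the skipped lines as raw strings and only parse what is kept. It assumes a Wikidata-style dump with one entity per line, wrapped in "[" and "]" lines and with trailing commas. The function iter_entities is a hypothetical name for illustration, not the project's actual dumpreader API.

```python
import io
import itertools
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str], skip_docs: int = 0) -> Iterator[dict]:
    """Yield parsed documents, skipping the first `skip_docs` *before* parsing.

    Hypothetical helper; assumes one JSON document per line, as in a
    Wikidata-style dump wrapped in "[" ... "]" with trailing commas.
    """
    # Drop the array wrapper lines and blank lines without parsing them.
    docs = (line for line in lines if line.strip() not in ("", "[", "]"))
    # islice throws away the first `skip_docs` raw lines, so json.loads
    # is never called on them.
    for line in itertools.islice(docs, skip_docs, None):
        # Remove the newline and the trailing comma that separates entities.
        yield json.loads(line.rstrip().rstrip(","))


# Usage: skip the first two documents of a tiny in-memory "dump".
sample = io.StringIO('[\n{"id": "Q1"},\n{"id": "Q2"},\n{"id": "Q3"}\n]\n')
print([e["id"] for e in iter_entities(sample, skip_docs=2)])  # ['Q3']
```

The skip cost then becomes a cheap line scan instead of a full JSON decode per skipped entity.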
While I work on a fix for this, IMO a good workaround is to skip the lines with tail before piping them to the CLI, e.g.:
pbzip2 -c -d -p8 latest-all.json.bz2 | tail -n +879322
(Note that I am using pbzip2 here, a multi-threaded bzip2 implementation.)
I also suggest switching to orjson, since it is faster than the standard-library json module.
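One way to make that switch without a hard dependency is to fall back to the standard library when orjson is not installed. This is a sketch; the loads wrapper is a hypothetical name, and the actual speedup should be measured on the real dump rather than assumed.

```python
import json

try:
    import orjson  # third-party package: pip install orjson

    def loads(data):
        # orjson.loads accepts str, bytes, bytearray or memoryview.
        return orjson.loads(data)

except ImportError:

    def loads(data):
        # Standard-library fallback; json.loads also accepts str or bytes.
        return json.loads(data)


print(loads('{"id": "Q42"}'))  # {'id': 'Q42'}
```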