Statoscope 5.25: Compressing stats with Binary JSON
We're excited to announce the release of Statoscope 5.25! This release introduces a significant efficiency improvement: the comparison report for two builds (master vs. PR) is now about 45 times smaller. In our real-world project, the report shrank from 494MB to 11MB.
To achieve this improvement, we've implemented JSON binary encoding for stats data, which uses various techniques to reduce the number of bytes needed to encode data structures. These techniques include value deduplication and compact representation of strings and arrays. The result is a much smaller file size compared to standard JSON. This approach is particularly effective for large datasets over 10MB.
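To make value deduplication concrete, here is a minimal Python sketch of a toy binary encoding. It is not the json-ext format used by Statoscope: the tag bytes (`S`, `R`, `I`, `L`, `D`) and the function names are invented for illustration, and the toy only supports strings, non-negative integers, lists, and dicts with string keys. Each distinct string is written once, and every later occurrence becomes a short back-reference.

```python
import json

def write_varint(out, n):
    """Append a non-negative integer using 7 bits per byte (LEB128-style)."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)

def read_varint(buf, pos):
    """Read a varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7

def encode(value, out, seen):
    """Append value to out; seen maps each string to its table index."""
    if isinstance(value, str):
        if value in seen:  # repeated string: emit a short back-reference
            out.append(ord("R"))
            write_varint(out, seen[value])
        else:  # first occurrence: emit the bytes once and remember the index
            seen[value] = len(seen)
            raw = value.encode("utf-8")
            out.append(ord("S"))
            write_varint(out, len(raw))
            out += raw
    elif isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        out.append(ord("I"))
        write_varint(out, value)
    elif isinstance(value, list):
        out.append(ord("L"))
        write_varint(out, len(value))
        for item in value:
            encode(item, out, seen)
    elif isinstance(value, dict):
        out.append(ord("D"))
        write_varint(out, len(value))
        for key, item in value.items():
            encode(key, out, seen)  # keys are deduplicated like any other string
            encode(item, out, seen)
    else:
        raise TypeError(f"unsupported value in this sketch: {value!r}")

def decode(buf, pos, table):
    """Decode one value at pos; return (value, next position)."""
    tag = chr(buf[pos])
    pos += 1
    if tag == "S":
        length, pos = read_varint(buf, pos)
        text = buf[pos:pos + length].decode("utf-8")
        table.append(text)  # rebuild the string table in encounter order
        return text, pos + length
    if tag == "R":
        index, pos = read_varint(buf, pos)
        return table[index], pos
    if tag == "I":
        return read_varint(buf, pos)
    if tag not in ("L", "D"):
        raise ValueError(f"unknown tag {tag!r}")
    count, pos = read_varint(buf, pos)
    if tag == "L":
        items = []
        for _ in range(count):
            item, pos = decode(buf, pos, table)
            items.append(item)
        return items, pos
    result = {}
    for _ in range(count):
        key, pos = decode(buf, pos, table)
        item, pos = decode(buf, pos, table)
        result[key] = item
    return result, pos

# Hypothetical data shaped loosely like a list of modules in a stats file.
modules = [
    {"name": f"./src/module{i}.js", "chunk": "main", "type": "module", "size": 1000 + i}
    for i in range(100)
]
encoded = bytearray()
encode(modules, encoded, {})
decoded, _ = decode(bytes(encoded), 0, [])
plain = json.dumps(modules, separators=(",", ":")).encode("utf-8")
print(len(plain), len(encoded))  # the deduplicated form is smaller for this input
```

In this input the keys `name`, `chunk`, `type`, and `size` and the values `main` and `module` repeat in every record, so after their first occurrence each costs two bytes instead of the full quoted string. Data with few repeated strings would benefit much less.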
You may be wondering why we didn't just use compression like gzip. Gzip treats JSON as generic text and compresses it with general-purpose methods. An encoding designed specifically for JSON can take the structure of the data into account, so the resulting size can be much smaller. The compression ratio depends on the structure of the input data, but we've seen a significant improvement with webpack's stats.
Statoscope uses an experimental implementation of binary encoding by Roman Dvornov. The screenshot below compares several JSON binary encoding solutions. The solution used in Statoscope is labeled json-ext followed by a date, since work on it is still in progress:
The comparison was conducted on stats normalized by Statoscope. In addition to reducing file size, the binary encoding also improves decoding time (see the "Decode" column), especially when gzip compression is involved (see the "Gzip" and "Gunzip" columns). Restoring the data (gunzip plus decode, see the "Gun+Dec" column) was 3-6 times faster than with the other solutions, including standard JSON.
To generate a small HTML report, webpack's stats go through several transformation stages: normalization, binary encoding, gzip compression, splitting into chunks, encoding the chunks in base64, and writing them to a file. Together, these stages reduce the size of the report from 1.5GB to just 5MB (despite the overhead of base64 encoding, which adds ~33% to the size of the encoded data). Statoscope's normalization alone reduces the stats from 1.5GB to 250MB. Binary encoding and gzip compression then take the report from 250MB to 5MB (or from 500MB to 11MB for a comparison report).
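The later stages of this pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Statoscope's code: `pack`, `unpack`, and `CHUNK_SIZE` are hypothetical names, compact JSON stands in for the binary encoding step, and normalization and writing to the HTML file are left out.

```python
import base64
import gzip
import json

CHUNK_SIZE = 3 * 1024  # hypothetical chunk size in bytes, chosen arbitrarily

def pack(stats, chunk_size=CHUNK_SIZE):
    """Serialize, gzip, split into chunks, and base64-encode each chunk."""
    # Stand-in for the binary encoding step: compact JSON bytes.
    payload = json.dumps(stats, separators=(",", ":")).encode("utf-8")
    compressed = gzip.compress(payload)
    return [
        base64.b64encode(compressed[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(compressed), chunk_size)
    ]

def unpack(chunks):
    """Reverse of pack: base64-decode each chunk, join, gunzip, parse."""
    compressed = b"".join(base64.b64decode(chunk) for chunk in chunks)
    return json.loads(gzip.decompress(compressed))

# Hypothetical input; real webpack stats are far larger and more varied.
stats = {"modules": [{"name": f"./src/m{i}.js", "size": i} for i in range(500)]}
chunks = pack(stats, chunk_size=64)
restored = unpack(chunks)
print(len(chunks), restored == stats)
```

Base64 maps every 3 input bytes to 4 output characters, which is where the ~33% overhead comes from. Each chunk here is encoded and decoded independently, so the chunk size does not need to be a multiple of 3.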