On-chain media storage may require efficient inline text compression for HTML / JS. Here is a custom pipeline that generates stand-alone HTML or JS files with embedded self-extracting text, at file sizes of ~50% of the original. The approach is optimized for small texts, but also performs well on large texts.

| | War and Peace (en) | Micromegas (en) |
|---|---|---|
| Project Gutenberg plain text utf8 | 3.2 MB | 63.7 kB |
| ZTML (utf8 charset with Base125) | 1.6 MB (50%) | 35.0 kB (55%) |
| ZTML (cp1252 charset with crEnc) | 1.4 MB (44%) | 31.3 kB (49%) |

ZTML pipeline:
- Text normalization (irreversible; reduce whitespace, substitute unicode punctuation)
- Text condensation (reversible; lowercase with automatic capitalization*, substitute common strings such as "the" and "qu")
- Huffman encoding (with a codebook-free decoder; beneficial even when followed by DEFLATE)
- PNG / DEFLATE compression (allowing native decompression, aspect ratio optimized for minimal padding, Zopfli optimization)
- Binary-to-text encoding embedded in JS template literals (Base125 with a utf8 charset, or crEnc with a cp1252 charset)
- Uglification of the generated JS (substitute recurring element, attribute and function names with short aliases)
*Automatic capitalization recovery is currently partial.
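
To make two of the steps above more concrete, here is a minimal sketch of the normalization step and of the aspect-ratio choice for the PNG step. It is not ZTML's actual code: the function names `normalizeText` and `bestDimensions`, the specific punctuation substitutions, the one-byte-per-pixel layout and the `maxWidth` limit are all assumptions made for illustration.

```javascript
// Hypothetical helper: irreversible text normalization.
// The substitutions below are illustrative choices, not ZTML's exact rules.
function normalizeText(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")  // curly single quotes -> apostrophe
    .replace(/[\u201C\u201D]/g, '"')  // curly double quotes -> straight quote
    .replace(/[\u2013\u2014]/g, '-')  // en and em dashes -> hyphen
    .replace(/\u2026/g, '...')        // ellipsis character -> three dots
    .replace(/[ \t]+/g, ' ')          // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, '\n')         // drop spaces around line breaks
    .replace(/\n{3,}/g, '\n\n')       // keep at most one blank line
    .trim();
}

// Hypothetical helper: pick image dimensions for `byteCount` bytes,
// assuming one byte per pixel (e.g. 8-bit grayscale).
// Minimizes padding (width * height - byteCount); ties go to the
// squarest shape, and among equal shapes the smaller width wins.
function bestDimensions(byteCount, maxWidth = 4096) {
  let best = null;
  const limit = Math.min(byteCount, maxWidth);
  for (let width = 1; width <= limit; width++) {
    const height = Math.ceil(byteCount / width);
    const padding = width * height - byteCount;
    const skew = Math.abs(width - height);
    if (
      best === null ||
      padding < best.padding ||
      (padding === best.padding && skew < best.skew)
    ) {
      best = { width, height, padding, skew };
    }
  }
  return best; // null when byteCount is 0
}

// Usage
const clean = normalizeText('\u201CHello\u201D  world\u2026\n\n\n\nBye ');
const dims = bestDimensions(10);
console.log(JSON.stringify(clean), dims.width, dims.height, dims.padding);
```

With this tie-break, a prime byte count produces a one-pixel-wide strip with zero padding. Since PNG adds a filter byte to every row, a real implementation would probably accept a little padding in exchange for fewer rows.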