ZTML

Extreme inline text compression for HTML / JS

By Eyal Gruss

Partially made at Stochastic Labs

On-chain media storage can require efficient compression for text embedded inline in HTML / JS. ZTML is a custom pipeline that generates stand-alone HTML or JS files embedding competitively compressed, self-extracting text, with file sizes of 25% - 40% of the original. These sizes include the decoder code, which takes 1.5 - 2 kB (including auxiliary indices and tables). The approach is designed and optimized for small texts, but also performs quite well on large texts. The pipeline includes efficient alternatives to base64, which are also useful for inline images.

Method	File format	War and Peace (en)	Micromegas (en)
Project Gutenberg plain text utf8	txt	3.2 MB	63.7 kB
7-Zip 22.01 9 Ultra PPMd (excluding decoder)	7z	746 kB (23%)	20.8 kB (32%)
7-Zip 22.01 9 Ultra PPMd (self extracting)	exe	958 kB (29%)	232 kB (364%)
Roadroller 2.1.0 -O2	js	1.0 MB (30%)	26.5 kB (42%)
ZTML Base125 (keep whitespace and punctuation)	html (utf8)	910 kB (28%) using `mtf=80`	26.2 kB (41%) using `mtf=0`
ZTML crEnc (keep whitespace and punctuation)	html (cp1252)	813 kB (25%) using `mtf=80`	23.4 kB (37%) using `mtf=0`

Usage

A standard simplified pipeline can be run by calling ztml() or running python ztml.py from the command line. See ztml.py.

crEnc gives better compression but requires setting the HTML or JS charset to cp1252. Base125 is the second best option if one must stick with utf8.
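To give a feel for why the charset choice matters, the following sketch compares the size overhead of three binary-to-text strategies: base64, a packing that carries 7 payload bits per output byte (roughly the idea behind Base122-style encodings), and a yEnc-style scheme that only escapes a handful of byte values. The helper functions are hypothetical and only compute sizes. The set of four escaped bytes is an assumption chosen for illustration and is not taken from ZTML's actual crEnc table.

```python
import base64
import math
import os


def base64_overhead(n_bytes: int) -> float:
    """Relative size increase of standard base64 (4 output chars per 3 input bytes)."""
    encoded_len = len(base64.b64encode(bytes(n_bytes)))
    return encoded_len / n_bytes - 1


def seven_bit_overhead(n_bytes: int) -> float:
    """Relative size increase when each output byte carries 7 payload bits."""
    encoded_len = math.ceil(n_bytes * 8 / 7)
    return encoded_len / n_bytes - 1


def escape_overhead(data: bytes, escaped: frozenset) -> float:
    """Relative size increase when every byte in `escaped` costs one extra byte."""
    extra = sum(1 for b in data if b in escaped)
    return extra / len(data)


# Hypothetical escape set (illustrative only): CR, '$', '\' and '`'.
ESCAPED = frozenset({0x0D, 0x24, 0x5C, 0x60})

if __name__ == "__main__":
    n = 300_000
    data = os.urandom(n)  # stand-in for well-compressed, near-uniform bytes
    print(f"base64:     {base64_overhead(n):.1%}")
    print(f"7-bit pack: {seven_bit_overhead(n):.1%}")
    print(f"escaping:   {escape_overhead(data, ESCAPED):.1%}")
```

On near-uniform input, base64 costs about 33%, 7-bit packing about 14%, and escaping 4 of 256 byte values about 1.6%. This is the same order as the overheads quoted for Base125 and crEnc in the pipeline breakdown, although the actual encoders may escape a different set of bytes.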

See example.py for a complete example reproducing the above benchmark.

Caveats:

Files larger than a few MB might not work on iOS Safari or macOS Safari 15.
This solution favors compression ratio over compression and decompression times. Use mtf=None for faster decompression of large files.
For compressing word lists (sorted lexicographically), solutions such as Roadroller do a much better job.

ZTML pipeline breakdown:

Text normalization (irreversible; reduce whitespace, substitute unicode punctuation)
Text condensation (reversible; lowercase with automatic capitalization*, substitute common strings such as: the, qu)
Burrows–Wheeler + Move-to-front transforms on text with some optional variants, including some new ones (beneficial for large texts)
Huffman encoding (with a codebook-free decoder, beneficial even when followed by DEFLATE)
Burrows–Wheeler transform on bits (beneficial for large texts)
PNG / DEFLATE compression (allowing native decompression, aspect ratio optimized for minimal padding, Zopfli optimization)
Binary to text encoding embedded in JS template literals:
1. crEnc encoding (a yEnc variant with 1.6% overhead, to be used with a single-byte charset)
2. Base125 encoding (a Base122 variant with 15% overhead, to be used with the utf8 charset)
Uglification of the generated JS (substitute recurring element, attribute and function names with short aliases)
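The "lowercase with automatic capitalization" idea from the text condensation step can be sketched as follows. This is a minimal illustration under simple assumptions (capitalize sentence starts and the pronoun "I"), not ZTML's actual algorithm. The function names are hypothetical. It mainly shows why such recovery can only be partial: proper nouns are lost once the text is lowercased.

```python
import re


def condense_caps(text: str) -> str:
    """Drop capitalization entirely; reversible only if recovery succeeds."""
    return text.lower()


# A lowercase letter at the start of the text or after sentence-ending punctuation.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def recover_caps(text: str) -> str:
    """Heuristically capitalize sentence starts and the standalone pronoun 'I'."""
    text = _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    return re.sub(r"\bi\b", "I", text)


def needs_fallback(original: str) -> bool:
    """True when the heuristic cannot reproduce the original capitalization."""
    return recover_caps(condense_caps(original)) != original


if __name__ == "__main__":
    for sample in ("Hello there. I think it works! Does it?",
                   "Pierre met Natasha in Moscow."):
        print(needs_fallback(sample), "->", recover_caps(condense_caps(sample)))
```

The first sample round-trips, while the second loses the capital letters in "Natasha" and "Moscow". A check like the hypothetical needs_fallback() is one way to decide when the original capitalization has to be stored instead of inferred.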

*Automatic capitalization recovery is currently only partial. Use caps=raw or caps_fallback=True to preserve original capitalization.
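The Burrows–Wheeler + Move-to-front step in the breakdown above can be illustrated with a textbook version of both transforms. This is a naive sketch for small inputs, assuming a NUL sentinel that does not occur in the text. It does not reproduce ZTML's optional variants (such as the ones selected with the mtf parameter), and the function names are hypothetical.

```python
from typing import List, Tuple


def bwt(text: str, eos: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    assert eos not in text, "sentinel must not occur in the input"
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, eos: str = "\0") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]


def mtf_encode(text: str) -> Tuple[List[int], List[str]]:
    """Move-to-front: emit each symbol's current rank, then move it to the front."""
    alphabet = sorted(set(text))
    initial = list(alphabet)
    ranks = []
    for ch in text:
        idx = alphabet.index(ch)
        ranks.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return ranks, initial


def mtf_decode(ranks: List[int], initial: List[str]) -> str:
    """Replay the same list updates to map ranks back to symbols."""
    alphabet = list(initial)
    out = []
    for idx in ranks:
        ch = alphabet.pop(idx)
        out.append(ch)
        alphabet.insert(0, ch)
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    ranks, initial = mtf_encode(transformed)
    print(repr(transformed), ranks)
    assert inverse_bwt(mtf_decode(ranks, initial)) == "banana"
```

The Burrows–Wheeler transform tends to group identical characters into runs, and move-to-front turns such runs into small ranks, mostly zeros. That skewed distribution is what an entropy coder such as Huffman can then exploit.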

Projects using this:

fragium
