Extreme inline text compression for HTML / JS. A custom pipeline that generates stand-alone HTML or JS files which embed competitively compressed self-extracting text, with file sizes of 25% - 40% of the original.

eyaler/ztml

ZTML

Extreme inline text compression for HTML / JS

On-chain media storage may require efficient inline text compression for HTML / JS. Here is a custom pipeline that generates stand-alone HTML or JS files embedding self-extracting text, with file sizes of 30% - 40% of the original. These file sizes include the decoder code, which is less than 1.5 kB. The approach is designed and optimized for small texts, but also performs quite well on large texts.

| | File format | War and Peace (en) | Micromegas (en) |
|---|---|---|---|
| Project Gutenberg plain text utf8 | txt | 3.2 MB | 63.7 kB |
| 7-Zip 9 Ultra PPMd (excluding decoder) | 7z | 746 kB (23%) | 20.8 kB (32%) |
| 7-Zip 9 Ultra PPMd (self extracting) | exe | 958 kB (29%) | 232 kB (364%) |
| ZTML (Base125 using utf8 charset) | html | 982 kB (30%) | 29.2 kB (46%) |
| ZTML (crEnc using cp1252 charset) | html | 877 kB (27%) | 26.1 kB (41%) |

Usage

The standard simplified pipeline can be run by calling generate() or running python ztml.py from the command line. See ztml.py.

crEnc gives better compression but requires setting the HTML or JS charset to cp1252. Base125 is the second best option if one must stick with utf8.
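
To give a feel for why an escape-based encoding on a single-byte charset can have such low overhead, here is a minimal sketch of a generic yEnc-style escaper. It is not the crEnc implementation: the `UNSAFE` set, the function names, and the omission of yEnc's usual byte offset are all assumptions made for illustration, and a real template literal has further constraints (for example `${`) that this sketch ignores.

```python
ESCAPE = 0x3D  # '=' marks an escaped byte, as in yEnc

# Hypothetical set of byte values that may not appear literally (an assumption,
# not the actual crEnc set): carriage return, backslash, backtick, and the
# escape marker itself.
UNSAFE = frozenset({0x0D, 0x5C, 0x60, ESCAPE})


def escape_encode(data: bytes) -> bytes:
    """Copy bytes through, replacing each unsafe byte with ESCAPE + shifted byte."""
    out = bytearray()
    for b in data:
        if b in UNSAFE:
            out.append(ESCAPE)
            out.append((b + 64) % 256)  # shift the byte out of the unsafe set
        else:
            out.append(b)
    return bytes(out)


def escape_decode(encoded: bytes) -> bytes:
    """Reverse escape_encode by undoing the shift after each ESCAPE marker."""
    out = bytearray()
    it = iter(encoded)
    for b in it:
        if b == ESCAPE:
            out.append((next(it) - 64) % 256)
        else:
            out.append(b)
    return bytes(out)


# Every byte value appears equally often, mimicking well-compressed data.
payload = bytes(range(256)) * 4
encoded = escape_encode(payload)
overhead = len(encoded) / len(payload) - 1
print(f"overhead: {overhead:.2%}")
```

With this illustrative set, only 4 of the 256 byte values cost an extra byte, so evenly distributed input grows by 4/256. By contrast, any scheme that stores 7 payload bits per output byte, which is the general idea behind Base122-style encodings for utf8, cannot go below 1/7 (about 14.3%) overhead.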

See example.py for a complete example reproducing the above benchmark.

Note: files larger than a few MB might not work on iOS Safari or macOS Safari 15.

ZTML pipeline:

  1. Text normalization (irreversible; reduce whitespace, substitute unicode punctuation)
  2. Text condensation (reversible; lowercase with automatic capitalization*, substitute common strings such as: the, qu)
  3. Huffman encoding (with a codebook-free decoder, beneficial even when followed by DEFLATE)
  4. Burrows–Wheeler transform
  5. PNG / DEFLATE compression (allowing native decompression, aspect ratio optimized for minimal padding, Zopfli optimization)
  6. Binary to text encoding embedded in JS template literals:
    1. crEnc encoding (a yEnc variant with 1.6% overhead, to be used with single-byte charset)
    2. Base125 encoding (a Base122 variant with 15% overhead, to be used with utf8 charset)
  7. Uglification of the generated JS (substitute recurring element, attribute and function names with short aliases)
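
Steps 4 and 5 above can be sketched in a few lines of Python. This is a naive illustration, not the code in ztml.py: the function names are hypothetical, the transform sorts full rotations (fine for small inputs, slow for large ones), and plain `zlib` stands in for the PNG / Zopfli stage.

```python
import zlib
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the last column plus the row of the original string among the
    sorted rotations, which the inverse needs as its starting point.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, index: int) -> bytes:
    """Invert the transform by walking the stable-sorted last column."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value links each row of the first
    # column to the row of the last column holding the same occurrence.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = index
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


sample = b"the quick brown fox jumps over the lazy dog; the end of the line"
transformed, start_row = bwt(sample)

# DEFLATE via zlib (an assumed stand-in for the PNG / Zopfli stage).
compressed = zlib.compress(transformed, 9)
restored = inverse_bwt(zlib.decompress(compressed), start_row)

print(len(sample), len(compressed), restored == sample)
```

The transform itself does not shrink anything. It only groups bytes that share a following context, which tends to give DEFLATE longer runs and repeats to work with on natural-language text. Note that the starting row has to be stored alongside the payload for decoding.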

*Automatic capitalization recovery is currently partial.
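
A minimal sketch of why capitalization recovery is only partial, assuming a simple rule that capitalizes the first letter of each sentence. The function names and the rule are hypothetical and are not taken from ztml.py.

```python
import re


def condense_case(text: str) -> str:
    """Drop case information so the text has a smaller alphabet to encode."""
    return text.lower()


def restore_case(text: str) -> str:
    """Capitalize the first letter of the text and of every new sentence."""
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


original = "Hello world. This is Pierre."
recovered = restore_case(condense_case(original))
print(recovered)
```

Sentence starts come back, but the proper noun stays lowercase, so a rule-based decoder like this one either needs extra side information for such exceptions or accepts a lossy result.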
