-
Notifications
You must be signed in to change notification settings - Fork 1
a static PPM-based URL compressor / decompressor
kazuho/url_compress
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
url_compress - a static PPM-based URL compressor / decompressor (c) 2008-2010 Cybozu, Labs. Inc. INTRODUCTION The library is a compressor / decompressor specifically designed to compress URLs. Expected compression ratio is from 25% to 40% of the original size. The compressed form does not preserve the original order however it is possible to perform a prefix search without decompressing the URLs. TO BUILD AND TEST Generate the compression tables. The following example uses "corpus/travail5" as prior knowledge and generates a 4095-entry compression table set (which will be around 2MB in size). % scripts/build_chreorder.pl < corpus/travail5 > src/chreorder.c % scripts/build_rctable.pl corpus/travail5 4095 1 1 > src/rctable.c Compile the test binary. % g++ -DRANGE_CODER_USE_SSE -g -Wall -O2 -Irangecoder src/url_coder.cpp src/test.c -o /tmp/url_compress_test Test compression / decompression (as well as the compression ratio) using the corpus. % /tmp/url_compress_test corpus/travail ADDING YOUR OWN CORPUS The compressor uses a static (predefined) compression table. So it is better to use your own list of URLs as the prior knowledge. Use the following command add your list of URLs to the corpus. % scripts/normalize_corpus.pl < your_own_url_list > corpus/your_urls Once added, follow the above section: TO BUILD AND TEST. USING THE LIBRARY FROM MYSQL Compile the UDF and copy it into lib/mysql/plugin (or lib/plugin) directory of your MySQL installation. (NOTE: on OSX use -bundle instead of -shared -fPIC) % g++ -DRANGE_CODER_USE_SSE -g -Wall -O2 -Irangecoder -shared -fPIC src/url_coder.cpp src/url_compress_udf.c -o url_compress_udf.so Then activate the UDFs using the "CREATE FUNCTION" command. mysql> CREATE FUNCTION my_url_compress RETURNS STRING SONAME 'url_compress_udf.so'; mysql> CREATE FUNCTION my_url_decompress RETURNS STRING SONAME 'url_compress_udf.so'; Check that the UDFs actually work. mysql> SELECT LENGTH(my_url_compress('http://www.google.com/')); +---------------------------------------------------+ | length(my_url_compress('http://www.google.com/')) | +---------------------------------------------------+ | 3 | +---------------------------------------------------+ 1 row in set (0.00 sec) mysql> select my_url_decompress(my_url_compress('http://www.google.com/')); +--------------------------------------------------------------+ | my_url_decompress(my_url_compress('http://www.google.com/')) | +--------------------------------------------------------------+ | http://www.google.com/ | +--------------------------------------------------------------+ 1 row in set (0.00 sec) THANKS TO @__travail__ and @fujiwara for providing the corpus.
About
a static PPM-based URL compressor / decompressor
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published