New Base 60
A base 60 numbering system using only ASCII numbers and letters.
or
a side effect of building a personal URL shortener
Tantek Çelik, 2009-167
Introduction
I needed a way to compress numbers for a URL shortener I am building so I looked at existing work, decided I could do better with a better design methodology, and ended up deriving a base 60 numbering system from ASCII characters.
URL shortening
URL shorteners have grown quite popular in recent months, due mostly to Twitter and other microblogging services (Identica, FriendFeed) limiting posts to 140 characters.
However, 3rd party URL shortening services inevitably make links more fragile and vulnerable to a single point of failure due to maintenance or
being hacked. The only work around for this problem is for sites to host their own URL shorteners, as
http://flickr.com has done with
http://flic.kr/ URLs.
Typical URL shorteners take an arbitrary URL and return a short URL, usually on a shorter domain, followed by a seemingly random set of alphanumeric digits that usually indicate an id of an entry in a database which has the original URL.
Thus to develop my own URL shortener I realized first I needed to come up with a good way to maximally compress a number (or a few) into characters in a URL.
Update: see Whistle - my algorithmically reversible shortener that uses NewBase60.
While many existing shorteners use numbers and letters, I found a few flaws and vulnerabilities in their choice of what set of numbers and letters to use and decided to use a methodology based on usability and robustness to deliberately choose which characters to use.
Methodology
- ASCII.
- Use stricly ASCII7 characters - this provides maximum robustness for transmission of content to mobile devices, applications etc.
- Use ASCII ordering. If a character is latter in the ASCII order, it should designate a higher value.
- Identifier compatibility.
- HTML ID and NAME tokens: [A-Za-z], [0-9], "-", "_", ":", "." (so that base 60 numbers can be used, perhaps with a letter prefix, in HTML ID attributes / URL fragment identifers).
- identifiers in computer languages e.g. early C, C++: ASCII letters, digits, and underscores.
- Unambiguous readability. Skip latter characters that are often rendered the same or nearly the same in common small fonts (e.g. in monospace text editors or mobile phone displays) as earlier characters. Enable roundtripping from visual media (i.e. from print output - see ShortURLPrintExample - to manual OCR or human input).
- hexadecimal reuse. Reuse symbols used from standard hexadecimal numerical notation.
Derivation
- Starting with hex compatibility, values 0-15 are represented by
- continuing with available identifier characters, values 16-17 are represented by
- per unambiguous human readability:
- uppercase letter I is skipped as it is too similar to the number one.
- continuing with available identifier characters, values 18-22 are represented by
- per unambiguous human readability:
- uppercase letter O is skipped as it is too similar to the number zero.
- continuing with available identifier characters, values 23-33 are represented by
- continuing with available identifier characters, value 34 is represented by
- continuing with available identifier characters, values 35-45 are represented by
- per unambiguous human readability:
- lowercase letter l is skipped as it is too similar to the number one.
- continuing with available identifier characters, values 46-59 are represented by
New Base 60 numbering system
The complete numerical sexagesimal sequence grouped:
The complete sequence written out in rows of 10 for readability:
0 1 2 3 4 5 6 7 8 9
0 0 1 2 3 4 5 6 7 8 9
10 A B C D E F G H J K
20 L M N P Q R S T U V
30 W X Y Z _ a b c d e
40 f g h i j k m n o p
50 q r s t u v w x y z
Open Source implementation
Cassis V0 code
runs in both PHP and javascript
FOR THE LATEST NewBase60 Cassis.js CODE SEE:
NOTE: THE BELOW CODE IS AN ORIGINAL PROOF OF CONCEPT AND HERE FOR HISTORICAL PURPOSES ONLY:
function num_to_sxg($n) {
$s = "";
$m = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz";
if ($n===undefined || $n===0) { return 0; }
while ($n>0) {
$d = $n % 60;
$s = strcat($m[$d],$s);
$n = ($n-$d)/60;
}
return $s;
}
function num_to_sxgf($n, $f) {
$s = num_to_sxg($n);
if ($f===undefined) {
$f=1;
}
$f -= strlen($s);
while ($f > 0) {
$s = strcat("0",$s);
--$f;
}
return $s;
}
function sxg_to_num($s) {
$n = 0;
$j = strlen($s);
for ($i=0;$i<$j;$i++) { // iterate from first to last char of $s
$c = ord($s[$i]); // put current ASCII of char into $c
if ($c>=48 && $c<=57) { $c=$c-48; }
else if ($c>=65 && $c<=72) { $c-=55; }
else if ($c==73 || $c==108) { $c=1; } // typo capital I, lowercase l to 1
else if ($c>=74 && $c<=78) { $c-=56; }
else if ($c==79) { $c=0; } // error correct typo capital O to 0
else if ($c>=80 && $c<=90) { $c-=57; }
else if ($c==95) { $c=34; } // underscore
else if ($c>=97 && $c<=107) { $c-=62; }
else if ($c>=109 && $c<=122) { $c-=63; }
else { $c = 0; } // treat all other noise as 0
$n = 60*$n + $c;
}
return $n;
}
Additional Implementations
Ordered by date of first publication, earliest first.
Java and partial Python
On 2010-019 Faruk Akgul published his Java and partial Python implementations of NewBase60:
POSIX Shell
On 2010-078 Stephen Paul Weber published his POSIX Shell implementation of NewBase60:
Perl
On 2010-148 Steve Ivy published his Perl implementation of NewBase60:
Ruby
On 2010-152 Shane Becker published his Ruby implementation of NewBase60:
CommonJS and Nodejs aware Javascript
On 2010-224 Edward O'Connor (hober) published his CommonJS- and Node.js-aware Javascript version of NewBase60:
Fortran
Python
On 2015-027 Kevin Marks published a CC0 Python version of NewBase60 (based on Faruk Akgul's earlier partial Python version) that decodes as well as encoding:
Rust
On 2021-085 astralbijection published a GPL 3.0 Rust version of NewBase60:
References
See also
Related
Trivia
Return to MyNextStartup \ ProjectsList \ FrontPage.
Comments (0)
You don't have permission to comment on this page.