log inhelp

 | 
View
 

NewBase60

Page history last edited by Tantek 3 years, 8 months ago Saved with comment

New Base 60

 
A base 60 numbering system using only ASCII numbers and letters.
 or
a side effect of building a personal URL shortener
 
Tantek Çelik, 2009-167
shortlink: http://ttk.me/w/NewBase60 (was tr.im/base60)
 

Introduction

 
I needed a way to compress numbers for a URL shortener I am building so I looked at existing work, decided I could do better with a better design methodology, and ended up deriving a base 60 numbering system from ASCII characters.
 

URL shortening

 
URL shorteners have grown quite popular in recent months, due mostly to Twitter and other microblogging services (Identica, FriendFeed) limiting posts to 140 characters.
 
However, 3rd party URL shortening services inevitably make links more fragile and vulnerable to a single point of failure due to maintenance or being hacked. The only work around for this problem is for sites to host their own URL shorteners, as http://flickr.com has done with http://flic.kr/ URLs.
 
Typical URL shorteners take an arbitrary URL and return a short URL, usually on a shorter domain, followed by a seemingly random set of alphanumeric digits that usually indicate an id of an entry in a database which has the original URL.
 
Thus to develop my own URL shortener I realized first I needed to come up with a good way to maximally compress a number (or a few) into characters in a URL.  Update: see Whistle - my algorithmically reversible shortener that uses NewBase60.
 
While many existing shorteners use numbers and letters, I found a few flaws and vulnerabilities in their choice of what set of numbers and letters to use and decided to use a methodology based on usability and robustness to deliberately choose which characters to use.  
 

Methodology

 
  • ASCII.
    • Use stricly ASCII7 characters - this provides maximum robustness for transmission of content to mobile devices, applications etc. 
    • Use ASCII ordering.  If a character is latter in the ASCII order, it should designate a higher value.
  • Identifier compatibility.
  • Unambiguous readability. Skip latter characters that are often rendered the same or nearly the same in common small fonts (e.g. in monospace text editors or mobile phone displays) as earlier characters. Enable roundtripping from visual media (i.e. from print output - see ShortURLPrintExample - to manual OCR or human input).
  • hexadecimal reuse. Reuse symbols used from standard hexadecimal numerical notation.
 

Derivation

 
  • Starting with hex compatibility, values 0-15 are represented by
  • continuing with available identifier characters, values 16-17 are represented by 
    • following letters G-H
  • per unambiguous human readability: 
    • uppercase letter I is skipped as it is too similar to the number one.
  • continuing with available identifier characters, values 18-22 are represented by
    • J-N
  • per unambiguous human readability: 
    • uppercase letter O is skipped as it is too similar to the number zero.
  • continuing with available identifier characters, values 23-33 are represented by
    • uppercase letters P-Z
  • continuing with available identifier characters, value 34 is represented by
    • underscore "_"
  • continuing with available identifier characters, values 35-45 are represented by
    • lowercase letters a-k
  • per unambiguous human readability: 
    • lowercase letter l is skipped as it is too similar to the number one.
  • continuing with available identifier characters, values 46-59 are represented by
    • lowercase letters m-z
 

New Base 60 numbering system

 
The complete numerical sexagesimal sequence grouped:
  • [0-9A-HJ-NP-Z_a-km-z]
 
The complete sequence written out in rows of 10 for readability:
 
   0 1 2 3 4 5 6 7 8 9
 0 0 1 2 3 4 5 6 7 8 9 
10 A B C D E F G H J K
20 L M N P Q R S T U V
30 W X Y Z _ a b c d e
40 f g h i j k m n o p
50 q r s t u v w x y z

Open Source implementation

The following CASSIS V0 code (runs in PHP and javascript with the inclusion of cassis0.js and cassis0php.js from the CassisProject) is licensed under a Creative Commons Attribution Share-Alike license (for now). Please attribute to "Tantek Çelik" and link attribution to http://tantek.com
 

Cassis V0 code

runs in both PHP and javascript

FOR THE LATEST NewBase60 Cassis.js CODE SEE:

NOTE: THE BELOW CODE IS AN ORIGINAL PROOF OF CONCEPT AND HERE FOR HISTORICAL PURPOSES ONLY:

 

function num_to_sxg($n) {
 $s = "";
 $m = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz";
 if ($n===undefined || $n===0) { return 0; }
 while ($n>0) {
   $d = $n % 60;
   $s = strcat($m[$d],$s);
   $n = ($n-$d)/60;
 }
 return $s;
}

function num_to_sxgf($n, $f) {
 $s = num_to_sxg($n);
 if ($f===undefined) { 
   $f=1; 
 }
 $f -= strlen($s);
 while ($f > 0) { 
   $s = strcat("0",$s); 
   --$f; 
 }
 return $s;
}

function sxg_to_num($s) {
 $n = 0;
 $j = strlen($s);
 for ($i=0;$i<$j;$i++) { // iterate from first to last char of $s
   $c = ord($s[$i]); //  put current ASCII of char into $c  
   if ($c>=48 && $c<=57) { $c=$c-48; }
   else if ($c>=65 && $c<=72) { $c-=55; }
   else if ($c==73 || $c==108) { $c=1; } // typo capital I, lowercase l to 1
   else if ($c>=74 && $c<=78) { $c-=56; }
   else if ($c==79) { $c=0; } // error correct typo capital O to 0
   else if ($c>=80 && $c<=90) { $c-=57; }
   else if ($c==95) { $c=34; } // underscore
   else if ($c>=97 && $c<=107) { $c-=62; }
   else if ($c>=109 && $c<=122) { $c-=63; }
   else { $c = 0; } // treat all other noise as 0
   $n = 60*$n + $c;
 }
 return $n;
}

 

Additional Implementations

Ordered by date of first publication, earliest first.

Java and partial Python

On 2010-019 Faruk Akgul published his Java and partial Python implementations of NewBase60:

POSIX Shell

On 2010-078 Stephen Paul Weber published his POSIX Shell implementation of NewBase60:

Perl

On 2010-148 Steve Ivy published his Perl implementation of NewBase60:

Ruby

On 2010-152 Shane Becker published his Ruby implementation of NewBase60:

CommonJS and Nodejs aware Javascript

On 2010-224 Edward O'Connor (hober) published his CommonJS- and Node.js-aware Javascript version of NewBase60:

Fortran

Python

On 2015-027 Kevin Marks published a CC0 Python version of NewBase60 (based on Faruk Akgul's earlier partial Python version) that decodes as well as encoding:

Rust

On 2021-085 astralbijection published a GPL 3.0 Rust version of NewBase60:

References

 

See also

 

Related

 

Trivia


Return to MyNextStartup \ ProjectsList \ FrontPage.

Comments (0)

You don't have permission to comment on this page.