🔢 nmr: name all canonical things 🔢

Give each canonical thing a unique English name using non-repeating words from a short list of common, short English words... or use a word list of your choice.

The word list is here: https://github.com/rec/nmr/blob/main/words.txt

Canonical things include but are not limited to:

numbers
fractions
times, dates, time intervals
lat/long positions on the earth
IP addresses
UUIDs

Other unimplemented possibilities:

Phone numbers
Periodic table
Chemical compounds(?) (probably not worth the huge effort)
Pieces of music! (my original motivating example)

Shorter or simpler things should generally be represented by shorter names.

Extending this to your own domain is easy: you can either add it to this codebase or write your own code and plug it in.

Installing the package provides both a module named nmr and an executable called nmr.py.

EXAMPLE

import nmr

assert nmr(0) == ['the']
assert nmr(2718281828) == ['the', 'race', 'tie', 'hook']

for i in range(-2, 3):
    print(i, ':', *nmr(i))

# Prints
#   -2 : to
#   -1 : of
#   0 : the
#   1 : and
#   2 : a

HOW IT WORKS

All of this involves the same trick over and over again - counting using positional notation: https://en.wikipedia.org/wiki/Positional_notation

By default nmr has 1628 different words, which you can think of as 1628 different digits.

There's a wrinkle that makes things harder: the author decided that repeating words was unaesthetic, so counting is a little trickier, but you don't have to understand how it's done.

With repeating words eliminated, 1628 is the minimum total number of words needed to be able to represent all 64-bit integers with at most six words.
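The figure of 1628 can be checked with a few lines of Python. This is a minimal sketch, not code from nmr: the function names are made up for illustration, and it assumes that every non-repeating sequence of one to six words is a usable name.

```python
from math import perm  # perm(n, k) = n! / (n - k)!, available in Python 3.8+


def name_count(word_count, max_words=6):
    """Count the non-repeating word sequences of length 1..max_words."""
    return sum(perm(word_count, k) for k in range(1, max_words + 1))


def minimum_word_count(target=2 ** 64, max_words=6):
    """Find the smallest word list that yields at least `target` names."""
    word_count = max_words
    while name_count(word_count, max_words) < target:
        word_count += 1
    return word_count


print(minimum_word_count())  # 1628
```

With 1627 words the count falls just short of 2 ** 64, so one more word is needed.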

Conversion runs as follows.

First, take your type, whatever it is, and figure out a way to represent it as a number in positional notation, where each digit position may have its own base.

Suppose you want to represent the time of day, down to milliseconds.

You represent it as a four-digit number: hours, minutes, seconds, milliseconds.

Next, you take the positional number, and evaluate it into a "type-relative number", a number that makes sense only within this type.

For example, to represent the time 15:11:55.823, we'd evaluate ((15 * 60 + 11) * 60 + 55) * 1000 + 823 to get the type-relative number 54715823, but this number might mean something completely different for, say, lat/long.
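The time-of-day evaluation is ordinary mixed-radix arithmetic, and it can be sketched in both directions. The names RADICES, to_number and from_number below are illustrative only and are not taken from the nmr codebase.

```python
# One radix per "digit": hours, minutes, seconds, milliseconds.
RADICES = (24, 60, 60, 1000)


def to_number(digits, radices=RADICES):
    """Evaluate mixed-radix digits (most significant first) into one integer."""
    number = 0
    for digit, radix in zip(digits, radices):
        if not 0 <= digit < radix:
            raise ValueError(f"digit {digit} out of range for radix {radix}")
        number = number * radix + digit
    return number


def from_number(number, radices=RADICES):
    """Split an integer back into its mixed-radix digits."""
    digits = []
    for radix in reversed(radices):
        number, digit = divmod(number, radix)
        digits.append(digit)
    return tuple(reversed(digits))


print(to_number((15, 11, 55, 823)))  # 54715823
print(from_number(54715823))         # (15, 11, 55, 823)
```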

Now, each type also gets its own fixed "type identifier number", which is combined with the type-relative number to give a new number, the nmr number, which is unique over all types.

Then that nmr number is encoded into a non-repeating sequence of words from the word list, which is the nmr name.
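One way to turn a number into a non-repeating sequence of words is sketched below. It is an assumption about how such an encoding could work, not nmr's actual algorithm: shorter names are used up first, and within one length each position picks from the words not yet used, so the bases shrink from n to n - 1 and so on. The five-word list is a stand-in for the real word list.

```python
from math import perm

WORDS = ["the", "of", "and", "to", "a"]  # stand-in for the real word list


def encode(number, words=WORDS):
    """Map a non-negative integer to a list of distinct words."""
    if number < 0:
        raise ValueError("number must be non-negative")
    n = len(words)
    length = 1
    # Skip past all the shorter names, which are assigned first.
    while length <= n and number >= perm(n, length):
        number -= perm(n, length)
        length += 1
    if length > n:
        raise ValueError("number too large for this word list")
    # Peel off digits, least significant first: bases n-length+1, ..., n.
    digits = []
    for radix in range(n - length + 1, n + 1):
        number, digit = divmod(number, radix)
        digits.append(digit)
    digits.reverse()
    remaining = list(words)
    return [remaining.pop(digit) for digit in digits]


def decode(name, words=WORDS):
    """Invert encode; raises ValueError on unknown or repeated words."""
    if not name:
        raise ValueError("a name needs at least one word")
    n = len(words)
    remaining = list(words)
    number = 0
    for position, word in enumerate(name):
        digit = remaining.index(word)
        remaining.pop(digit)
        number = number * (n - position) + digit
    return number + sum(perm(n, k) for k in range(1, len(name)))


print(encode(0))  # ['the']
print(encode(5))  # ['the', 'of']
print(decode(["the", "and"]))  # 6
```

Removing each chosen word from the remaining list is what rules out repeats, and it is also why decoding a name with a repeated word fails.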

In the reverse direction, an nmr name is given, converted back into an nmr number, and then split into the type identifier number and the type-relative number.

The type identifier number is used to look up which type is being decoded, and then the type-relative number is decoded into the positional number, and finally back into the original type.
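The text does not say how the type identifier number and the type-relative number are combined. One simple possibility, shown purely as an assumption, is to treat the type identifier as the lowest digit in a fixed base. TYPE_SLOTS, KNOWN_TYPES and the function names are hypothetical.

```python
TYPE_SLOTS = 16  # hypothetical: room for 16 type identifiers
KNOWN_TYPES = {0: "integer", 1: "time", 2: "lat_long"}  # hypothetical registry


def combine(type_id, type_relative):
    """Pack a type identifier and a type-relative number into one integer."""
    if not 0 <= type_id < TYPE_SLOTS:
        raise ValueError("type_id out of range")
    if type_relative < 0:
        raise ValueError("type_relative must be non-negative")
    return type_relative * TYPE_SLOTS + type_id


def split(nmr_number):
    """Recover (type_id, type_relative) from a combined number."""
    type_relative, type_id = divmod(nmr_number, TYPE_SLOTS)
    return type_id, type_relative


def type_name(nmr_number):
    """Look up the type, or return None if the identifier is not registered."""
    type_id, _ = split(nmr_number)
    return KNOWN_TYPES.get(type_id)


number = combine(1, 54715823)
print(split(number))              # (1, 54715823)
print(type_name(number))          # time
print(type_name(combine(7, 5)))   # None
```

Because only some slots are registered in this sketch, some numbers decode to no known type at all.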

You are guaranteed that for each original canonical thing, T, there is a unique canonical name, and that decoding that name will always reproduce exactly the same thing.

It's mathematically an injection.

On the other hand, if you start with a random name and try to decode, a data type isn't required to give any guarantees at all.

You aren't even guaranteed that the decoding will return anything - some names just won't correspond even to a known type. If a name can be decoded to a known type, that type's decoder might still not be able to produce a value. Or, if it does produce a value, that value might have a different canonical name.

This is all good because it makes it really easy to whip up an encoder/decoder, but for the basic encoders included with the program, we can offer a better guarantee: that we can always decode any type-relative number and return an instance of the type, even if we weren't given a canonical name.
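A decoder that accepts every type-relative number, together with a check for whether a number is the canonical one, might look like the following for time of day. This is a sketch under the assumption that out-of-range numbers wrap around modulo one day. The function names are invented for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def encode_time(hours, minutes, seconds, millis):
    """Type-relative number for a time of day."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def decode_time(number):
    """Decode any non-negative number; values past one day wrap around."""
    number %= MS_PER_DAY
    number, millis = divmod(number, 1000)
    number, seconds = divmod(number, 60)
    hours, minutes = divmod(number, 60)
    return hours, minutes, seconds, millis


def is_canonical(number):
    """A number is canonical if re-encoding its decoded value gives it back."""
    return encode_time(*decode_time(number)) == number


print(decode_time(54715823))                # (15, 11, 55, 823)
print(decode_time(MS_PER_DAY + 54715823))   # (15, 11, 55, 823)
print(is_canonical(54715823))               # True
print(is_canonical(MS_PER_DAY + 54715823))  # False
```

Two different numbers decode to the same time here, but only one of them is what the encoder would produce.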

Separately, we add a "type wrapping" feature to the main program so that unknown type identifier numbers wrap around repeatedly over the known type identifiers - again, such a name will never be canonical.
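Type wrapping can be pictured as a modulo over the list of known types. The registry and function names below are hypothetical and only illustrate the idea.

```python
KNOWN_TYPES = ("integer", "time", "lat_long")  # hypothetical registry


def wrapped_type(type_id):
    """Pick a known type for any non-negative type identifier number."""
    return KNOWN_TYPES[type_id % len(KNOWN_TYPES)]


def is_known_type_id(type_id):
    """Only identifiers inside the registry can belong to a canonical name."""
    return 0 <= type_id < len(KNOWN_TYPES)


print(wrapped_type(1))      # time
print(wrapped_type(7))      # time
print(is_known_type_id(7))  # False
```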

With those features, you can type random names into the program and see what sort of thing you get.
