A set of utilities to provide information about languages, by their <a href="https://en.wikipedia.org/wiki/ISO_639">ISO 639</a> code.
The ISOTools class provides tools for the normalisation of ISO 639 codes.
There are multiple different standards for language codes in software:
- <a href="https://en.wikipedia.org/wiki/ISO_639-3">ISO 639-3</a>: There are 6000+ ISO to language codes in this standard, developed in collaboration with SIL international. Provides a very granular classification of languages, however does not allow for
- <a href="https://en.wikipedia.org/wiki/ISO_639-2">ISO 639-2</a>:
- <a href="https://en.wikipedia.org/wiki/ISO_639-1">ISO 639-1</a>:
- <a href="https://en.wikipedia.org/wiki/Locale_(computer_software)">Local codes</a>: in format
[language[_territory][.codeset][@modifier]]` or `[language[-territory][.codeset][@modifier]]
.
When writing software such as LangLynx, I found myself needing to provide information about languages and dialects which weren't possible with any of these standards:
- ISO code: supports normalisation
- Script code:
- Variant code:
- Territory code:
I didn't add the codeset/character set, as I think it can be reasonable to expect the use of Unicode across all of my applications.
from iso_tools.ISOTools import ISOTools
print(ISOTools.verify_iso('en'))
# Remove specific properties
print(ISOTools.get_L_removed('en_Latn', [SCRIPT]))
print(ISOTools.removed('ja_Jpan', LANG | SCRIPT))
# Miscellaneous
print(ISOTools.get_lang_props('en'))
print(ISOTools.guess_omitted_info('en'))
# Locale->ISO
print(ISOTools.locale_to_iso('en-US'))
print(ISOTools.locale_to_iso('en-GB'))
# Remove implied/unecessary information (or recover it)
print(ISOTools.pack_iso('ja_Jpan'))
print(ISOTools.remove_unneeded_info('ja_Jpan'))
print(ISOTools.unpack_iso('ja'))
# Split/join ISO code information
print(ISOTools.join(part3='en', script='Latn'))
print(ISOTools.split('ja'))
print(ISOTools.split_multiple('ja_-_en'))
# Fileame escape/de-escape
print(ISOTools.filename_escape('en'))
print(ISOTools.filename_split('en'))
# URL escape/de-escape
print(ISOTools.url_join('ja', 'en'))
print(ISOTools.url_split('ja_-_en'))
print(ISOTools.url_escape('ja_Jpan'))
print(ISOTools.url_unescape('ja_Jpan'))
There is also access to the original ISO 639-3 language data indexed by three-character codes, allowing both conversions
from iso_tools.iso_codes.ISOCodes import ISOCodes
# Convert ISO 639-1/ISO 639-2 codes to ISO 639-3
ISOCodes.to_part3('en')
# Get information about a specific ISO code
ISOCodes.get_D_iso('eng')
# Get other names for an ISO code
ISOCodes.get_D_alternate_names('eng')
# ???
ISOCodes.get_D_iso_639('eng')
# ???
ISOCodes.get_D_lang_codes('eng')
# Get whether there are macro codes (ADD AN EXPLANATION/EXAMPLES!)
ISOCodes.get_L_macros('eng')
ISOCodes.get_L_rev_macros('eng')