Skip to content

hvlck/txt

Repository files navigation

txt

text utilities (golang)

txt is mainly a lot of text normalization utilities, primarily for full-text search. This library is primarily intended for my own use, but pull requests are always welcome. The primary focus is on performance, for use in a full-text search engine I'm building.

Utilities

  • fast levenshtein distance calculator (~1.3*10^12 ops/s)
  • an implementation of the Porter stemming algorithm
  • stopword remover
  • various reading functions
    • read time
    • word/char count
    • (not implemented) text difficulty (Flescher-Kincaid)
  • a generic Trie data structure
    • inserts: ~6.7*10^14 ops/s
    • exact matches: ~3.3*10^15 ops/s
    • partial matches: ~1.2*10^14 ops/s

Roadmap

  • text normalisation
    • color names (words -> hex)
    • URL normalizations
    • remove fractional numbers to nth precision
    • synoyms/thesaurus
      • replace multiple words with one word that is a synoym of it
  • text difficulty
  • support for multiple languages

Credits

Releases

No releases published

Packages

No packages published

Languages