ambiguous_words - generate sets of words Tesseract is likely to find ambiguous
ambiguous_words(1) runs Tesseract in a special mode and, for each word in the word list, produces the set of words that Tesseract thinks might be ambiguous with it. TESSDATADIR must be set to the absolute path of a directory containing tessdata/lang.traineddata, where lang is the code of the language being used.
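
The TESSDATADIR requirement can be checked before the tool is run. The following Python sketch is illustrative only: the helper names check_tessdatadir and build_command are hypothetical, the default language "eng" is an assumption, and the argument order (optional -l lang, then TESSDATADIR, the word list file, and the output file) is assumed from the tool's usage message rather than stated in this page.

```python
from pathlib import Path


def check_tessdatadir(tessdatadir: str, lang: str = "eng") -> Path:
    """Return the traineddata path if TESSDATADIR meets the stated requirement.

    Hypothetical helper: it only checks that the path is absolute and that
    tessdata/<lang>.traineddata exists beneath it.
    """
    base = Path(tessdatadir)
    if not base.is_absolute():
        raise ValueError(f"TESSDATADIR must be an absolute path: {tessdatadir!r}")
    traineddata = base / "tessdata" / f"{lang}.traineddata"
    if not traineddata.is_file():
        raise FileNotFoundError(f"missing {traineddata}")
    return traineddata


def build_command(tessdatadir: str, wordlist: str, output: str,
                  lang: str = "eng") -> list[str]:
    """Build an argv list for ambiguous_words after validating TESSDATADIR.

    Assumption: the argument order is [-l lang] TESSDATADIR WORDLIST OUTPUT.
    """
    check_tessdatadir(tessdatadir, lang)
    return ["ambiguous_words", "-l", lang, tessdatadir, wordlist, output]
```

The returned list could be passed to subprocess.run; validating first gives a clearer error than a failed Tesseract initialization when the directory layout is wrong.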