Tesseract, the OSS OCR engine, is seriously impressive
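For a reading group on an English-language book, I did not want to lug the heavy book around and I wanted to make dictionary lookups more efficient, so I tried OCRing it with Tesseract.
It used to be hosted on sourceforge.net, but at some point it moved to Google Code.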
See below for details.
If you are on Windows, download
- tesseract-2.xx.exe.tar.gz
- tesseract-2.00.eng.tar.gz
and arrange the contents in the following directory structure:
tesseract.exe
tessdata/eng.*
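As a quick sanity check of the layout described above, here is a minimal Python sketch. The function name `missing_pieces` is hypothetical and only illustrates the idea; it simply looks for `tesseract.exe` and at least one `tessdata/eng.*` file under a given folder.

```python
from pathlib import Path


def missing_pieces(root):
    """Return a list of problems with the Tesseract layout under root."""
    root = Path(root)
    problems = []
    # The executable is expected directly in the root folder.
    if not (root / "tesseract.exe").is_file():
        problems.append("tesseract.exe not found")
    # The English language data is expected as tessdata/eng.*
    tessdata = root / "tessdata"
    if not tessdata.is_dir() or not any(tessdata.glob("eng.*")):
        problems.append("no tessdata/eng.* files found")
    return problems


if __name__ == "__main__":
    for problem in missing_pieces("."):
        print(problem)
```

An empty list means both pieces are where the layout above expects them.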
There does not seem to be a way to handle two-page spreads or multi-column layouts nicely (unverified), so I split the files by hand first.
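I did the split by hand, but for a plain two-page spread the cut could probably be computed. The sketch below is only an illustration under the assumption that the gutter sits exactly in the middle of the scan. `spread_boxes` and its `gutter` parameter are hypothetical names. The boxes use the `(left, upper, right, lower)` form that image libraries such as Pillow accept for cropping.

```python
def spread_boxes(width, height, gutter=0):
    """Return crop boxes for the left and right pages of a two-page spread.

    Each box is (left, upper, right, lower) in pixels. `gutter` is the
    number of pixels to trim on each side of the centre line (assumed
    to be where the two pages meet).
    """
    middle = width // 2
    left_page = (0, 0, middle - gutter, height)
    right_page = (middle + gutter, 0, width, height)
    return left_page, right_page


if __name__ == "__main__":
    left, right = spread_boxes(2000, 1500, gutter=20)
    print(left, right)
```

Multi-column pages would need a different rule, so this covers only the simplest case.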
Getting it to handle compressed TIFF is a hassle, so I converted the files I had to uncompressed TIFF.
Grab the libtiff binaries (http://gnuwin32.sourceforge.net/packages/tiff.htm) and run the conversion with them.
That gives you an uncompressed TIFF file.
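The exact conversion command is not shown above. One plausible way, assuming the `tiffcp` tool that ships with libtiff and its `-c none` option (copy without compression), is sketched below. The helper names and the file names `src.tiff` / `dst.tiff` are hypothetical.

```python
import subprocess


def tiffcp_uncompress_command(src, dst, tiffcp="tiffcp"):
    """Build the argument list for copying src to dst without compression."""
    # "-c none" asks tiffcp to write the output uncompressed.
    return [tiffcp, "-c", "none", src, dst]


def uncompress(src, dst, tiffcp="tiffcp"):
    """Run tiffcp; raises CalledProcessError if the tool reports failure."""
    subprocess.run(tiffcp_uncompress_command(src, dst, tiffcp), check=True)


if __name__ == "__main__":
    # Only prints the command, so it can be inspected before running it.
    print(" ".join(tiffcp_uncompress_command("src.tiff", "dst.tiff")))
```

Building the command separately from running it makes it easy to check what will be executed before touching any files.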
tesseract.exe src.tiff dst -l eng
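With more than a handful of pages, the same command could be run in a loop. The following is a minimal sketch, not something I have run on the whole book: it assumes the pages are `*.tiff` files in one folder and that Tesseract appends `.txt` to the output base name on its own. `ocr_command` and `ocr_directory` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def ocr_command(image, exe="tesseract.exe", lang="eng"):
    """Build the same command line as above for a single page image."""
    image = Path(image)
    # Drop the extension; Tesseract is assumed to add ".txt" itself.
    out_base = image.with_suffix("")
    return [exe, str(image), str(out_base), "-l", lang]


def ocr_directory(directory, exe="tesseract.exe", lang="eng", pattern="*.tiff"):
    """Run Tesseract on every matching page in directory, in name order."""
    for image in sorted(Path(directory).glob(pattern)):
        subprocess.run(ocr_command(image, exe, lang), check=True)


if __name__ == "__main__":
    print(" ".join(ocr_command("src.tiff")))
```

For `page01.tiff` this builds `tesseract.exe page01.tiff page01 -l eng`, so the recognized text would land next to each page image.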
I have only tried it on a single page so far, but the only correction needed*1 was inserting one half-width space. What a godsend of a tool!
I am also curious about ocropus, but I have not tried it.
*1: According to Word's spell checker