For a personal-project-sized OCR server, why not just put it on Heroku?
Addendum 2018/07/26
This project has moved to Docker on Heroku, so the "Deploy to Heroku" button is temporarily not working. It can still be deployed with Docker, so please refer to the following.
Original post follows
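This entry is day 5 of the Go (Part 2) Advent Calendar 2016 - Qiita.
Hi, I'm otiai10, also a regular over on the WET side.
I've been running a certain personal project for about three years now. It needs to OCR (recognize characters in) small bits of text rendered in the browser, and all that time I've been doing that in Python on a VPS. After three years, other personal projects and side jobs have come along, and I'd rather not have them all share the same server resources. It's also not the kind of personal project that earns anything from ads or affiliate links, so I started to wonder why I'm paying out of my own pocket to keep it running. So, fine: I built something that stands up an OCR server for free. In Go. Because I like Go.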
tl;dr
Press this and it's up and running right away.
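Features
- It does not detect character coordinates in arbitrary images the way Google Cloud Vision API does
  - It suits character recognition where the position and rectangle are already known
- It supports char_whitelist (restricting which characters can be detected), which Google Cloud Vision API does not let you specify
  - It suits cases where the set of expected characters is limited
- Free ← this is the strong point
So go ahead and stand up your own OCR server instance, and use it to read receipts, or to read the stamina recovery timer in your idol game, or whatever you like.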
What I did, part 1: calling a C++ library from Go
Go has this thing called cgo, which apparently lets you call C/C++ from Go and Go from C/C++. I use it to call Tesseract-OCR, an existing optical character recognition library.
For example, the tess.h included here is a header file I wrote myself, and a cpp file does the actual calls into Tesseract-OCR. Build this Go file and you get an executable linked against the C/C++ libraries.
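package tesseract

/*
#cgo LDFLAGS: -llept -ltesseract
#include <stdlib.h>
#include "tess.h"
*/
import "C"

import "unsafe"

// Simple executes tesseract only with source image file path.
func Simple(imgPath string, whitelist string, languages string) string {
	// C.CString allocates with malloc, so each copy is freed explicitly.
	p := C.CString(imgPath)
	defer C.free(unsafe.Pointer(p))
	w := C.CString(whitelist)
	defer C.free(unsafe.Pointer(w))
	l := C.CString(languages)
	defer C.free(unsafe.Pointer(l))

	s := C.simple(p, w, l)
	return C.GoString(s)
}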
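What I did, part 2: since I was at it, building something WAF-like
One of the things I like about Go is its DIY (Do It Yourself) feel: it's easy to get started, Write & Run is effortless, and the standard library is both concise and well stocked. I like that home-improvement-store atmosphere, where you can find everything from a single nail to a chainsaw, and there's a food corner tucked away in the back.
So I built a small toolkit that gives me the bare minimum I want: routing, plus HTTP filtering.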
GitHub - otiai10/marmoset: The very minimum web toolkit, less than frame work
package main

import (
	"net/http"
	"github.com/otiai10/marmoset"
)

func main() {
	router := marmoset.NewRouter()
	router.GET("/hello", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello!"))
	})
	http.ListenAndServe(":8080", router)
}
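For readers who want to see what "routing plus HTTP filtering" boils down to, here is a rough sketch using only the standard library. This is not marmoset's API: `Filter`, `requireMethod`, `newRouter`, and `get` are names invented for this example, and the filter shown (rejecting anything but GET) is just one possible kind of filtering.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Filter wraps a handler and decides whether the request may pass through.
// (Hypothetical type for this sketch, not taken from marmoset.)
type Filter func(http.Handler) http.Handler

// requireMethod returns a Filter that rejects requests using any other method.
func requireMethod(method string) Filter {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.Method != method {
				http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// newRouter registers one route and puts the filter in front of it.
func newRouter() http.Handler {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello!")
	})
	mux := http.NewServeMux()
	mux.Handle("/hello", requireMethod(http.MethodGet)(hello))
	return mux
}

// get sends an in-memory request to h and returns the status code and body.
func get(h http.Handler, method, path string) (int, string) {
	req := httptest.NewRequest(method, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	router := newRouter()
	for _, method := range []string{http.MethodGet, http.MethodPost} {
		code, body := get(router, method, "/hello")
		fmt.Printf("%s /hello -> %d %q\n", method, code, body)
	}
	// To serve real traffic instead: http.ListenAndServe(":8080", router)
}
```

The sketch exercises the router in memory with `httptest` so it can be run without opening a port; swapping in `http.ListenAndServe` gives the same behavior over the network.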
It does View rendering too, but for some reason that didn't work on Heroku, and I ran out of Advent Calendar time to sort it out, so on the ocrserver side I'm getting by with a dirty hack...
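What I did, part 3: putting 1 and 2 together with an "Uh...!"
I was going to photoshop a PPAP image and paste it here, but an image search with the license filter turned on came up with not a single one.
I built a small server with marmoset and had it do the OCR with gosseract, and the result is ocrserver.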
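As a rough picture of how the two pieces fit together, the sketch below puts an HTTP handler in front of a recognizer function whose signature mirrors `Simple(imgPath, whitelist, languages)` from part 1. Everything else is an assumption made for illustration and is not ocrserver's actual API: the `/ocr` path, the `whitelist` and `languages` query parameters, the JSON shape, and the names `Recognizer`, `ocrHandler`, `fakeRecognize`, and `post`. In particular, `fakeRecognize` does no OCR at all; it treats the upload as text and keeps only whitelisted characters so that the example runs without Tesseract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// Recognizer has the same shape as Simple(imgPath, whitelist, languages).
type Recognizer func(imgPath, whitelist, languages string) string

// ocrHandler stores the request body in a temp file and hands its path to recognize.
func ocrHandler(recognize Recognizer) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		tmp, err := os.CreateTemp("", "ocr-*.img")
		if err != nil {
			http.Error(w, "could not create temp file", http.StatusInternalServerError)
			return
		}
		defer os.Remove(tmp.Name())

		_, copyErr := io.Copy(tmp, r.Body)
		closeErr := tmp.Close() // close before the recognizer opens the file
		if copyErr != nil || closeErr != nil {
			http.Error(w, "could not store upload", http.StatusInternalServerError)
			return
		}

		q := r.URL.Query()
		text := recognize(tmp.Name(), q.Get("whitelist"), q.Get("languages"))

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"result": text})
	}
}

// fakeRecognize is NOT OCR: it reads the file as text and drops every
// character outside the whitelist (an empty whitelist keeps everything).
func fakeRecognize(imgPath, whitelist, languages string) string {
	data, err := os.ReadFile(imgPath)
	if err != nil {
		return ""
	}
	if whitelist == "" {
		return string(data)
	}
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(whitelist, r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, string(data))
}

// post sends an in-memory POST to h and returns the status code and body.
func post(h http.Handler, target, body string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, target, strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	h := ocrHandler(fakeRecognize)
	code, body := post(h, "/ocr?whitelist=0123456789", "12:30 left")
	fmt.Printf("%d %s", code, body)
}
```

Passing the recognizer in as a function keeps the HTTP layer independent of cgo, so a real Tesseract-backed function with the same signature could be dropped in where `fakeRecognize` is used here.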
What I did, part 4: making it deployable to Heroku
Heroku has this thing called a buildpack. As far as I can tell, you start from Cedar, an Ubuntu-based image, and extend it however you like.
The gosseract package mentioned above naturally needs Tesseract-OCR's header files and shared object files when Go compiles it, so these have to be placed on the Heroku instance where the server application can find them when it starts. I was wondering how to manage that when I found an absolute godsend of a buildpack that lets you use apt directly, and that solved it in one shot.
- GitHub - heroku/heroku-buildpack-apt
- The commit that adopted it: Prepare for Heroku · otiai10/ocrserver@9430e48 · GitHub
This is probably the crux of the whole thing. The package that calls Tesseract-OCR, the web toolkit, and the OCR server's UI all still have plenty of TODOs, so I'd like to keep polishing them as I use them.
Summary
← Press this and your own OCR server is up right away. For free.
Personal projects are great, aren't they? They give you a chance to touch technologies you don't know.
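Addendum
@otiai10 Until someone pointed it out, I seriously believed "wanting to OCR text rendered in the browser must be a common case," and had forgotten that the kind of thing I work on isn't something people normally run into.
(@otiai10, December 5, 2016)
That's all from the field.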