What Google is aiming for is nothing as petty as "selling books page by page"
It was already a year ago that Amazon announced it would "start selling books by the page." The trackbacks on this article show how people reacted at the time.
But by now, the era is not even about "page units" anymore.
As for what the unit actually is, take a look here.
On top of that, Gregory Crane writes the following in the paper I introduced earlier:
As digital libraries mature and become better able to extract information (e.g., personal and place names), each word and automatically identifiable chunk of words becomes a discrete object. In a sample 300 volume, 55 million word collection of nineteenth-century American English, automatic named entity identification has added 12,000,000 tags. While this collection focuses on name rich historical materials and includes several reference works, this system already discovers thousands of references to named entities in most book length documents. We thus move from single catalogue entries with a few hundred words to thousands of tagged objects – an increase of at least one order of magnitude with named entities and of at least two orders of magnitude when we consider each individual word as an object.
In short, what he is saying is roughly this: as digitization advances, the word becomes the unit, so things get big in a hurry. As a trial they processed 300 books, which came to 55 million words, and automatically tagging those words produced 12 million tags. In other words, a single book suddenly swells to 10 to 100 times the amount of information. (I plan to cover this part in more detail another time.)
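To get a feel for those numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures from the quote (300 volumes, 55 million words, 12,000,000 tags). The variable names are my own, and the results are collection-wide averages, not measurements of any individual book.

```python
# Figures quoted from Crane's sample collection.
VOLUMES = 300
WORDS = 55_000_000
TAGS = 12_000_000

# Rough averages per volume (integer division, so fractions are dropped).
words_per_volume = WORDS // VOLUMES
tags_per_volume = TAGS // VOLUMES

# Roughly how many named-entity tags appear per 100 words.
tags_per_100_words = TAGS * 100 // WORDS

print(words_per_volume, tags_per_volume, tags_per_100_words)
```

So an average volume in that collection carries tens of thousands of tagged objects, compared with a catalogue entry of a few hundred words.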
And the "Image Coordinates" I am introducing now split those words even further and make it possible to manage content at the character level.
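So it is probably worth properly understanding the "level of unit everyone is fighting over right now" ("Granularity of objects"). For the time being nothing looks likely to go finer than the character level, so this may well become the fiercest battleground for all sorts of products and services.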
That said, what I want to say is not the "Google and Amazon are thinking up amazing things" side of the story. People who love Google or Amazon will tell that story in far more detail. My point is a different one: "'Image Coordinates' hold this much potential, and yet Google is handing them over to UC. That is a surprise. And what Google gets in exchange is..."