What Google is after is nothing as petty as "selling pages piecemeal"

It's already been a year, but Amazon announced that it was going to "start selling books by the page." Look at the trackbacks on this article and you can see how people reacted at the time.

By now, though, we're not even in the age of page-level units anymore.
As for what the unit actually is, take a look here.

What's more,

Gregory Crane, in the paper I introduced earlier, writes:

As digital libraries mature and become better able to extract information (e.g., personal and place names), each word and automatically identifiable chunk of words becomes a discrete object. In a sample 300 volume, 55 million word collection of nineteenth-century American English, automatic named entity identification has added 12,000,000 tags. While this collection focuses on name rich historical materials and includes several reference works, this system already discovers thousands of references to named entities in most book length documents. We thus move from single catalogue entries with a few hundred words to thousands of tagged objects – an increase of at least one order of magnitude with named entities and of at least two orders of magnitude when we consider each individual word as an object.

In short, he seems to be saying something like: "As digitization moves forward, the word becomes the unit. Once that happens, things get wild. We tried it on 300 books: that's 55 million words, and when we had those words tagged automatically we ended up with 12 million tags. ... A single book suddenly swells to 10 times, no, 100 times the amount of information." (I plan to cover this part in more detail another time.)
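To get a feel for those numbers, here is a quick back-of-the-envelope calculation. The three collection figures come straight from Crane's quote. The 300-word catalogue entry is my own assumption, standing in for his "a few hundred words," so treat the ratios as rough.

```python
# Figures quoted from Crane's sample collection.
volumes = 300
words = 55_000_000
entity_tags = 12_000_000

# Assumption (not from the paper): a catalogue entry of about 300 words,
# standing in for "a few hundred words".
catalogue_entry_words = 300

# Average number of objects per volume at each level of granularity.
tags_per_volume = entity_tags / volumes
words_per_volume = words / volumes

print(f"tags per volume:  {tags_per_volume:,.0f}")
print(f"words per volume: {words_per_volume:,.0f}")
print(f"tags vs. catalogue entry:  {tags_per_volume / catalogue_entry_words:,.0f}x")
print(f"words vs. catalogue entry: {words_per_volume / catalogue_entry_words:,.0f}x")
```

On average that is 40,000 tags and roughly 183,000 words per volume. The average is pulled up by the name-rich reference works in the sample, which fits Crane's more cautious "at least" one and two orders of magnitude.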

And the "Image Coordinates" I've been introducing go a step further: they split up even those words and make it possible to manage things at the level of individual characters.
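Here is a minimal sketch of what character-level management could look like. Every name in it (`CharBox`, `word_box`, the pixel fields) is hypothetical and made up for illustration. It is not the actual format Google uses, just one way to picture "each character plus its position on the page image."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """Hypothetical record: one recognized character and its place on the page image."""
    char: str
    page: int
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def word_box(chars: list[CharBox]) -> tuple[int, int, int, int]:
    """Bounding box (x, y, width, height) of a word, derived from its characters."""
    left = min(c.x for c in chars)
    top = min(c.y for c in chars)
    right = max(c.x + c.width for c in chars)
    bottom = max(c.y + c.height for c in chars)
    return left, top, right - left, bottom - top


# Made-up coordinates for three characters on page 12.
word = [
    CharBox("G", page=12, x=100, y=40, width=14, height=20),
    CharBox("o", page=12, x=115, y=46, width=10, height=14),
    CharBox("o", page=12, x=126, y=46, width=10, height=14),
]
text = "".join(c.char for c in word)
box = word_box(word)
print(text, box)
```

The point of the sketch: once you hold the characters, words (and lines, and pages) can be rebuilt from them, but not the other way around. That is why the finest unit is the one worth fighting over.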

So you really need to get your head around "the level of unit everyone is fighting over right now" (the "Granularity of objects"). And since nothing looks likely to go finer than the character level for the time being, that's where the fiercest battle over "all sorts of products and services" is going to be.

But what this post wants to say isn't the "Google and Amazon are cooking up something amazing!" side of the story. People who love Google or Amazon will tell you about that in far more detail. What I mean is the other side: "Image Coordinates" hold that much potential, and yet Google is handing them over to UC. That's astonishing. And what Google got in exchange is...