Lucene's New TokenStream API
This time I introduce the New TokenStream API in Lucene's analysis package. Despite the "New" in its name, the API shipped with Lucene 2.9 more than two years ago. It was built to allow more flexible extension than the previous API while keeping the impact on existing uses to a minimum. Sekiguchi-san has also written about this API on his blog, and the Javadoc of Lucene's analysis package describes it in detail (from around the middle of the page). I am covering it only now partly because it is something you should know when customizing or using a Tokenizer or TokenFilter. To be honest, though, the main reason is that I found Tokenizer-related code hard to read and felt I had to understand this API first.
This article walks through a translation of the API's Javadoc (mostly Google Translate...) together with Scala code and sample code that use the API. Translated passages and my own remarks are shown in different text colors so that they can be told apart. All the code I wrote is available <here>. I think it is easier to follow if you run it as you read. You need sbt; if you do not have it, installation instructions are available here. It is easy: you only drop in the jar and put the launch script on your path. There are probably many mistakes and unclear spots in my translation, so I would be glad to have them pointed out. If you need accuracy, please read the original Javadoc. Also note that the content of this article reflects Lucene as of 3.5. Let's start with the introductory part.
New TokenStream API
With Lucene 2.9 a new TokenStream API is introduced. The old API was used to produce Tokens. A Token has getter and setter methods for various properties such as the position increment and the term text. This approach was sufficient for the default indexing format, but it was not versatile enough for Flexible Indexing (a term that summarizes the effort of making the Lucene indexer pluggable and extensible for custom index formats).
A fully customizable indexer means that users will be able to store custom data structures on disk. The API therefore needs to be able to transport custom data from the documents to the indexer.
→ This is planned for Lucene 4. See LUCENE-1458 and LUCENE-2111.
Attribute and AttributeSource
Lucene 2.9 therefore introduces a new pair of classes called Attribute and AttributeSource. An Attribute serves as a particular piece of information about a text token. For example, TermAttribute contains the term text of a token, and OffsetAttribute contains the start and end character offsets of a token. An AttributeSource is a collection of Attributes with a restriction: there may be only one instance of each Attribute type. TokenStream now extends AttributeSource, which means that Attributes can be added to a TokenStream. Since TokenFilter extends TokenStream, all filters are also AttributeSources.
Lucene currently provides the following six Attributes (eight as of 3.5), which replace the variables the Token class has:
Attribute | Role |
---|---|
TermAttribute | The term text of a token. Deprecated since 3.1 and scheduled for removal in a future release |
OffsetAttribute | The start and end offsets of a token |
PositionIncrementAttribute | The position increment of a token |
PayloadAttribute | The payload a token can optionally carry (arbitrary data stored as a byte array) |
TypeAttribute | The type of a token. The default is 'word' |
FlagsAttribute (※) | Flags associated with a token. The Javadoc says "This API is experimental", so it is probably better not to rely on it yet |
CharTermAttribute | Use this instead of TermAttribute |
KeywordAttribute | Whether the token is a keyword |
※ TermAttributeImpl, the implementation of TermAttribute, is no longer used: the default AttributeFactory implementation returns a CharTermAttributeImpl when it is asked for TermAttribute.
Using the new TokenStream API
A few important things to know in order to use the new API efficiently are summarized here. You may want to walk through the example below (the Example section) first and come back to this section afterwards.
- Please keep in mind that an AttributeSource can have only one instance of a particular Attribute. Furthermore, if a chain of a TokenStream and multiple TokenFilters is used, all TokenFilters in that chain share the Attributes with the TokenStream.
→ I wrote test code. First, to check that only one instance can exist per Attribute type, it verifies that calling addAttribute for the same Attribute on a Tokenizer returns the same instance. → <verification code>
Next, to check that Attributes are shared within a TokenStream chain, it verifies that the same instance is returned even when the Tokenizer and the TokenFilter each call addAttribute. → <verification code>
- Attribute instances are reused for all tokens of a document. A TokenStream/TokenFilter therefore needs to update the appropriate Attribute(s) in incrementToken(). The consumer (commonly the Lucene indexer) consumes the data in the Attributes and then calls incrementToken() again, until it returns false to indicate that the end of the stream has been reached. This means that on each call of incrementToken() a TokenStream/TokenFilter can safely overwrite the data in the Attribute instances.
→ As <this code> shows, the Tokenizer and the TokenFilter use the same Attribute instance in incrementToken(). The Tokenizer does not hand Attributes to the TokenFilter directly. Instead, each stage simply reads the shared instance and updates it on its own.
- For performance reasons a TokenStream/TokenFilter should add/get Attributes during instantiation. That is, create the Attribute in the constructor and keep a reference to it in an instance variable. Using the instance variable instead of calling addAttribute()/getAttribute() in incrementToken() avoids an Attribute lookup for every token in the document.
→ In other words: add the Attribute in the constructor of your TokenStream/TokenFilter and use that reference. Do not call addAttribute on every incrementToken call, because it hurts performance.
- All methods in AttributeSource are idempotent, which means that calling them multiple times always yields the same result. This is especially important to know for addAttribute(). The method takes the type (Class) of an Attribute as an argument and returns an instance. If an Attribute of the same type was added before, the existing instance is returned; otherwise a new instance is created and returned. TokenStreams/TokenFilters can therefore safely call addAttribute() multiple times with the same Attribute type. Consumers of a TokenStream should normally call addAttribute() rather than getAttribute(), because addAttribute() does not fail when the TokenStream does not have the Attribute (getAttribute() throws an IllegalArgumentException if the Attribute is missing). More advanced code can simply check with hasAttribute() whether the TokenStream has the Attribute and conditionally skip processing for extra performance.
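→ That addAttribute returns the same result no matter how often it is called was already confirmed by the verification code for the first point, which showed that only one instance can exist. The point here is that you normally use addAttribute, not getAttribute, when you want to obtain an Attribute. You can use getAttribute, but it throws an exception if the Attribute has not been registered. → <verification code>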
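The rules in the list above (one instance per Attribute type, sharing along a chain, idempotent addAttribute(), and getAttribute() failing when the Attribute is missing) can be sketched without Lucene. Everything below is a hypothetical toy model, not Lucene's classes: `Attr`, `TermAttr`, `OffsetAttr` and `AttrSource` are made-up names, and unlike the real addAttribute() the toy version takes an explicit factory argument instead of instantiating the class itself.

```scala
import scala.collection.mutable

object AttributeSourceSketch {
  // Toy stand-ins for attribute types (NOT Lucene classes).
  trait Attr
  final class TermAttr extends Attr { var term: String = "" }
  final class OffsetAttr extends Attr { var start: Int = 0; var end: Int = 0 }

  // Simplified model of an attribute source: at most one instance per attribute class.
  // A filter passes the stream it wraps as `parent` and thereby reuses the same map.
  class AttrSource(parent: Option[AttrSource] = None) {
    private val attrs: mutable.Map[Class[_], Attr] = parent match {
      case Some(p) => p.attrs
      case None    => mutable.Map.empty
    }

    // Idempotent: returns the existing instance, or stores and returns a new one.
    // ASSUMPTION: `create` replaces the factory lookup the real API performs internally.
    def addAttribute[A <: Attr](cls: Class[A])(create: => A): A =
      cls.cast(attrs.getOrElseUpdate(cls, create))

    def hasAttribute(cls: Class[_ <: Attr]): Boolean = attrs.contains(cls)

    // Fails when the attribute was never added, mirroring the behaviour described above.
    def getAttribute[A <: Attr](cls: Class[A]): A = attrs.get(cls) match {
      case Some(a) => cls.cast(a)
      case None    => throw new IllegalArgumentException(s"missing attribute: ${cls.getName}")
    }
  }

  // Returns (same instance shared by tokenizer and filter?, filter has an OffsetAttr?).
  def demo(): (Boolean, Boolean) = {
    val tokenizer = new AttrSource()
    val filter    = new AttrSource(Some(tokenizer))
    val fromTokenizer = tokenizer.addAttribute(classOf[TermAttr])(new TermAttr)
    val fromFilter    = filter.addAttribute(classOf[TermAttr])(new TermAttr)
    (fromTokenizer eq fromFilter, filter.hasAttribute(classOf[OffsetAttr]))
  }
}
```

In this model `demo()` yields `(true, false)`: both stages see the same `TermAttr`, and asking for an `OffsetAttr` with `getAttribute` would throw because nobody added one.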
Example
In this example we create a WhitespaceTokenizer and use a LengthFilter to suppress all words that have two or fewer characters. LengthFilter is part of the Lucene core, and it is explained here to illustrate how the new TokenStream API is used.
After that we develop a custom Attribute, PartOfSpeechAttribute, and add to the chain another filter that uses the new custom Attribute, which we call PartOfSpeechTaggingFilter.
Whitespace tokenization
So, first we create an Analyzer that does nothing but use WhitespaceTokenizer. → <code>
This Analyzer simply returns a WhitespaceTokenizer from its tokenStream method. Let's use it to work with Attribute instances. First we create an instance of the Analyzer above and call its tokenStream method. → <code>
Next we check that the document is processed correctly into individual Tokens. → <verification code>
The original prints the Token contents to standard output, but in this article I check the results with ScalaTest. The original also uses TermAttribute, which is deprecated as of 3.5, so this article uses CharTermAttribute instead. The CharTermAttribute instance is obtained on line 15 and used on line 20. Because the method that returns the string directly is deprecated, I build the string by passing buffer() and length() to the String constructor. Be careful to pass length(): CharTermAttribute reuses its char array, so without it every Token except the longest one ends up with characters of other Tokens trailing behind it.
As in the original, the result is the space-separated Tokens "This", "is", "a", "demo", "of", "the", "new", "TokenStream", "API".
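The buffer-reuse pitfall described above is easy to reproduce without Lucene. The following is a minimal sketch under my own assumptions: `TermBuffer` is a hypothetical class that only imitates the idea of a reused char array with `buffer` and `length` accessors. It is not how CharTermAttribute is actually implemented.

```scala
object TermBufferSketch {
  // Hypothetical reusable term buffer: the array is kept between tokens and only
  // the first `length` chars are valid for the current token.
  final class TermBuffer {
    private var buf = new Array[Char](4)
    private var len = 0

    def setTerm(s: String): Unit = {
      if (s.length > buf.length) buf = new Array[Char](s.length) // grow only when needed
      s.getChars(0, s.length, buf, 0)                            // overwrite from index 0
      len = s.length
    }

    def buffer: Array[Char] = buf
    def length: Int = len
  }

  // Returns (string built from the whole array, string built with the valid length).
  def demo(): (String, String) = {
    val term = new TermBuffer
    term.setTerm("demo")
    term.setTerm("of") // shorter token: "mo" from the previous token stays in the array
    val withoutLength = new String(term.buffer)
    val withLength    = new String(term.buffer, 0, term.length)
    (withoutLength, withLength) // ("ofmo", "of")
  }
}
```

Building the string from the whole array gives "ofmo", while restricting it to the valid length gives "of", which is why the length has to be passed to the String constructor.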
Adding a LengthFilter
We want to suppress all Tokens that have two or fewer characters. This is easily done by adding a LengthFilter to the chain. Only the tokenStream() method of the Analyzer we just created needs to change.
So I passed the WhitespaceTokenizer to the LengthFilter constructor to form a chain. → <verification code>
LengthFilter takes the range of token lengths to allow. Since we want to exclude Tokens of two characters or fewer here, we pass 3 and Integer.MAX_VALUE.
※ In the original the LengthFilter constructor takes three arguments, but that constructor is now deprecated. A new first argument has been added, and it determines whether PositionIncrementAttribute is recorded properly. The original does not use position increments, so the argument is not needed there, but since this is a good opportunity to see how it behaves, I set it to true in this article.
Now let's look at <the code that uses it>.
The difference from the first sample is that it uses PositionIncrementAttribute to check the position increments. With the Tokens of two characters or fewer removed, the result equals List( ("This",1), ("demo",3), ("the",2), ("new",1), ("TokenStream",1), ("API",1) ). The number of removed Tokens can be read from PositionIncrementAttribute.
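To see where increments such as 3 for "demo" come from, here is a Lucene-free sketch of the same chain. It is only a model under stated assumptions: `TokenState`, `ToyStream`, `WhitespaceStream` and `MinLengthStream` are hypothetical names, the shared `TokenState` stands in for the shared Attribute instances, and the filter adds the increments of the tokens it drops to the next token it lets through.

```scala
object LengthFilterSketch {
  // Shared mutable state, standing in for the term and position-increment attributes.
  final class TokenState { var term: String = ""; var posInc: Int = 1 }

  trait ToyStream {
    def state: TokenState
    def incrementToken(): Boolean // false signals the end of the stream
  }

  // Produces one whitespace-separated word per call, overwriting the shared state.
  final class WhitespaceStream(text: String) extends ToyStream {
    val state = new TokenState
    private val words = text.split("\\s+").iterator.filter(_.nonEmpty)
    def incrementToken(): Boolean =
      if (words.hasNext) { state.term = words.next(); state.posInc = 1; true }
      else false
  }

  // Drops tokens shorter than `min` and accumulates their position increments.
  final class MinLengthStream(in: ToyStream, min: Int) extends ToyStream {
    val state: TokenState = in.state // the very same instance as the wrapped stream
    def incrementToken(): Boolean = {
      @annotation.tailrec
      def loop(skipped: Int): Boolean =
        if (!in.incrementToken()) false
        else if (state.term.length >= min) { state.posInc += skipped; true }
        else loop(skipped + state.posInc)
      loop(0)
    }
  }

  // Consumer loop: call incrementToken() until it returns false, reading the state each time.
  def tokens(text: String, min: Int): List[(String, Int)] = {
    val stream = new MinLengthStream(new WhitespaceStream(text), min)
    val out = List.newBuilder[(String, Int)]
    while (stream.incrementToken()) out += ((stream.state.term, stream.state.posInc))
    out.result()
  }
}
```

For example, `LengthFilterSketch.tokens("This is a demo of the new TokenStream API", 3)` gives `List(("This",1), ("demo",3), ("the",2), ("new",1), ("TokenStream",1), ("API",1))` in this model, matching the result reported above.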
The original goes on to show the code of LengthFilter. However, in <LengthFilter> as of 3.5 most of the implementation has moved to <FilteringTokenFilter>, and the sharing of Attribute instances between Tokenizer and TokenFilter that the original checks at this point was already confirmed in the "Using the new TokenStream API" section, so I skip it in this article. The code is fairly simple, and I have linked it for anyone who is interested.
Adding a custom Attribute
Next we implement our own custom Attribute for part-of-speech tagging, called PartOfSpeechAttribute. First we need to define the interface of the new Attribute.
So I created the trait, the PartOfSpeech class and the related classes. The part-of-speech enumeration is defined with Scala's Enumeration. → <code>
The interface is almost identical to the code in the original. → <code>
Next we need to write the implementing class. The name of that class matters: by default, Lucene checks whether there is a class whose name is the interface name with the suffix "Impl". In this example the implementing class is therefore called PartOfSpeechAttributeImpl.
This is the usual behavior. However, there is also an expert API (AttributeSource.AttributeFactory) that allows this naming convention to be changed. The factory accepts an Attribute interface as its argument and returns an actual instance. You can implement your own factory if you need to change the default behavior.
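※ As described here, providing your own AttributeFactory makes it possible to change how existing implementations are resolved (for example, searching a particular package first). However, that requires changing the code so that an instance of your AttributeFactory is passed to the Tokenizer, and if you use it from Solr, it looks like you would have to write your own TokenizerFactory for every kind of Tokenizer you want to change...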
→ <code of the implementing class PartOfSpeechAttributeImpl>
This is a simple Attribute implementation that has just a single variable, which stores the part of speech of a Token. It extends the new AttributeImpl class and implements its abstract methods copyTo(), equals(), hashCode() and clear(). We now need a TokenFilter that can set this new PartOfSpeechAttribute for each token. This example shows a very naive filter (PartOfSpeechTaggingFilter) that tags words whose first letter is upper case as 'Noun' and everything else as 'Unknown'. → <code of PartOfSpeechTaggingFilter>
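The shape of such an interface and implementation pair can be illustrated without Lucene. This is a hedged sketch, not the linked code: `ToyAttributeImpl` is a hypothetical stand-in for Lucene's AttributeImpl that declares only clear() and copyTo(), and the `PartOfSpeech` enumeration lists only the two values that appear in this article.

```scala
object CustomAttributeSketch {
  // Hypothetical, simplified stand-in for the AttributeImpl base class.
  abstract class ToyAttributeImpl {
    def clear(): Unit
    def copyTo(target: ToyAttributeImpl): Unit
  }

  // ASSUMPTION: only the two values used in this article are listed.
  object PartOfSpeech extends Enumeration { val Noun, Unknown = Value }

  // The interface that filters and consumers program against.
  trait PartOfSpeechAttribute {
    def setPartOfSpeech(pos: PartOfSpeech.Value): Unit
    def getPartOfSpeech: PartOfSpeech.Value
  }

  // The implementation: a single field plus clear/copyTo/equals/hashCode.
  final class PartOfSpeechAttributeImpl extends ToyAttributeImpl with PartOfSpeechAttribute {
    private var pos: PartOfSpeech.Value = PartOfSpeech.Unknown

    def setPartOfSpeech(pos: PartOfSpeech.Value): Unit = { this.pos = pos }
    def getPartOfSpeech: PartOfSpeech.Value = pos

    // Reset to the default so that no value leaks from one token to the next.
    def clear(): Unit = { pos = PartOfSpeech.Unknown }

    // Copy this attribute's state into another compatible attribute instance.
    def copyTo(target: ToyAttributeImpl): Unit = target match {
      case t: PartOfSpeechAttribute => t.setPartOfSpeech(pos)
      case _ => throw new IllegalArgumentException("incompatible target attribute")
    }

    override def equals(other: Any): Boolean = other match {
      case o: PartOfSpeechAttributeImpl => o.pos == pos
      case _                            => false
    }
    override def hashCode: Int = pos.hashCode
  }
}
```

Keeping the interface separate from the implementation is what lets a filter ask only for `PartOfSpeechAttribute` while the concrete class is resolved elsewhere.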
Like LengthFilter (as of 2.9), this new filter accesses the Attributes it needs in its constructor and stores references to them in instance variables. Note that you only need to pass in the interface of the new Attribute; the correct class is resolved and instantiated automatically. Next we need to add the filter to the chain:
→ <code with the filter added>
The Filter created this time is added as the outermost stage. Let's check right away how it behaves. → <verification code>
I add the part-of-speech information to the output and check it. The result equals List( ("This",Noun,1), ("demo",Unknown,3), ("the",Unknown,2), ("new",Unknown,1), ("TokenStream",Noun,1), ("API",Noun,1) ).
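The complete three-stage chain can also be modelled without Lucene to see how each stage reads and updates the same state. As before this is a sketch under assumptions: `Attrs`, `Stage`, `Whitespace`, `MinLength` and `PosTagging` are hypothetical names, and one shared `Attrs` object plays the role of the Attribute instances that every stage obtains in its constructor.

```scala
object TaggingSketch {
  // ASSUMPTION: only the two values used in this article are listed.
  object PartOfSpeech extends Enumeration { val Noun, Unknown = Value }

  // One mutable record shared by every stage of the chain.
  final class Attrs {
    var term: String = ""
    var posInc: Int = 1
    var pos: PartOfSpeech.Value = PartOfSpeech.Unknown
  }

  // Each stage keeps a reference to the shared record from construction time on.
  abstract class Stage(val attrs: Attrs) { def incrementToken(): Boolean }

  final class Whitespace(text: String) extends Stage(new Attrs) {
    private val words = text.split("\\s+").iterator.filter(_.nonEmpty)
    def incrementToken(): Boolean = words.hasNext && {
      attrs.term = words.next(); attrs.posInc = 1; attrs.pos = PartOfSpeech.Unknown
      true
    }
  }

  final class MinLength(in: Stage, min: Int) extends Stage(in.attrs) {
    def incrementToken(): Boolean = {
      @annotation.tailrec
      def loop(skipped: Int): Boolean =
        if (!in.incrementToken()) false
        else if (attrs.term.length >= min) { attrs.posInc += skipped; true }
        else loop(skipped + attrs.posInc)
      loop(0)
    }
  }

  // Naive tagging: a leading upper-case letter means Noun, anything else Unknown.
  final class PosTagging(in: Stage) extends Stage(in.attrs) {
    def incrementToken(): Boolean = in.incrementToken() && {
      val capitalised = attrs.term.headOption.exists(_.isUpper)
      attrs.pos = if (capitalised) PartOfSpeech.Noun else PartOfSpeech.Unknown
      true
    }
  }

  def run(text: String): List[(String, PartOfSpeech.Value, Int)] = {
    val chain = new PosTagging(new MinLength(new Whitespace(text), 3)) // tagger outermost
    val out = List.newBuilder[(String, PartOfSpeech.Value, Int)]
    while (chain.incrementToken()) out += ((chain.attrs.term, chain.attrs.pos, chain.attrs.posInc))
    out.result()
  }
}
```

Running `TaggingSketch.run("This is a demo of the new TokenStream API")` in this model gives the same six triples as the result above, with "This", "TokenStream" and "API" tagged as Noun.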
Improving the part-of-speech tagging
Each word is now followed by its assigned PartOfSpeech tag. Of course, this is naive part-of-speech tagging. The word 'This' should not be tagged as a noun at all; it is capitalized only because it is the first word of the sentence. This is actually a good opportunity for an exercise. To practice using the new API, the reader could now write an Attribute and a TokenFilter that record, for each word, whether it is the first token of a sentence. PartOfSpeechTaggingFilter can then use this knowledge and tag capitalized words as nouns only when they are not the first word of a sentence (we know this is still not correct behavior, but it is a good exercise).
After this the original gives a hint to create FirstTokenOfSentenceAttributeImpl, so following it, let's create the <FirstTokenOfSentenceAttribute interface>, its <implementing class>, and a <FirstTokenOfSentenceFilter> that sets its value. The new Attribute's interface and implementation only hold a flag saying whether the word is the first one, so they should not be difficult. Then let's modify the earlier PartOfSpeechTaggingFilter to use this Attribute so that the first word of a sentence is not tagged as a noun. The modified code is <here>. It extends PartOfSpeechTaggingFilter and overrides only the part that decides the part of speech. The Analyzer changed to use it is <here>. So that FirstTokenOfSentenceAttribute is set first, the filter that sets it is placed in the chain before PartOfSpeechTaggingFilter2. Be careful: if you swap the order, it does not work correctly.
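Why the order of the two filters matters can be shown with a small Lucene-free model. This is a sketch under loud assumptions, not the linked code: `Attrs`, `Stage`, `sentenceStartMarker` and `tagger` are hypothetical names, stages are modelled as plain functions over one shared record, and a sentence is assumed to start at the first token or after a token ending in '.', '!' or '?'.

```scala
object FirstTokenSketch {
  // ASSUMPTION: only the two values used in this article are listed.
  object PartOfSpeech extends Enumeration { val Noun, Unknown = Value }

  // Shared per-token state, standing in for the attribute instances of one chain.
  final class Attrs {
    var term: String = ""
    var firstOfSentence: Boolean = false
    var pos: PartOfSpeech.Value = PartOfSpeech.Unknown
  }

  type Stage = Attrs => Unit

  // ASSUMPTION: sentence boundaries are detected from trailing '.', '!' or '?'.
  def sentenceStartMarker(): Stage = {
    var atStart = true
    attrs => {
      attrs.firstOfSentence = atStart
      atStart = attrs.term.lastOption.exists(Set('.', '!', '?'))
    }
  }

  // Tags capitalised words as nouns unless they open a sentence.
  val tagger: Stage = attrs => {
    val capitalised = attrs.term.headOption.exists(_.isUpper)
    attrs.pos =
      if (capitalised && !attrs.firstOfSentence) PartOfSpeech.Noun
      else PartOfSpeech.Unknown
  }

  // Runs the stages in the given order for every whitespace-separated word.
  def run(text: String, stages: Seq[Stage]): List[(String, PartOfSpeech.Value)] = {
    val attrs = new Attrs
    text.split("\\s+").toList.filter(_.nonEmpty).map { word =>
      attrs.term = word
      stages.foreach(stage => stage(attrs))
      (attrs.term, attrs.pos)
    }
  }
}
```

With `Seq(sentenceStartMarker(), tagger)` the word "This" comes out as Unknown in this model. With the order swapped, the tagger runs before the flag for the current token has been set, and "This" is tagged as Noun again.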
Finally, here is <the code that checks the behavior>. With the Analyzer above, 'This', the first word of the sentence, is tagged Unknown instead of Noun.
After reading through it
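At first it was simply hard to see how Attributes are used across classes, and I struggled to understand it. Once I had actually run the code and then looked at the Tokenizer and TokenFilter sources, it became much easier to follow. I can also see that, although the new API makes the code harder to take in at a glance, it was designed so that it can be extended without affecting existing code. Having gone to the trouble of learning it, I have no idea yet what to build, but if something comes to mind I would like to write my own Attribute and Filter and introduce them here.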
Next time I plan to look at the finer points while reading the AttributeSource code, which I did not examine in detail this time. (That article will be much, much shorter...)