- The document discusses using Lucene and Hadoop to build large-scale search engines. Lucene is introduced as a Java-based search engine library, while Hadoop is a framework for distributed storage and processing of large datasets.
- Code examples are provided for creating a Lucene index from files and for performing searches on the indexed data.
- The presentation agenda indicates it will cover Lucene, Hadoop, building search engines with Lucene+Hadoop, and questions. Reference materials on Lucene and Hadoop are also listed.
15. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Diagram: three Map nodes, each with its own Data, feed a Shuffle stage that routes to two Reduce nodes, each with its own Data.
16. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Input records (document ID: word):
doc1: foo
doc1: foo
doc1: foo
doc1: bar
doc1: bar
doc1: buz
17. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Records assigned to each Map node:
Map 1: doc1: foo, doc1: foo
Map 2: doc1: foo, doc1: bar
Map 3: doc1: bar, doc1: buz
18. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Map: each input record emits one (word: 1) pair.
Map 1: doc1: foo, doc1: foo -> foo: 1, foo: 1
Map 2: doc1: foo, doc1: bar -> bar: 1, foo: 1
Map 3: doc1: bar, doc1: buz -> bar: 1, buz: 1
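The map step on this slide, where every word of the input becomes a (word: 1) pair, can be sketched in plain Java. This is only an illustration under stated assumptions: the class `WordCountMapSketch` and its `map` method are hypothetical names, and the code deliberately does not use the Hadoop Mapper API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one map task; this is not the Hadoop Mapper API.
class WordCountMapSketch {
    // Emits one (word, 1) pair per whitespace-separated token.
    // The document ID is the input key; word count does not need it.
    static List<Map.Entry<String, Integer>> map(String docId, String text) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two records held by the first Map node on the slide.
        System.out.println(map("doc1", "foo foo")); // prints [foo=1, foo=1]
    }
}
```

Each Map node runs the same function on its own share of the input, which is why the three nodes can work independently.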
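19. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Shuffle: the map output is grouped by key and routed to the Reduce nodes.
Map 1 output: foo: 1, foo: 1
Map 2 output: bar: 1, foo: 1
Map 3 output: bar: 1, buz: 1
Reduce 1 input: bar: <1, 1>, buz: <1>
Reduce 2 input: foo: <1, 1, 1>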
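The grouping performed by the shuffle, which collects every value emitted for the same key into one list, can be sketched as follows. The class `ShuffleSketch` and its `shuffle` method are hypothetical names. The sketch runs in a single process and only groups by key; how keys are assigned to particular Reduce nodes is not modeled here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the shuffle stage.
class ShuffleSketch {
    // Collects all values that share a key into one list, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Map output from the slide, concatenated across the three Map nodes.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : new String[] {"foo", "foo", "bar", "foo", "bar", "buz"}) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        System.out.println(shuffle(pairs)); // prints {bar=[1, 1], buz=[1], foo=[1, 1, 1]}
    }
}
```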
20. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
After the shuffle, the Reduce nodes hold:
Reduce 1: bar: <1, 1>, buz: <1>
Reduce 2: foo: <1, 1, 1>
21. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Reduce: the values of each key are summed.
Reduce 1: bar: <1, 1> -> bar: 2
Reduce 1: buz: <1> -> buz: 1
Reduce 2: foo: <1, 1, 1> -> foo: 3
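The reduce step on this slide, which turns each key's list of ones into a single count, amounts to a sum per key. A minimal sketch, assuming the grouped input is already available in memory; `ReduceSketch` and `reduce` are hypothetical names, not the Hadoop Reducer API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a reduce task; this is not the Hadoop Reducer API.
class ReduceSketch {
    // Sums all counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Grouped values as shown on the slide.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("bar", Arrays.asList(1, 1));
        grouped.put("buz", Arrays.asList(1));
        grouped.put("foo", Arrays.asList(1, 1, 1));

        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            totals.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        System.out.println(totals); // prints {bar=2, buz=1, foo=3}
    }
}
```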
22. Example: Word Count
Input document: doc1
foo foo foo
bar bar buz
Final output on the Reduce nodes:
Reduce 1: bar: 2, buz: 1
Reduce 2: foo: 3