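News retrieval: a focused crawler collects pages from 3-4 target websites and performs extraction, retrieval, and indexing of the page content. At least 10 pages are collected, results can be sorted by time, relevance, popularity, and other attributes, and similar topics are clustered automatically. Additional features: related-search suggestions, snippet generation, and result preview (hovering the mouse over a result shows a preview).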
Updated Aug 2, 2016 - Python
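The description above implies an inverted index with several sort orders plus snippet generation. The following is a minimal sketch of that idea, not the project's code. It assumes each page has already been crawled and extracted into a hypothetical `Doc` record whose `timestamp` and `views` fields stand in for "time" and "popularity". Relevance is plain TF-IDF here, which may differ from the ranking the project actually uses.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    title: str
    body: str
    timestamp: int  # hypothetical: publication time (e.g. Unix seconds)
    views: int      # hypothetical: popularity signal (e.g. view count)


def tokenize(text):
    """Lowercase word tokens; a real system would need proper Chinese segmentation."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    def __init__(self, docs):
        self.docs = {d.doc_id: d for d in docs}
        # term -> {doc_id: term frequency in title + body}
        self.postings = defaultdict(dict)
        for d in docs:
            for term, tf in Counter(tokenize(d.title + " " + d.body)).items():
                self.postings[term][d.doc_id] = tf

    def idf(self, term):
        # Smoothed inverse document frequency.
        df = len(self.postings.get(term, {}))
        return math.log((1 + len(self.docs)) / (1 + df)) + 1

    def search(self, query, sort_by="relevance"):
        """Return matching doc ids sorted by relevance, time, or popularity."""
        scores = defaultdict(float)
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term)
        sort_keys = {
            "relevance": lambda i: scores[i],
            "time": lambda i: self.docs[i].timestamp,
            "popularity": lambda i: self.docs[i].views,
        }
        return sorted(scores, key=sort_keys[sort_by], reverse=True)


def snippet(doc, query, width=5):
    """Return a window of words around the first query hit in the body."""
    words = doc.body.split()
    terms = set(tokenize(query))
    for i, word in enumerate(words):
        if set(tokenize(word)) & terms:
            start = max(0, i - width)
            return " ".join(words[start:i + width + 1])
    # No hit: fall back to the opening words.
    return " ".join(words[:2 * width + 1])
```

Keeping the scores separate from the sort key means one query result can be re-sorted by time or popularity without recomputing relevance.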
fork of DCSS, moved to https://github.com/yrmvgh/yiufcrawl/
This Java project is a multithreaded web crawler that uses three search engines (Bing, Yahoo, and Google) to generate the seed URLs it crawls.
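The seed-then-crawl design described above can be sketched as follows. This is a rough illustration in Python, not the Java project's implementation. Querying Bing, Yahoo, or Google is not shown: `merge_seeds` simply assumes each engine has already returned a list of result URLs. `fetch_links` is a hypothetical callback that would download a page and return its outgoing links. The crawl proceeds breadth-first, one level at a time, with each level fetched in parallel on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_seeds(*engine_results):
    """Merge result URL lists from several search engines, dropping duplicates.

    Each argument is assumed to be a list of URLs already returned by one
    engine (hypothetical; no search API is called here).
    """
    return list(dict.fromkeys(url for results in engine_results for url in results))


def crawl(seeds, fetch_links, max_pages=100, num_workers=4):
    """Breadth-first multithreaded crawl starting from seed URLs.

    fetch_links(url) -> list of outgoing URLs. Pages at the same depth are
    fetched concurrently; the visited bookkeeping happens only in the calling
    thread, so no lock is required.
    """
    visited = list(dict.fromkeys(seeds))[:max_pages]
    seen = set(visited)
    frontier = list(visited)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while frontier and len(visited) < max_pages:
            next_frontier = []
            # pool.map runs fetch_links in worker threads and yields the
            # results in the same order as the frontier.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen and len(visited) < max_pages:
                        seen.add(link)
                        visited.append(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return visited
```

Processing one depth level per batch keeps the crawl order deterministic and easy to cap with `max_pages`. The trade-off is that a slow page holds back the whole level. A work-queue design avoids this, but it needs explicit locking and termination handling.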
Unofficial preservationist fork of DCSS