A fully functioning search engine built to satisfy the HKUST COMP4321 requirements. It uses Golang for its backend and React for its frontend.
http://spaghetti-search.herokuapp.com/
- Implements Topic-Sensitive PageRank (T. H. Haveliwala, 2003), using the query as the only given context and assuming the user's interest is reflected equally across every topic
- Combines PageRank and the Vector-Space Model to rank the results
- Uses anchor text and meta tags, as suggested in Google's paper, to increase precision and index many more webpages
- Makes use of the generator, future, and fan-in fan-out concurrency patterns in Golang to increase retrieval performance
- Dynamic document summary retrieval
- Uses BadgerDB as the database, which is optimised for SSDs
- Supports keyword list search and phrase search (use double quotes for phrase search)
- Install golang from here
$ sudo tar -C /usr/local -xzf go$VERSION.$OS-$ARCH.tar.gz
$ export PATH=$PATH:/usr/local/go/bin
- Download this repo using go get
$ go get github.com/nwihardjo/SpaghettiSearch
- Install node and npm from here
- The frontend build has already been uploaded, so there is no need to install node to get this running.
dep is used for package management to ensure that the installed dependencies are the correct version from the correct vendor. Run dep ensure in the project root to install the required packages, or run go get ./... to do the same thing.
- Run make in the project root directory. It will install the necessary binaries to the bin/ directory, as well as install dependencies.
- Run the crawler with the arguments shown below, then spin up the server. The backend and the React frontend have been integrated, so only the one Golang server needs to be started.
$ ./bin/start_crawl [-numPages=<number of pages to be crawled>] [-startURL=<starting entry point for the crawler to crawl>] [-domainOnly=<whether to crawl only webpages in the domain of the given starting URL>]
$ ./bin/server
- Open your browser and go to localhost:8080. The server is hosted on port 8080 (check the output of your terminal to confirm).