The Research Elastic Stack (ELK)
The goal: make it just as easy to analyze data as it is to collect it.
- Kafka: A distributed event streaming platform capable of handling trillions of events a day.
- FileBeat: A lightweight single-purpose data shipper from Elastic.
- Elasticsearch: A highly scalable search and analytics engine.
- Logstash: A dynamic data collection pipeline with an extensible plugin ecosystem.
- Kibana: An analytics and visualization platform designed to work with Elasticsearch.
- ES-Hadoop: A library that allows Hadoop jobs (and therefore Spark) to interact with Elasticsearch.
- Spark: A fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Python and R.
- GraphFrames: A package for Apache Spark which provides DataFrame-based Graphs.
- Jupyter Notebook: A web application that allows you to create interactive notebooks.
The only major modifications needed are:
- Remove and replace the Elasticsearch index templates
  - Located in RELK/elasticsearch/output_templates
- Remove and replace the Logstash conf files
  - Located in RELK/logstash/pipelines
- Either add the files to analyze into RELK/filebeat/input_files or configure Kafka/FileBeat to ingest files for your use-case.
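If you go the route of dropping files into RELK/filebeat/input_files, the copy step can be scripted. The following is a minimal sketch, not part of RELK: `stage_input_files` is a hypothetical helper name, and it assumes it is run from the directory that contains the RELK folder.

```python
import shutil
from pathlib import Path


def stage_input_files(sources, input_dir="RELK/filebeat/input_files"):
    """Copy each source file into the FileBeat input directory.

    Hypothetical helper for illustration; RELK itself does not ship it.
    Returns the list of destination paths that were written.
    """
    target = Path(input_dir)
    # Create the input folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, sources):
        if not src.is_file():
            raise FileNotFoundError(f"not a regular file: {src}")
        dest = target / src.name
        # copy2 copies the contents and preserves timestamps.
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

For example, `stage_input_files(["/tmp/sample.log"])` would copy a (hypothetical) sample.log into the input folder. Files with the same name overwrite each other, so rename them first if that matters for your data.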
- Easy!
docker-compose up
If you'd like to run the containers in the background (detached mode):
docker-compose up -d
- Kafka listens on port 9092
- Kibana uses 5601 (Access it via localhost:5601)
- Jupyter uses 8888 (Access it via localhost:8888)
- By default, the Jupyter Notebook password is 'research'. This can be changed in the docker-compose file.
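Once the containers are up, a quick way to confirm that the ports listed above are reachable is a plain TCP connection test. This is a rough sketch, not something RELK provides: the names `RELK_PORTS`, `port_is_open`, and `check_services` are made up for illustration, and the host is assumed to be localhost. A successful connection only shows that something is listening on the port, not that the service behind it is healthy.

```python
import socket

# Ports come from the list above; the service labels are just dictionary keys.
RELK_PORTS = {"Kafka": 9092, "Kibana": 5601, "Jupyter": 8888}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host="localhost", ports=RELK_PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` returns a dictionary such as `{"Kafka": True, ...}`, with the actual values depending on what is running on your machine. If Docker runs on a remote host, pass that host name instead of the default.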
- Password-protect Elasticsearch and Kibana
- The inspiration: HELK -- The Hunting Elastic Stack
- Jupyter Docker Stacks: An excellent repository with a ton of plug-and-play notebooks. It is incredible how easy it is to set up.
- Docker @ Elastic: Plug-and-play docker containers for Beats, Logstash, Elasticsearch, and Kibana.