linkedin/datahub-gma

DataHub

Introduction

DataHub is LinkedIn's generalized metadata search & discovery tool. To learn more about DataHub, check out our LinkedIn blog post and Strata presentation. This repository contains the complete source code needed to build DataHub's frontend & backend services.

Quickstart

  1. Install docker and docker-compose.
  2. Clone this repo and make sure you are on the datahub branch.
  3. Run the command below to download and start all Docker containers locally:
     cd docker/quickstart && docker-compose pull && docker-compose up --build
  4. Once all Docker containers are running on your machine, run the command below from the repository root to ingest the provided sample data into DataHub:
     ./gradlew :metadata-events:mxe-schemas:build && cd metadata-ingestion/mce-cli && pip install --user -r requirements.txt && python mce_cli.py produce -d bootstrap_mce.dat

     Note: Make sure that you're using Java 8; the build has a strict dependency on Java 8.

  5. Finally, open http://localhost:9001 in your browser to start using DataHub. You can sign in with datahub as both username and password.
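
The prerequisites named in the steps above (docker, docker-compose, and a Java 8 runtime for the Gradle build) can be checked before starting. The following is a minimal sketch, not part of this repository: the helper names `have_cmd`, `is_java8`, and `preflight` are hypothetical, and the check assumes that a Java 8 runtime reports a version string beginning with `1.8.`.

```shell
#!/bin/sh
# Hypothetical preflight helpers for the Quickstart; not shipped with DataHub.

# Succeeds when the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds when the given `java -version` output reports a 1.8.x (Java 8) runtime.
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks the tools named in the Quickstart and reports anything missing.
preflight() {
  status=0
  for tool in docker docker-compose java; do
    if ! have_cmd "$tool"; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  # `java -version` writes to stderr, so fold stderr into the captured output.
  if have_cmd java && ! is_java8 "$(java -version 2>&1)"; then
    echo "Java 8 is required for the build" >&2
    status=1
  fi
  return "$status"
}
```

Sourcing this file and calling `preflight` before running the docker-compose command returns a non-zero status if a tool is missing or the active Java runtime is not Java 8.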

Quicklinks

Roadmap

  1. Add user profile page
  2. Deploy DataHub to Azure Cloud