Commit 165d4ae

Author: Kerem Sahin
Commit message: Documentation update part-1
1 parent: 1b46145

39 files changed: +610 −89 lines

README.md

Lines changed: 7 additions & 7 deletions

@@ -1,12 +1,12 @@
-# Data Hub
+# DataHub
 [![Build Status](https://travis-ci.org/linkedin/WhereHows.svg?branch=datahub)](https://travis-ci.org/linkedin/WhereHows)
 [![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/linkedin/datahub)
 
-![Data Hub](docs/imgs/datahublogo.png)
+![DataHub](docs/imgs/datahub-logo.png)
 
 ## Introduction
-Data Hub is Linkedin's generalized metadata search & discovery tool. To learn more about Data Hub, check out our
-[Linkedin blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). This repository contains the complete source code to be able to build Data Hub's frontend & backend services.
+DataHub is Linkedin's generalized metadata search & discovery tool. To learn more about DataHub, check out our
+[Linkedin blog post](https://engineering.linkedin.com/blog/2019/data-hub) and [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019). This repository contains the complete source code to be able to build DataHub's frontend & backend services.
 
 ## Quickstart
 1. Install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).
@@ -15,13 +15,13 @@ Data Hub is Linkedin's generalized metadata search & discovery tool. To learn mo
 ```
 cd docker/quickstart && docker-compose pull && docker-compose up --build
 ```
-4. After you have all Docker containers running in your machine, run below command to ingest provided sample data to Data Hub:
+4. After you have all Docker containers running in your machine, run below command to ingest provided sample data to DataHub:
 ```
 ./gradlew :metadata-events:mxe-schemas:build && cd metadata-ingestion/mce-cli && pip install --user -r requirements.txt && python mce_cli.py produce -d bootstrap_mce.dat
 ```
 Note: Make sure that you're using Java 8, we have a strict dependency to Java 8 for build.
 
-5. Finally, you can start `Data Hub` by typing `http://localhost:9001` in your browser. You can sign in with `datahub`
+5. Finally, you can start `DataHub` by typing `http://localhost:9001` in your browser. You can sign in with `datahub`
 as username and password.
 
 ## Quicklinks
@@ -33,4 +33,4 @@ as username and password.
 
 ## Roadmap
 1. Add user profile page
-2. Deploy Data Hub to [Azure Cloud](https://azure.microsoft.com/en-us/)
+2. Deploy DataHub to [Azure Cloud](https://azure.microsoft.com/en-us/)

datahub-frontend/README.md

Lines changed: 9 additions & 9 deletions

@@ -1,25 +1,25 @@
-# Data Hub Frontend
-Data Hub frontend is a [Play](https://www.playframework.com/) service written in Java. It is served as a mid-tier
-between [Data Hub GMS](../gms) which is the backend service and [Data Hub UI](../datahub-web).
+# DataHub Frontend
+DataHub frontend is a [Play](https://www.playframework.com/) service written in Java. It is served as a mid-tier
+between [DataHub GMS](../gms) which is the backend service and [DataHub UI](../datahub-web).
 
 ## Pre-requisites
 * You need to have [JDK8](https://www.oracle.com/java/technologies/jdk8-downloads.html)
-installed on your machine to be able to build `Data Hub Frontend`.
+installed on your machine to be able to build `DataHub Frontend`.
 * You need to have [Chrome](https://www.google.com/chrome/) web browser
 installed to be able to build because UI tests have a dependency on `Google Chrome`.
 
 ## Build
-`Data Hub Frontend` is already built as part of top level build:
+`DataHub Frontend` is already built as part of top level build:
 ```
 ./gradlew build
 ```
-However, if you only want to build `Data Hub Frontend` specifically:
+However, if you only want to build `DataHub Frontend` specifically:
 ```
 ./gradlew :datahub-frontend:build
 ```
 
 ## Dependencies
-Before starting `Data Hub Frontend`, you need to make sure that [Data Hub GMS](../gms) and
+Before starting `DataHub Frontend`, you need to make sure that [DataHub GMS](../gms) and
 all its dependencies have already started and running.
 
 Also, user information should already be registered into the DB,
@@ -42,7 +42,7 @@ python metadata-ingestion/mce_cli.py produce
 This will create a default user with username `datahub`. You can sign in to the app using `datahub` as your username.
 
 ## Start via Docker image
-Quickest way to try out `Data Hub Frontend` is running the [Docker image](../docker/frontend).
+Quickest way to try out `DataHub Frontend` is running the [Docker image](../docker/frontend).
 
 ## Start via command line
 If you do modify things and want to try it out quickly without building the Docker image, you can also run
@@ -51,7 +51,7 @@ the application directly from command line after a successful [build](#build):
 cd datahub-frontend/run && ./run-local-frontend
 ```
 
-## Checking out Data Hub UI
+## Checking out DataHub UI
 After starting your application in one of the two ways mentioned above, you can connect to it by typing below
 into your favorite web browser:
 ```
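Once the frontend is running, a quick way to verify it is responding (assuming the default port 9001 used throughout these docs) is a plain HTTP check:

```
# Expect an HTTP response from the Play frontend on its default port
curl -I http://localhost:9001
```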

docker/README.md

Lines changed: 4 additions & 4 deletions

@@ -1,15 +1,15 @@
 # Docker Images
-The easiest way to bring up and test Data Hub is using Data Hub [Docker](https://www.docker.com) images
+The easiest way to bring up and test DataHub is using DataHub [Docker](https://www.docker.com) images
 which are continuously deployed to [Docker Hub](https://hub.docker.com/u/keremsahin) with every commit to repository.
 
 * [**datahub-gms**](gms): [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-gms)](https://cloud.docker.com/repository/docker/keremsahin/datahub-gms/)
 * [**datahub-frontend**](frontend): [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-frontend)](https://cloud.docker.com/repository/docker/keremsahin/datahub-frontend/)
 * [**datahub-mce-consumer**](mce-consumer): [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-mce-consumer)](https://cloud.docker.com/repository/docker/keremsahin/datahub-mce-consumer/)
 * [**datahub-mae-consumer**](mae-consumer): [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-mae-consumer)](https://cloud.docker.com/repository/docker/keremsahin/datahub-mae-consumer/)
 
-Above Docker images are created for Data Hub specific use. You can check subdirectories to check how those images are
+Above Docker images are created for DataHub specific use. You can check subdirectories to check how those images are
 generated via [Dockerbuild](https://docs.docker.com/engine/reference/commandline/build/) files or
-how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, Data Hub depends
+how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, DataHub depends
 on below Docker images to be able to run:
 * [**Kafka and Schema Registry**](kafka)
 * [**Elasticsearch**](elasticsearch)
@@ -23,5 +23,5 @@ The pipeline depends on all the above images composing up.
 You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).
 
 ## Quickstart
-If you want to quickly try and evaluate Data Hub by running all necessary Docker containers, you can check
+If you want to quickly try and evaluate DataHub by running all necessary Docker containers, you can check
 [Quickstart Guide](quickstart).

docker/elasticsearch/README.md

Lines changed: 3 additions & 3 deletions

@@ -1,11 +1,11 @@
 # Elasticsearch & Kibana
 
-Data Hub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for Data Hub.
+DataHub uses Elasticsearch as a search engine. Elasticsearch powers search, typeahead and browse functions for DataHub.
 [Official Elasticsearch Docker image](https://hub.docker.com/_/elasticsearch) found in Docker Hub is used without
 any modification.
 
 ## Run Docker container
-Below command will start the Elasticsearch and Kibana containers. `Data Hub` uses Elasticsearch release `5.6.8`. Newer
+Below command will start the Elasticsearch and Kibana containers. `DataHub` uses Elasticsearch release `5.6.8`. Newer
 versions of Elasticsearch are not tested and you might experience compatibility issues.
 ```
 cd docker/elasticsearch && docker-compose pull && docker-compose up --build
@@ -26,7 +26,7 @@ ports:
 ```
 
 ### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:
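The `datahub_network` stanza referenced here appears in full in the mce-consumer section of this same commit:

```
networks:
  default:
    name: datahub_network
```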

docker/frontend/README.md

Lines changed: 5 additions & 5 deletions

@@ -1,8 +1,8 @@
-# Data Hub Frontend Docker Image
+# DataHub Frontend Docker Image
 [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-frontend)](https://cloud.docker.com/repository/docker/keremsahin/datahub-frontend/)
 
-Refer to [Data Hub Frontend Service](../../datahub-frontend) to have a quick understanding of the architecture and
-responsibility of this service for the Data Hub.
+Refer to [DataHub Frontend Service](../../datahub-frontend) to have a quick understanding of the architecture and
+responsibility of this service for the DataHub.
 
 ## Build
 ```
@@ -28,7 +28,7 @@ ports:
 ```
 
 #### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:
@@ -47,7 +47,7 @@ environment:
 ```
 The value of `DATAHUB_GMS_HOST` variable should be set to the host name of the `datahub-gms` container within the Docker network.
 
-## Checking out Data Hub UI
+## Checking out DataHub UI
 After starting your Docker container, you can connect to it by typing below into your favorite web browser:
 ```
 http://localhost:9001
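A sketch of the `DATAHUB_GMS_HOST` wiring described above — the variable name and the `datahub-gms` host come from this section; the surrounding compose layout is assumed:

```
environment:
  # datahub-gms is the GMS container's host name on datahub_network
  - DATAHUB_GMS_HOST=datahub-gms
```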

docker/gms/README.md

Lines changed: 4 additions & 4 deletions

@@ -1,8 +1,8 @@
-# Data Hub Generalized Metadata Store (GMS) Docker Image
+# DataHub Generalized Metadata Store (GMS) Docker Image
 [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-gms)](https://cloud.docker.com/repository/docker/keremsahin/datahub-gms/)
 
-Refer to [Data Hub GMS Service](../../gms) to have a quick understanding of the architecture and
-responsibility of this service for the Data Hub.
+Refer to [DataHub GMS Service](../../gms) to have a quick understanding of the architecture and
+responsibility of this service for the DataHub.
 
 ## Build
 ```
@@ -28,7 +28,7 @@ ports:
 ```
 
 #### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:

docker/ingestion/README.md

Lines changed: 4 additions & 4 deletions

@@ -1,7 +1,7 @@
-# Data Hub MetadataChangeEvent (MCE) Ingestion Docker Image
+# DataHub MetadataChangeEvent (MCE) Ingestion Docker Image
 
-Refer to [Data Hub Metadata Ingestion](../../metadata-ingestion/mce-cli) to have a quick understanding of the architecture and
-responsibility of this service for the Data Hub.
+Refer to [DataHub Metadata Ingestion](../../metadata-ingestion/mce-cli) to have a quick understanding of the architecture and
+responsibility of this service for the DataHub.
 
 ## Build
 ```
@@ -18,5 +18,5 @@ for the container otherwise it will build the image from local repository and th
 
 ### Container configuration
 
-#### Kafka and Data Hub GMS Containers
+#### Kafka and DataHub GMS Containers
 Before starting `ingestion` container, `datahub-gms`, `kafka` and `datahub-mce-consumer` containers should already be up and running.
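The startup order described above could be expressed in compose with `depends_on` — a sketch only; the actual quickstart compose file may wire this differently:

```
services:
  ingestion:
    depends_on:   # start only after these containers are up
      - kafka
      - datahub-gms
      - datahub-mce-consumer
```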

docker/kafka/README.md

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 # Kafka, Zookeeper and Schema Registry
 
-Data Hub uses Kafka as the pub-sub message queue in the backend.
+DataHub uses Kafka as the pub-sub message queue in the backend.
 [Official Confluent Kafka Docker images](https://hub.docker.com/u/confluentinc) found in Docker Hub is used without
 any modification.
 
@@ -29,7 +29,7 @@ ports:
 ```
 
 ### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:

docker/mae-consumer/README.md

Lines changed: 4 additions & 4 deletions

@@ -1,8 +1,8 @@
-# Data Hub MetadataAuditEvent (MAE) Consumer Docker Image
+# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
 [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-mae-consumer)](https://cloud.docker.com/repository/docker/keremsahin/datahub-mae-consumer/)
 
-Refer to [Data Hub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) to have a quick understanding of the architecture and
-responsibility of this service for the Data Hub.
+Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) to have a quick understanding of the architecture and
+responsibility of this service for the DataHub.
 
 ## Build
 ```
@@ -20,7 +20,7 @@ for the container otherwise it will download the `latest` image from Docker Hub
 ### Container configuration
 
 #### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:

docker/mce-consumer/README.md

Lines changed: 5 additions & 5 deletions

@@ -1,8 +1,8 @@
-# Data Hub MetadataChangeEvent (MCE) Consumer Docker Image
+# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
 [![Docker Cloud Build Status](https://img.shields.io/docker/cloud/build/keremsahin/datahub-mce-consumer)](https://cloud.docker.com/repository/docker/keremsahin/datahub-mce-consumer/)
 
-Refer to [Data Hub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) to have a quick understanding of the architecture and
-responsibility of this service for the Data Hub.
+Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) to have a quick understanding of the architecture and
+responsibility of this service for the DataHub.
 
 ## Build
 ```
@@ -20,15 +20,15 @@ for the container otherwise it will download the `latest` image from Docker Hub
 ### Container configuration
 
 #### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:
   default:
     name: datahub_network
 ```
 
-#### Kafka and Data Hub GMS Containers
+#### Kafka and DataHub GMS Containers
 Before starting `datahub-mce-consumer` container, `datahub-gms` and `kafka` containers should already be up and running.
 These connections are configured via environment variables in `docker-compose.yml`:
 ```
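For illustration, such an environment block might look like the following — the variable names here are hypothetical placeholders; the authoritative keys are in the mce-consumer `docker-compose.yml`:

```
environment:
  # hypothetical names -- check the compose file for the real keys
  - KAFKA_BOOTSTRAP_SERVER=kafka:9092
  - GMS_HOST=datahub-gms
```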

docker/mysql/README.md

Lines changed: 3 additions & 3 deletions

@@ -1,6 +1,6 @@
 # MySQL
 
-Data Hub GMS uses MySQL as the storage infrastructure.
+DataHub GMS uses MySQL as the storage infrastructure.
 [Official MySQL Docker image](https://hub.docker.com/_/mysql) found in Docker Hub is used without
 any modification.
 
@@ -11,7 +11,7 @@ cd docker/mysql && docker-compose pull && docker-compose up
 ```
 
 An initialization script [init.sql](init.sql) is provided to container. This script initializes `metadata-aspect` table
-which is basically the Key-Value store of the Data Hub GMS.
+which is basically the Key-Value store of the DataHub GMS.
 
 To connect to MySQL container, you can type below command:
 ```
@@ -29,7 +29,7 @@ ports:
 ```
 
 ### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change this for all other Docker containers as well.
 ```
 networks:
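To make the "Key-Value store" remark concrete, a table of that shape might look roughly like this — a hypothetical sketch; init.sql has the real schema:

```
-- hypothetical sketch, not the shipped init.sql
CREATE TABLE metadata_aspect (
  urn       VARCHAR(500) NOT NULL,  -- entity key
  aspect    VARCHAR(200) NOT NULL,  -- aspect name
  version   BIGINT       NOT NULL,  -- aspect version
  metadata  LONGTEXT     NOT NULL,  -- serialized aspect value
  createdon DATETIME     NOT NULL,
  PRIMARY KEY (urn, aspect, version)
);
```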

docker/neo4j/README.md

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 # Neo4j
 
-Data Hub uses Neo4j as graph db in the backend to serve graph queries.
+DataHub uses Neo4j as graph db in the backend to serve graph queries.
 [Official Neo4j image](https://hub.docker.com/_/neo4j) found in Docker Hub is used without
 any modification.
 
@@ -22,7 +22,7 @@ ports:
 ```
 
 ### Docker Network
-All Docker containers for Data Hub are supposed to be on the same Docker network which is `datahub_network`.
+All Docker containers for DataHub are supposed to be on the same Docker network which is `datahub_network`.
 If you change this, you will need to change it for all other Docker containers as well.
 ```
 networks:

docker/quickstart/README.md

Lines changed: 8 additions & 8 deletions

@@ -1,20 +1,20 @@
-# Data Hub Quickstart
+# DataHub Quickstart
 To start all Docker containers at once, please run below command:
 ```
 cd docker/quickstart && docker-compose pull && docker-compose up --build
 ```
-At this point, all containers are ready and Data Hub can be considered up and running. Check specific containers guide
+At this point, all containers are ready and DataHub can be considered up and running. Check specific containers guide
 for details:
 * [Elasticsearch & Kibana](../elasticsearch)
-* [Data Hub Frontend](../frontend)
-* [Data Hub GMS](../gms)
+* [DataHub Frontend](../frontend)
+* [DataHub GMS](../gms)
 * [Kafka, Schema Registry & Zookeeper](../kafka)
-* [Data Hub MAE Consumer](../mae-consumer)
-* [Data Hub MCE Consumer](../mce-consumer)
+* [DataHub MAE Consumer](../mae-consumer)
+* [DataHub MCE Consumer](../mce-consumer)
 * [MySQL](../mysql)
 
-From this point on, if you want to be able to sign in to Data Hub and see some sample data, please see
-[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping Data Hub`.
+From this point on, if you want to be able to sign in to DataHub and see some sample data, please see
+[Metadata Ingestion Guide](../../metadata-ingestion) for `bootstrapping DataHub`.
 
 ## Debugging Containers
 If you want to debug containers, you can check container logs:
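With plain Docker that typically looks like the following (the container name is a placeholder; take the real one from `docker ps`):

```
docker ps                        # find the container's name or ID
docker logs -f <container-name>  # follow that container's log output
```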

docs/architecture/architecture.md

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+# DataHub Architecture
+![datahub-architecture](../imgs/datahub-architecture.png)
+
+## Metadata Serving
+Refer to [metadata-serving](metadata-serving.md).
+
+## Metadata Ingestion
+Refer to [metadata-ingestion](metadata-ingestion.md).
+
+## What is Generalized Metadata Architecture (GMA)?
+Refer to [GMA](../what/gma.md).

docs/architecture/metadata-ingestion.md

Whitespace-only changes.

docs/architecture/metadata-serving.md

Whitespace-only changes.

docs/how/entity-onboarding.md

Whitespace-only changes.

docs/how/graph-onboarding.md

Whitespace-only changes.

docs/how/metadata-modelling.md

Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
+# How to model metadata for GMA?
+GMA uses [rest.li](https://rest.li), which is LinkedIn's open source REST framework.
+All metadata in GMA needs to be modelled using [Pegasus schema (PDSC)](https://linkedin.github.io/rest.li/DATA-Data-Schema-and-Templates), which is the data schema for [rest.li](https://rest.li).
+
+Conceptually we’re modelling metadata as a hybrid graph of nodes ([entities](../what/entity.md)) and edges ([relationships](../what/relationship.md)), with additional documents ([metadata aspects](../what/aspect.md)) attached to each node.
+Below is an example graph consisting of 3 types of entities (User, Group, Dataset), 3 types of relationships (OwnedBy, HasAdmin, HasMember), and 3 types of metadata aspects (Ownership, Profile, and Membership).
+
+![metadata-modeling](../imgs/metadata-modeling.png)
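A minimal PDSC schema for one of the aspects in this example might look like the following — a hypothetical sketch of an Ownership aspect, not the schema shipped in the repo:

```
{
  "type": "record",
  "name": "Ownership",
  "namespace": "com.example.metadata.aspect",
  "doc": "Hypothetical metadata aspect listing the owners of an entity.",
  "fields": [
    {
      "name": "owners",
      "type": { "type": "array", "items": "string" },
      "doc": "URNs of the owning Users or Groups, e.g. urn:li:corpuser:datahub"
    }
  ]
}
```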

docs/how/search-onboarding.md

Whitespace-only changes.

docs/imgs/datahub-architecture.png

42.8 KB
File renamed without changes.

docs/imgs/metadata-modeling.png

210 KB
