
refactor(docker): make docker files easier to use during development. #1777

Merged · 9 commits · Aug 6, 2020
2 changes: 2 additions & 0 deletions .github/workflows/docker-frontend.yml
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/frontend/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}
2 changes: 2 additions & 0 deletions .github/workflows/docker-gms.yml
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/gms/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}
2 changes: 2 additions & 0 deletions .github/workflows/docker-mae-consumer.yml
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/mae-consumer/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}
2 changes: 2 additions & 0 deletions .github/workflows/docker-mce-consumer.yml
@@ -13,6 +13,8 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: docker/build-push-action@v1
env:
DOCKER_BUILDKIT: 1
with:
dockerfile: ./docker/mce-consumer/Dockerfile
username: ${{ secrets.DOCKER_USERNAME }}
3 changes: 0 additions & 3 deletions .gitignore
@@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc
.java-version

# Python
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.mypy_cache/
55 changes: 42 additions & 13 deletions docker/README.md
@@ -1,27 +1,56 @@
# Docker Images

## Prerequisites
You need to install [docker](https://docs.docker.com/install/) and
[docker-compose](https://docs.docker.com/compose/install/) (Compose is only a separate install on Linux; on Windows
and Mac it is included with Docker Desktop).

Make sure to allocate enough hardware resources to the Docker engine. Tested and confirmed configuration: 2 CPUs,
8GB RAM, and 2GB of swap.

## Quickstart

The easiest way to bring up and test DataHub is using the DataHub [Docker](https://www.docker.com) images,
which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to the repository.

You can easily download and run all these images and their dependencies with our
[quick start guide](../docs/quickstart.md).

DataHub Docker Images:

* [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/)
* [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/)
* [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/)
* [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/)

The Docker images above are built specifically for DataHub. Check the subdirectories to see how each image is
generated from its [Dockerfile](https://docs.docker.com/engine/reference/commandline/build/) and how to start each
container using [Docker Compose](https://docs.docker.com/compose/). Beyond these, DataHub depends on the following
images in order to run:

Dependencies:
* [**Kafka and Schema Registry**](kafka)
* [**Elasticsearch**](elasticsearch)
* [**Elasticsearch Setup**](elasticsearch-setup)
* [**MySQL**](mysql)

The locally built ingestion image lets you create `metadatachangeevent`s on an ad-hoc basis with a Python script.
This pipeline depends on all of the above images being up.
* [**Ingestion**](ingestion)

### Ingesting demo data

If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md).

## Using Docker Images During Development

See [Using Docker Images During Development](../docs/docker/development.md).

## Building And Deploying Docker Images

We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a
successful release on GitHub will automatically publish the images.

### Building images

To build the full images (the ones we publish), you need to run the following:

```
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
```

This is because we rely on BuildKit for multistage builds. It also does not hurt to set `DATAHUB_VERSION` to
something unique.

This is not our recommended development flow and most developers should be following the
[Using Docker Images During Development](#using-docker-images-during-development) guide.
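As a narrower sketch (assuming you only need one image, e.g. GMS, and that the `linkedin/datahub-gms` name from the list above is the tag you want), you can also invoke `docker build` directly and pass the `APP_ENV` build argument yourself:

```shell
# Hedged sketch: build only the GMS image, from the repository root.
# APP_ENV selects the Dockerfile stage: "prod" compiles inside the container,
# "dev" yields a thin image that expects artifacts built on your machine.
APP_ENV=dev
IMAGE_TAG="linkedin/datahub-gms:${APP_ENV}"

# BuildKit is required because the Dockerfiles use multistage builds.
# Guarded so the sketch is a no-op outside a DataHub checkout.
if command -v docker >/dev/null 2>&1 && [ -f docker/datahub-gms/Dockerfile ]; then
  DOCKER_BUILDKIT=1 docker build \
    --build-arg "APP_ENV=${APP_ENV}" \
    -t "${IMAGE_TAG}" \
    -f docker/datahub-gms/Dockerfile .
fi
```

The same pattern applies to the other service Dockerfiles, which take the identical `APP_ENV` argument.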
6 changes: 6 additions & 0 deletions docker/broker/env/docker.env
@@ -0,0 +1,6 @@
KAFKA_BROKER_ID=1
KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
File renamed without changes.
16 changes: 16 additions & 0 deletions docker/datahub-frontend/README.md
@@ -0,0 +1,16 @@
# DataHub Frontend Docker Image

[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)

Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick overview of the architecture and
responsibilities of this service within DataHub.

## Checking out DataHub UI

After starting your Docker container, you can reach the UI by entering the URL below in your favorite web browser:

```
http://localhost:9001
```

You can sign in using `datahub` as both the username and the password.
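As an optional sanity check (a sketch; assumes `curl` is available and the container is already up), you can verify the frontend responds before opening a browser:

```shell
# Hypothetical smoke test for the frontend container.
FRONTEND_URL="http://localhost:9001"

# Guarded so the sketch is a no-op where curl is unavailable; fails fast
# (with a message, not an error) if the container is not reachable yet.
if command -v curl >/dev/null 2>&1; then
  curl -sSf --max-time 5 -o /dev/null "$FRONTEND_URL" \
    && echo "frontend is up" \
    || echo "frontend not reachable yet"
fi
```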
5 changes: 5 additions & 0 deletions docker/datahub-frontend/env/docker.env
@@ -0,0 +1,5 @@
DATAHUB_GMS_HOST=datahub-gms
DATAHUB_GMS_PORT=8080
DATAHUB_SECRET=YouKnowNothing
DATAHUB_APP_VERSION=1.0
DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB
28 changes: 28 additions & 0 deletions docker/datahub-gms/Dockerfile
@@ -0,0 +1,28 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . /datahub-src
RUN cd /datahub-src && ./gradlew :gms:war:build
RUN cp /datahub-src/gms/war/build/libs/war.war /war.war

FROM base as prod-install
COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war
COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh
RUN chmod +x /datahub/datahub-gms/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 8080

CMD /datahub/datahub-gms/scripts/start.sh
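The `dev-install` stage above ships no artifacts, so running the dev image means mounting a war built on your host at the path `start.sh` expects. A minimal sketch, assuming the Gradle output path `gms/war/build/libs/war.war` and a hypothetical `linkedin/datahub-gms:dev` tag (in practice, `docker-compose.dev.yml` wires these mounts up for you):

```shell
# Sketch: run the dev-stage GMS image with a locally built war mounted at the
# location start.sh launches from. Assumes ./gradlew :gms:war:build has run.
WAR_SRC="$PWD/gms/war/build/libs/war.war"
WAR_DEST="/datahub/datahub-gms/bin/war.war"

# Guarded so the sketch is a no-op when the local artifact does not exist.
if command -v docker >/dev/null 2>&1 && [ -f "$WAR_SRC" ]; then
  docker run -p 8080:8080 \
    -v "${WAR_SRC}:${WAR_DEST}" \
    linkedin/datahub-gms:dev
fi
```

This is the point of the "dummy" dev stage: the image stays thin, and iteration happens via your host's Gradle build rather than an in-container rebuild.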
22 changes: 22 additions & 0 deletions docker/datahub-gms/README.md
@@ -0,0 +1,22 @@
# DataHub Generalized Metadata Store (GMS) Docker Image
[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)

Refer to [DataHub GMS Service](../../gms) for a quick overview of the architecture and
responsibilities of this service within DataHub.

## Other Database Platforms

While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
[database platforms](https://ebean.io/docs/database/) supported by Ebean.

For example, you can run the following command to start a GMS that connects to a PostgreSQL backend.

```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgre.yml -p datahub up)
```

or a MariaDB backend

```
(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up)
```
13 changes: 13 additions & 0 deletions docker/datahub-gms/env/docker.env
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mysql:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
13 changes: 13 additions & 0 deletions docker/datahub-gms/env/docker.mariadb.env
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=mariadb:3306
EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
13 changes: 13 additions & 0 deletions docker/datahub-gms/env/docker.postgres.env
@@ -0,0 +1,13 @@
EBEAN_DATASOURCE_USERNAME=datahub
EBEAN_DATASOURCE_PASSWORD=datahub
EBEAN_DATASOURCE_HOST=postgres:5432
EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
2 changes: 1 addition & 1 deletion docker/gms/start.sh → docker/datahub-gms/start.sh
100644 → 100755
@@ -6,4 +6,4 @@ dockerize \
-wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
-wait http://$NEO4J_HOST \
-timeout 240s \
java -jar jetty-runner.jar gms.war
java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war
27 changes: 27 additions & 0 deletions docker/datahub-mae-consumer/Dockerfile
@@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar

FROM base as prod-install
COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/
RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 9090

CMD /datahub/datahub-mae-consumer/scripts/start.sh
5 changes: 5 additions & 0 deletions docker/datahub-mae-consumer/README.md
@@ -0,0 +1,5 @@
# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)

Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick overview of the architecture and
responsibilities of this service within DataHub.
8 changes: 8 additions & 0 deletions docker/datahub-mae-consumer/env/docker.env
@@ -0,0 +1,8 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
NEO4J_HOST=neo4j:7474
NEO4J_URI=bolt://neo4j
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=datahub
2 changes: 1 addition & 1 deletion docker/mae-consumer/start.sh → docker/datahub-mae-consumer/start.sh
100644 → 100755
@@ -5,4 +5,4 @@ dockerize \
-wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
-wait http://$NEO4J_HOST \
-timeout 240s \
java -jar mae-consumer-job.jar
java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar
27 changes: 27 additions & 0 deletions docker/datahub-mce-consumer/Dockerfile
@@ -0,0 +1,27 @@
# Defining environment
ARG APP_ENV=prod

FROM openjdk:8-jre-alpine as base
ENV DOCKERIZE_VERSION v0.6.1
RUN apk --no-cache add curl tar \
&& curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

FROM openjdk:8 as prod-build
COPY . datahub-src
RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build
RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar

FROM base as prod-install
COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/
COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/
RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh

FROM base as dev-install
# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
# See this excellent thread https://github.com/docker/cli/issues/1134

FROM ${APP_ENV}-install as final

EXPOSE 9090

CMD /datahub/datahub-mce-consumer/scripts/start.sh
5 changes: 5 additions & 0 deletions docker/datahub-mce-consumer/README.md
@@ -0,0 +1,5 @@
# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)

Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick overview of the architecture and
responsibilities of this service within DataHub.
4 changes: 4 additions & 0 deletions docker/datahub-mce-consumer/env/docker.env
@@ -0,0 +1,4 @@
KAFKA_BOOTSTRAP_SERVER=broker:29092
KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
GMS_HOST=datahub-gms
GMS_PORT=8080
2 changes: 1 addition & 1 deletion docker/mce-consumer/start.sh → docker/datahub-mce-consumer/start.sh
100644 → 100755
@@ -4,4 +4,4 @@
dockerize \
-wait tcp://$KAFKA_BOOTSTRAP_SERVER \
-timeout 240s \
java -jar mce-consumer-job.jar
java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar
17 changes: 17 additions & 0 deletions docker/dev.sh
@@ -0,0 +1,17 @@
#!/bin/bash

# Launches dev instances of DataHub images. See documentation for more details.
# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$DIR" && \
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \
-f docker-compose.yml \
-f docker-compose.override.yml \
-f docker-compose.dev.yml \
pull \
&& \
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
-f docker-compose.yml \
-f docker-compose.override.yml \
-f docker-compose.dev.yml \
up
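A typical invocation of the script above (a sketch; `./gradlew build` is an assumption — build whichever modules you are actually iterating on):

```shell
# Sketch: build locally first (the dev images mount these artifacts rather
# than compiling in-container), then bring up the dev stack.
DEV_SCRIPT="./docker/dev.sh"

# Guarded so the sketch is a no-op outside a DataHub checkout.
if [ -x ./gradlew ] && [ -x "$DEV_SCRIPT" ]; then
  ./gradlew build && "$DEV_SCRIPT"
fi
```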