feat(k8s): Move helm charts out of contrib (#2440)
Dexter Lee authored Apr 23, 2021
1 parent 851e00b commit ae4def2
Showing 68 changed files with 442 additions and 155 deletions.
70 changes: 0 additions & 70 deletions contrib/kubernetes/README.md

This file was deleted.

65 changes: 0 additions & 65 deletions contrib/kubernetes/datahub/README.md

This file was deleted.

134 changes: 134 additions & 0 deletions datahub-kubernetes/README.md
@@ -0,0 +1,134 @@
---
title: "Deploying with Kubernetes"
hide_title: true
---

# Deploying DataHub with Kubernetes

## Introduction
[This directory](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes) provides
the Kubernetes [Helm](https://helm.sh/) charts for deploying [DataHub](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/datahub) and its [dependencies](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/prerequisites)
(Elasticsearch, Neo4j, MySQL, and Kafka) on a Kubernetes cluster.

## Setup
1. Set up a Kubernetes cluster
   - In a cloud platform of your choice, such as [Amazon EKS](https://aws.amazon.com/eks),
     [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine),
     or [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/), OR
   - In a local environment using [Minikube](https://minikube.sigs.k8s.io/docs/).
     Note that more than 7GB of RAM is required to run DataHub and its dependencies.
2. Install the following tools:
   - [kubectl](https://kubernetes.io/docs/tasks/tools/) to manage Kubernetes resources
   - [helm](https://helm.sh/docs/intro/install/) to deploy the resources based on Helm charts.
     Note that only Helm 3 is supported.
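
The tool requirements above can be checked before going any further. The following is a minimal sketch and not part of the charts: the function names `is_helm3` and `check_tools` are hypothetical, and it assumes that `helm version --short` prints a string starting with `v3` on Helm 3.

```shell
#!/bin/sh
# Hypothetical preflight helpers; nothing here ships with the charts.

# Succeeds when the given version string (for example the output of
# `helm version --short`) denotes Helm 3.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Reports a missing tool or an unsupported Helm major version.
check_tools() {
  for tool in kubectl helm; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      return 1
    fi
  done
  if ! is_helm3 "$(helm version --short 2>/dev/null)"; then
    echo "Helm 3 is required" >&2
    return 1
  fi
  echo "kubectl and Helm 3 found"
}
```

Calling `check_tools` prints a confirmation or the first problem it finds. For a local cluster, something like `minikube start --memory 8192` is one way to satisfy the memory note above; the value 8192 is only an example.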

## Components
DataHub consists of 4 main components: [GMS](https://datahubproject.io/docs/gms),
[MAE Consumer](https://datahubproject.io/docs/metadata-jobs/mae-consumer-job),
[MCE Consumer](https://datahubproject.io/docs/metadata-jobs/mce-consumer-job), and
[Frontend](https://datahubproject.io/docs/datahub-frontend). The Kubernetes deployment
for each component is defined as a subchart under the main
[DataHub](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/datahub)
helm chart.

The main components are powered by 4 external dependencies:
- Kafka
- Local DB (MySQL, Postgres, MariaDB)
- Search Index (Elasticsearch)
- Graph Index (Supports only Neo4j)

The dependencies must be deployed before deploying DataHub. A separate
[chart](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/prerequisites)
deploys the dependencies with an example configuration. They can also be deployed
separately on-prem or consumed as managed services.

## Quickstart
Assuming the kubectl context points to the correct Kubernetes cluster, first create Kubernetes secrets that contain the MySQL and Neo4j passwords.

```shell
kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=datahub
kubectl create secret generic neo4j-secrets --from-literal=neo4j-password=datahub
```

The commands above set both passwords to "datahub" as an example. Change them to any password of your choice.
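
For anything beyond a local trial, generated passwords are an alternative to a fixed example value. The following is a sketch under assumptions: the helper names `random_password` and `create_datahub_secrets` are hypothetical, while the secret names and keys are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for creating the quickstart secrets with random passwords.

# Prints a random alphanumeric password of the requested length (default 24).
random_password() {
  length="${1:-24}"
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
}

# Creates the two secrets with generated passwords.
# Requires kubectl and a context that points at the target cluster.
create_datahub_secrets() {
  kubectl create secret generic mysql-secrets \
    --from-literal=mysql-root-password="$(random_password)" &&
  kubectl create secret generic neo4j-secrets \
    --from-literal=neo4j-password="$(random_password)"
}
```

Run `create_datahub_secrets` once per namespace. A generated value can be read back later with `kubectl get secret mysql-secrets -o jsonpath='{.data.mysql-root-password}' | base64 --decode`.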

Second, deploy the dependencies by running the following:

```shell
helm install prerequisites prerequisites/
```

Note that after changing the configuration in the `values.yaml` file, you can redeploy only the dependencies affected by the change by running

```shell
helm upgrade prerequisites prerequisites/
```

Run `kubectl get pods` to check whether all the pods for the dependencies are running.
You should get a result similar to below.

```
NAME                                               READY   STATUS      RESTARTS   AGE
elasticsearch-master-0                             1/1     Running     0          62m
elasticsearch-master-1                             1/1     Running     0          62m
elasticsearch-master-2                             1/1     Running     0          62m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          63m
prerequisites-kafka-0                              1/1     Running     2          62m
prerequisites-mysql-0                              1/1     Running     1          62m
prerequisites-neo4j-community-0                    1/1     Running     0          52m
prerequisites-zookeeper-0                          1/1     Running     0          62m
```
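
Scanning this output by eye gets tedious while pods are still starting. As a sketch, the hypothetical filter `pods_not_ready` below reads `kubectl get pods` output and prints only the pods that still need attention; it assumes the default column layout shown here (NAME, READY, STATUS first).

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# names of pods that are neither fully ready nor completed.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")              # READY column, e.g. "2/2"
    if ($3 == "Completed") next        # finished setup jobs count as healthy
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

`kubectl get pods | pods_not_ready` prints nothing once every pod is either running with all containers ready or completed.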

Third, deploy DataHub by running the following:

```shell
helm install datahub datahub/ --values datahub/quickstart-values.yaml
```

Values in [quickstart-values.yaml](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/datahub/quickstart-values.yaml)
have been preset to point to the dependencies deployed using the [prerequisites](https://github.com/linkedin/datahub/tree/master/datahub-kubernetes/prerequisites)
chart with release name "prerequisites". If you deployed the prerequisites chart under a different release name, update the quickstart-values.yaml file accordingly before installing.
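
One way to adapt the file to a different release name is a text substitution. This is only a sketch: the helper `retarget_release` is hypothetical, and it assumes that the dependency host names in the file begin with the release name followed by a hyphen, as the pod names above suggest. Review the result before installing.

```shell
#!/bin/sh
# Hypothetical helper: rewrites the "prerequisites-" host name prefix to match
# a different release name. Reads a values file on stdin, writes to stdout.
# Assumes the release name contains only lowercase letters, digits, and "-".
retarget_release() {
  sed "s/prerequisites-/$1-/g"
}
```

For example, `retarget_release my-deps < datahub/quickstart-values.yaml > my-values.yaml` followed by `helm install datahub datahub/ --values my-values.yaml`. Names without the prefix, such as `elasticsearch-master`, are left untouched.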

Run `kubectl get pods` to check whether all the datahub pods are running. You should get a result similar to below.

```
NAME                                               READY   STATUS      RESTARTS   AGE
datahub-datahub-frontend-84c58df9f7-5bgwx          1/1     Running     0          4m2s
datahub-datahub-gms-58b676f77c-c6pfx               1/1     Running     0          4m2s
datahub-datahub-mae-consumer-7b98bf65d-tjbwx       1/1     Running     0          4m3s
datahub-datahub-mce-consumer-8c57d8587-vjv9m       1/1     Running     0          4m2s
datahub-elasticsearch-setup-job-8dz6b              0/1     Completed   0          4m50s
datahub-kafka-setup-job-6blcj                      0/1     Completed   0          4m40s
datahub-mysql-setup-job-b57kc                      0/1     Completed   0          4m7s
elasticsearch-master-0                             1/1     Running     0          97m
elasticsearch-master-1                             1/1     Running     0          97m
elasticsearch-master-2                             1/1     Running     0          97m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv   2/2     Running     1          99m
prerequisites-kafka-0                              1/1     Running     2          97m
prerequisites-mysql-0                              1/1     Running     1          97m
prerequisites-neo4j-community-0                    1/1     Running     0          88m
prerequisites-zookeeper-0                          1/1     Running     0          97m
```

You can run the following to expose the frontend locally. The pod name appears in the output of the command above;
in this case, the datahub-frontend pod name is `datahub-datahub-frontend-84c58df9f7-5bgwx`.

```shell
kubectl port-forward <datahub-frontend pod name> 9002:9002
```

You should be able to access the frontend via http://localhost:9002.
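
Because the pod name changes with every rollout, looking it up can be scripted. The following is a sketch, not part of the charts: `frontend_pod` and `forward_frontend` are hypothetical helper names, and the lookup assumes the frontend pod name contains `datahub-frontend` as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# name of the first running datahub-frontend pod.
frontend_pod() {
  awk '$1 ~ /datahub-frontend/ && $3 == "Running" { print $1; exit }'
}

# Forwards local port 9002 to the frontend pod. Requires kubectl.
forward_frontend() {
  pod="$(kubectl get pods | frontend_pod)"
  if [ -z "$pod" ]; then
    echo "no running datahub-frontend pod found" >&2
    return 1
  fi
  kubectl port-forward "$pod" 9002:9002
}
```

`forward_frontend` blocks while the tunnel is open; stop it with Ctrl-C.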

Once you confirm that the pods are running, you can set up an ingress for datahub-frontend
to expose port 9002 publicly.

## Other useful commands

| Command | Description |
|-----|------|
| helm uninstall datahub | Remove DataHub |
| helm ls | List Helm releases |
| helm history datahub | Fetch the history of the `datahub` release |


File renamed without changes.
File renamed without changes.
67 changes: 67 additions & 0 deletions datahub-kubernetes/datahub/README.md
@@ -0,0 +1,67 @@
datahub
=======
A Helm chart for LinkedIn DataHub

Current chart version is `0.1.2`

## Install DataHub
Update the `datahub/values.yaml` file with valid hostname/IP address configuration for Elasticsearch, Neo4j, schema registry, Kafka broker, and MySQL. Then navigate to the current directory and run the command below.

```shell
helm install datahub datahub/
```

## Chart Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| datahub-frontend.enabled | bool | `true` | Enable Datahub Front-end |
| datahub-frontend.image.repository | string | `"linkedin/datahub-frontend-react"` | Image repository for datahub-frontend |
| datahub-frontend.image.tag | string | `"latest"` | Image tag for datahub-frontend |
| datahub-gms.enabled | bool | `true` | Enable GMS |
| datahub-gms.image.repository | string | `"linkedin/datahub-gms"` | Image repository for datahub-gms |
| datahub-gms.image.tag | string | `"latest"` | Image tag for datahub-gms |
| datahub-mae-consumer.enabled | bool | `true` | Enable MAE Consumer |
| datahub-mae-consumer.image.repository | string | `"linkedin/datahub-mae-consumer"` | Image repository for datahub-mae-consumer |
| datahub-mae-consumer.image.tag | string | `"latest"` | Image tag for datahub-mae-consumer |
| datahub-mce-consumer.enabled | bool | `true` | Enable MCE Consumer |
| datahub-mce-consumer.image.repository | string | `"linkedin/datahub-mce-consumer"` | Image repository for datahub-mce-consumer |
| datahub-mce-consumer.image.tag | string | `"latest"` | Image tag for datahub-mce-consumer |
| datahub-ingestion-cron.enabled | bool | `false` | Enable cronjob for periodic ingestion |
| elasticsearchSetupJob.enabled | bool | `true` | Enable setup job for Elasticsearch |
| elasticsearchSetupJob.image.repository | string | `"linkedin/datahub-elasticsearch-setup"` | Image repository for elasticsearchSetupJob |
| elasticsearchSetupJob.image.tag | string | `"latest"` | Image tag for elasticsearchSetupJob |
| kafkaSetupJob.enabled | bool | `true` | Enable setup job for Kafka |
| kafkaSetupJob.image.repository | string | `"linkedin/datahub-kafka-setup"` | Image repository for kafkaSetupJob |
| kafkaSetupJob.image.tag | string | `"latest"` | Image tag for kafkaSetupJob |
| mysqlSetupJob.enabled | bool | `false` | Enable setup job for MySQL |
| mysqlSetupJob.image.repository | string | `""` | Image repository for mysqlSetupJob |
| mysqlSetupJob.image.tag | string | `""` | Image tag for mysqlSetupJob |
| global.datahub.appVersion | string | `"1.0"` | App version for annotation |
| global.datahub.gms.port | string | `"8080"` | Port of GMS service |
| global.elasticsearch.host | string | `"elasticsearch"` | Elasticsearch host name (endpoint) |
| global.elasticsearch.port | string | `"9200"` | Elasticsearch port |
| global.kafka.bootstrap.server | string | `"broker:9092"` | Kafka bootstrap servers (with port) |
| global.kafka.zookeeper.server | string | `"zookeeper:2181"` | Kafka zookeeper servers (with port) |
| global.kafka.schemaregistry.url | string | `"http://schema-registry:8081"` | URL to kafka schema registry |
| global.neo4j.host | string | `"neo4j:7474"` | Neo4j host address (with port) |
| global.neo4j.uri | string | `"bolt://neo4j"` | Neo4j URI |
| global.neo4j.username | string | `"neo4j"` | Neo4j user name |
| global.neo4j.password.secretRef | string | `"neo4j-secrets"` | Secret that contains the Neo4j password |
| global.neo4j.password.secretKey | string | `"neo4j-password"` | Secret key that contains the Neo4j password |
| global.sql.datasource.driver | string | `"com.mysql.jdbc.Driver"` | Driver for the SQL database |
| global.sql.datasource.host | string | `"mysql:3306"` | SQL database host (with port) |
| global.sql.datasource.hostForMysqlClient | string | `"mysql"` | SQL database host (without port) |
| global.sql.datasource.url | string | `"jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true"` | URL to access SQL database |
| global.sql.datasource.username | string | `"datahub"` | SQL user name |
| global.sql.datasource.password.secretRef | string | `"mysql-secrets"` | Secret that contains the MySQL password |
| global.sql.datasource.password.secretKey | string | `"mysql-password"` | Secret key that contains the MySQL password |
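
Individual values from this table can also be overridden at install time with Helm's `--set` flag instead of editing `values.yaml`. The sketch below only prints such a command so it can be reviewed first. The helper name `datahub_install_command` and the environment variable names `ES_HOST`, `KAFKA_SERVER`, and `NEO4J_URI` are hypothetical; unset variables fall back to the defaults listed in the table.

```shell
#!/bin/sh
# Hypothetical helper: prints a `helm install` command that overrides three
# dependency endpoints. Keys and defaults come from the table above; the
# environment variable names are illustrative only.
datahub_install_command() {
  printf 'helm install datahub datahub/'
  printf ' --set global.elasticsearch.host=%s' "${ES_HOST:-elasticsearch}"
  printf ' --set global.kafka.bootstrap.server=%s' "${KAFKA_SERVER:-broker:9092}"
  printf ' --set global.neo4j.uri=%s\n' "${NEO4J_URI:-bolt://neo4j}"
}
```

After setting, say, `ES_HOST=my-es`, calling `datahub_install_command` prints the full command line, which can then be run as-is.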

## Optional Chart Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| global.credentialsAndCertsSecrets.name | string | `""` | Name of the secret that holds SSL certificates (keystores, truststores) |
| global.credentialsAndCertsSecrets.path | string | `"/mnt/certs"` | Path to mount the SSL certificates |
| global.credentialsAndCertsSecrets.secureEnv | map | `{}` | Map of SSL config name and the corresponding value in the secret |
| global.springKafkaConfigurationOverrides | map | `{}` | Map of configuration overrides for accessing kafka |
@@ -18,7 +18,7 @@ Current chart version is `0.2.0`
| fullnameOverride | string | `"datahub-frontend"` | |
| global.datahub.gms.port | string | `"8080"` | |
| image.pullPolicy | string | `"IfNotPresent"` | |
-| image.repository | string | `"linkedin/datahub-frontend"` | |
+| image.repository | string | `"linkedin/datahub-frontend-react"` | |
| image.tag | string | `"latest"` | |
| imagePullSecrets | list | `[]` | |
| ingress.annotations | object | `{}` | |
@@ -5,7 +5,7 @@
replicaCount: 1

 image:
-  repository: linkedin/datahub-frontend
+  repository: linkedin/datahub-frontend-react
   tag: "latest"
   pullPolicy: Always

@@ -23,7 +23,7 @@ Current chart version is `0.2.0`
| global.hostAliases[0].hostnames[2] | string | `"elasticsearch"` | |
| global.hostAliases[0].hostnames[3] | string | `"neo4j"` | |
| global.hostAliases[0].ip | string | `"192.168.0.104"` | |
-| global.kafka.bootstrap.server | string | `"broker:29092"` | |
+| global.kafka.bootstrap.server | string | `"broker:9092"` | |
| global.kafka.schemaregistry.url | string | `"http://schema-registry:8081"` | |
| global.neo4j.host | string | `"neo4j:7474"` | |
| global.neo4j.uri | string | `"bolt://neo4j"` | |
@@ -152,7 +152,7 @@ global:

   kafka:
     bootstrap:
-      server: "broker:29092"
+      server: "broker:9092"
     schemaregistry:
       url: "http://schema-registry:8081"
