feat(Kubernetes) Add metadata-ingestion as a Helm component #2236

Merged 3 commits on Mar 16, 2021
9 changes: 5 additions & 4 deletions contrib/kubernetes/README.md
@@ -36,10 +36,11 @@ The following table lists the configuration parameters and its default values

| Repository | Name | Version |
|------------|------|---------|
-| file://./charts/datahub-frontend | datahub-frontend | 0.2.0 |
-| file://./charts/datahub-gms | datahub-gms | 0.2.0 |
-| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.0 |
-| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.0 |
+| file://./charts/datahub-frontend | datahub-frontend | 0.2.1 |
+| file://./charts/datahub-gms | datahub-gms | 0.2.1 |
+| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.1 |
+| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.1 |
+| file://./charts/datahub-ingestion-cron | datahub-ingestion-cron | 0.2.1 |

## Install DataHub
Navigate to the current directory and run the below command. Update the `datahub/values.yaml` file with valid hostname/IP address configuration for elasticsearch, neo4j, schema-registry, broker & mysql.
6 changes: 5 additions & 1 deletion contrib/kubernetes/datahub/Chart.yaml
@@ -24,4 +24,8 @@ dependencies:
   - name: datahub-mce-consumer
     version: 0.2.1
     repository: file://./charts/datahub-mce-consumer
-    condition: datahub-mce-consumer.enabled
+    condition: datahub-mce-consumer.enabled
+  - name: datahub-ingestion-cron
+    version: 0.2.1
+    repository: file://./charts/datahub-ingestion-cron
+    condition: datahub-ingestion-cron.enabled
2 changes: 2 additions & 0 deletions contrib/kubernetes/datahub/README.md
@@ -12,6 +12,7 @@ Current chart version is `0.1.1`
| file://./charts/datahub-gms | datahub-gms | 0.2.1 |
| file://./charts/datahub-mae-consumer | datahub-mae-consumer | 0.2.1 |
| file://./charts/datahub-mce-consumer | datahub-mce-consumer | 0.2.1 |
+| file://./charts/datahub-ingestion-cron | datahub-ingestion-cron | 0.2.1 |

#### Chart Values

@@ -29,6 +30,7 @@ Current chart version is `0.1.1`
| datahub-mce-consumer.enabled | bool | `true` | |
| datahub-mce-consumer.image.repository | string | `"linkedin/datahub-mce-consumer"` | |
| datahub-mce-consumer.image.tag | string | `"latest"` | |
+| datahub-ingestion-cron.enabled | bool | `false` | |
| global.datahub.appVersion | string | `"1.0"` | |
| global.datahub.gms.port | string | `"8080"` | |
| global.elasticsearch.host | string | `"elasticsearch"` | |
@@ -0,0 +1,21 @@
apiVersion: v2
name: datahub-ingestion-cron
description: A Helm chart for Kubernetes

# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 0.2.1

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 0.3.1
25 changes: 25 additions & 0 deletions contrib/kubernetes/datahub/charts/datahub-ingestion-cron/README.md
@@ -0,0 +1,25 @@
datahub-ingestion-cron
================
A Helm chart for DataHub's metadata-ingestion framework with Kerberos authentication.

## Chart Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| image.pullPolicy | string | `"Always"` | Image pull policy |
| image.repository | string | `"linkedin/datahub-ingestion"` | DataHub Ingestion image repository |
| image.tag | string | `"latest"` | DataHub Ingestion image tag |
| imagePullSecrets | array | `[]` (does not add image pull secrets to deployed pods) | Docker registry secret names as an array |
| labels | object | `{}` | Metadata labels to be added to each crawling cron job |
| crons | array | `[]` | A list of crawl job definitions, one per technology being crawled |
| crons.name | string | `"crawler"` | Name of the crawler container |
| crons.schedule | string | `"0 0 * * *"` | Cron expression (daily at midnight) for crawler jobs |
| crons.crawlerConfigPath | string | N/A | Path to the metadata configuration file. This must be explicitly defined as a mount and is **required**. |
| crons.hostAliases | array | `[]` | Host aliases to add to the cronjob pods |
| crons.env | object | `{}` | Environment variables to add to the cronjob container |
| crons.envFromSecrets | object | `{}` | Environment variables from secrets to the cronjob container |
| crons.envFromSecrets*.secret | string | | secretKeyRef.name used for environment variable |
| crons.envFromSecrets*.key | string | | secretKeyRef.key used for environment variable |
| crons.extraVolumes | array | `[]` | Additional volumes to add to the pods |
| crons.extraVolumeMounts | array | `[]` | Additional volume mounts to add to the pods |
| crons.extraInitContainers | object | `{}` | Init containers to add to the cronjob pods |
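
The values above could be combined roughly as follows. This is a minimal sketch rather than a tested configuration: the job name, schedule, recipe path, ConfigMap name, Secret name, and environment variable names are all hypothetical and chosen only for illustration. It assumes the values are set from the parent `datahub` chart, so they are nested under the `datahub-ingestion-cron` key; when installing the subchart on its own, the same keys would sit at the top level.

```yaml
# Hypothetical override for the parent datahub chart's values.yaml.
datahub-ingestion-cron:
  enabled: true
  crons:
    - name: "mysql-crawler"            # assumed name; used in the CronJob and container names
      schedule: "0 2 * * *"            # assumed offset from the midnight default
      # Required. The file must be provided through a volume mount (see below).
      crawlerConfigPath: "/etc/recipes/mysql.yml"
      env:
        LOG_LEVEL: "INFO"              # assumed variable, rendered as a plain env entry
      envFromSecrets:
        MYSQL_PASSWORD:                # assumed variable, rendered via secretKeyRef
          secret: "mysql-credentials"  # assumed Secret name
          key: "password"              # assumed key inside that Secret
      extraVolumes:
        - name: recipes
          configMap:
            name: ingestion-recipes    # assumed ConfigMap holding the recipe file
      extraVolumeMounts:
        - name: recipes
          mountPath: /etc/recipes
          readOnly: true
```

Each entry in `crons` produces one CronJob, so a second technology would be added as another list item with its own `crawlerConfigPath`.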
@@ -0,0 +1,63 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "datahub-ingestion-cron.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "datahub-ingestion-cron.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "datahub-ingestion-cron.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{/*
Common labels
*/}}
{{- define "datahub-ingestion-cron.labels" -}}
helm.sh/chart: {{ include "datahub-ingestion-cron.chart" . }}
{{ include "datahub-ingestion-cron.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}

{{/*
Selector labels
*/}}
{{- define "datahub-ingestion-cron.selectorLabels" -}}
app.kubernetes.io/name: {{ include "datahub-ingestion-cron.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}

{{/*
Create the name of the service account to use
*/}}
{{- define "datahub-ingestion-cron.serviceAccountName" -}}
{{- if .Values.serviceAccount.create -}}
{{ default (include "datahub-ingestion-cron.fullname" .) .Values.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.serviceAccount.name }}
{{- end -}}
{{- end -}}
@@ -0,0 +1,60 @@
{{- $baseName := include "datahub-ingestion-cron.fullname" .}}
{{- $labels := include "datahub-ingestion-cron.labels" .}}
{{- range $job, $val := .Values.crons }}
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: "{{ $baseName }}-{{ .name }}"
  labels: {{- $labels | nindent 4 }}
spec:
  schedule: {{ default "0 0 * * *" .schedule | quote}}
  jobTemplate:
    spec:
      template:
        spec:
        {{- with $.Values.imagePullSecrets }}
          imagePullSecrets:
            {{- toYaml . | nindent 12 }}
        {{- end }}
        {{- if .extraInitContainers }}
          initContainers:
            {{- toYaml .extraInitContainers | nindent 12 }}
        {{- end }}
        {{- if .hostAliases }}
          hostAliases: {{- include "common.tplvalues.render" (dict "value" .hostAliases "context" $) | nindent 10 }}
        {{- end }}
          containers:
          - name: {{ default "crawler" .name }}
            image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag }}"
            imagePullPolicy: {{ $.Values.image.pullPolicy }}
          {{- if .extraVolumeMounts }}
            volumeMounts:
              {{- toYaml .extraVolumeMounts | nindent 14 }}
          {{- end }}
            command:
            - /bin/sh
            - -c
            - datahub ingest -c {{ required "Path to configuration file is required" .crawlerConfigPath }}
            env:
            {{- if .env }}
            {{- range $key,$value := .env }}
            - name: {{ $key | quote}}
              value: {{ $value | quote}}
            {{- end }}
            {{- end }}
            {{- if .envFromSecrets }}
            {{- range $key,$value := .envFromSecrets }}
            - name: {{ $key | quote}}
              valueFrom:
                secretKeyRef:
                  name: {{ $value.secret | quote}}
                  key: {{ $value.key | quote}}
            {{- end }}
            {{- end }}
          restartPolicy: OnFailure
        {{- if .extraVolumes }}
          volumes:
            {{- toYaml .extraVolumes | nindent 12 }}
        {{- end }}
---
{{- end }}
@@ -0,0 +1,42 @@
# Default values for datahub-ingestion-cron.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

image:
  repository: linkedin/datahub-ingestion
  tag: latest
  pullPolicy: Always

imagePullSecrets: []

crons: []
#### Example data
## Metadata ingestion name
##
#name: "crawler"

## Daily at midnight (we may want to offset this to not conflict with other processes)
#schedule: "0 0 * * *"

## Deployment pod host aliases
## https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/
##
#hostAliases: []

## Environment variables.
#env: {}

## Environment variables from Secret resources.
#envFromSecrets: {}

## Additional primary volume mounts
##
#extraVolumeMounts: []

## Additional primary volumes
##
#extraVolumes: []

## Add your own init container or uncomment and modify the given example.
##
#extraInitContainers: {}
6 changes: 6 additions & 0 deletions contrib/kubernetes/datahub/values.yaml
@@ -26,6 +26,12 @@ datahub-mce-consumer:
     repository: linkedin/datahub-mce-consumer
     tag: "latest"

+datahub-ingestion-cron:
+  enabled: false
+  image:
+    repository: linkedin/datahub-ingestion
+    tag: "latest"
+
 global:
   elasticsearch:
     host: "elasticsearch"