Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(docker): make docker files easier to use during development. #1777

Merged
merged 9 commits into from
Aug 6, 2020
Merged

refactor(docker): make docker files easier to use during development. #1777

merged 9 commits into from
Aug 6, 2020

Conversation

jplaisted
Copy link
Contributor

@jplaisted jplaisted commented Aug 5, 2020

During development it quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:

  • Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
  • Remove redundant README files that provided little information.
  • Rename docker/ to match the service name in the docker-compose file for clarity.
  • Move environment variables to .env files. We only provide dev / the default environment for quickstart.
  • Add debug docker files to build minimal images with the idea that built files will be mounted instead.
  • Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
  • Added docs/docker documentation for this.

Testing:

  • Confirmed that running docker/quickstart.sh will just download images from the repo, assuming you don't have them.
  • Confirmed that docker-compose up --build will build all images from scratch, which takes a long time.
  • Confirmed that docker/dev.sh will build images with the :debug tag, so that they cannot be pulled. These can also be launched even if you have the :latest image, however it may require that you remove the container (even if stopped) with docker container prune.

TODO to get the dev stack working with frontend.

#1760

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

During development it quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug docker files to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.

Testing:
- Confirmed that running docker/quickstart.sh will just download images from the repo, assuming you don't have them.
- Confirmed that `docker-compose up --build` will build all images from scratch, which takes a long time.
- Confirmed that docker/dev.sh will build images with the :debug tag, so that they cannot be pulled. These can also be launched even if you have the :latest image, however it may require that you remove the container (even if stopped) with `docker container prune`.

TODO to get the dev stack working with frontend.
@mars-lan
Copy link
Contributor

mars-lan commented Aug 5, 2020

Confirmed that docker-compose up --build will build all images from scratch, which takes a long time.

I don't see a reason to support building quickstart images locally. It only causes more confusions when you have locally built :latest images.

docker/quickstart.sh Outdated Show resolved Hide resolved
docker/docker-compose.mariadb.yml Outdated Show resolved Hide resolved
docker/datahub-gms/debug/Dockerfile Outdated Show resolved Hide resolved
John Plaisted added 2 commits August 5, 2020 10:20
- Move db compose files to their own dirs.
- Clarify some documentation.
docker/README.md Outdated Show resolved Hide resolved
docker/broker/env/dev.env Outdated Show resolved Hide resolved
docker/datahub-gms/Dockerfile Outdated Show resolved Hide resolved
docker/datahub-mae-consumer/debug/Dockerfile Outdated Show resolved Hide resolved
docker/docker-compose.dev.yml Show resolved Hide resolved
@mars-lan mars-lan changed the title Make docker files easier to use during development. refactor(docker): make docker files easier to use during development. Aug 5, 2020
@mars-lan
Copy link
Contributor

mars-lan commented Aug 6, 2020

Confirmed that docker-compose up --build will build all images from scratch, which takes a long time.

I don't see a reason to support building quickstart images locally. It only causes more confusions when you have locally built :latest images.

Actually I take that back. You do need the ability to rebuilt the latest images when debugging/developing the docker-compose files.

@jplaisted
Copy link
Contributor Author

Confirmed that docker-compose up --build will build all images from scratch, which takes a long time.

I don't see a reason to support building quickstart images locally. It only causes more confusions when you have locally built :latest images.

Actually I take that back. You do need the ability to rebuilt the latest images when debugging/developing the docker-compose files.

Maybe we just ask people to manually rebuild in that case? I agree with your original comment :p

@mars-lan
Copy link
Contributor

mars-lan commented Aug 6, 2020

Confirmed that docker-compose up --build will build all images from scratch, which takes a long time.

I don't see a reason to support building quickstart images locally. It only causes more confusions when you have locally built :latest images.

Actually I take that back. You do need the ability to rebuilt the latest images when debugging/developing the docker-compose files.

Maybe we just ask people to manually rebuild in that case? I agree with your original comment :p

Without a nice docker-compose file, rebuilding using docker directly will be a PITA.

Copy link
Contributor

@mars-lan mars-lan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work. Please do a thorough e2e test before merging.

@jplaisted
Copy link
Contributor Author

Tested by cleaning all docker stuff locally.

docker image prune
docker container prune
docker image list | grep datahub | awk '{print $3}' | xargs docker rmi -f
docker volume list | awk '{print $2}' | xargs docker volume rm

Then I ran quickstart.sh and verified it worked (so the currently published images, with these new environment files). Then I ran dev.sh and verified it also worked. Finally, I ran

COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 DATAHUB_VERSION=locallatest docker-compose -p datahub up

to build our published images (with a distinct tag), and verified those images worked correctly.

Testing each app was done by just logging into DataHub's frontend.

Side note seems like our ingestion script is broken (specifically building the MCE avro files doesn't seem to do anything? or at least :metadata-models-ext:build doesn't do anything; I have to run the specific tasks... I'll make another issue).

@jplaisted
Copy link
Contributor Author

Oh that may have been me; I guess :metadata-events:build does nothing, which makes some sense. since it isn't a module?

@jplaisted
Copy link
Contributor Author

Also verified I can ingest data and the browse/search for it.

@jplaisted jplaisted merged commit b8e18b0 into datahub-project:master Aug 6, 2020
@mars-lan
Copy link
Contributor

mars-lan commented Aug 7, 2020

Please fix the broken GH Actions for building docker images: https://github.com/linkedin/datahub/actions

arunvasudevan pushed a commit to arunvasudevan/datahub that referenced this pull request Sep 10, 2020
…datahub-project#1777)

* Make docker files easier to use during development.

During development it quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.
mars-lan added a commit that referenced this pull request Sep 10, 2020
* ML Model Schema Initial Version for feedback

* Added Deprecation Model

* Remove lock files

* Committing yarn lock file

* Fix Review Comments

* Using Common VersionTag Entity

* PR Review Comments Round-2

* Updated all model and feature references to MLModel and MLFeature

* Addressing PR Comments (Round 3)

* Updating Hyperparameter to a Map type

* Update to Dataset

* Review comments based on RFC

* ML Model Schema Initial Version for feedback

* Added Deprecation Model

* Remove lock files

* Committing yarn lock file

* Fix Review Comments

* Using Common VersionTag Entity

* PR Review Comments Round-2

* Updated all model and feature references to MLModel and MLFeature

* Addressing PR Comments (Round 3)

* Updating Hyperparameter to a Map type

* Update to Dataset

* fix: modify the etl script dependency (#1726)

Co-authored-by: Cobolbaby <[email protected]>

* fix: correct the way to catch the exception (#1727)

* fix: modify the etl script dependency

* fix: Correct the way to catch the exception

* fix: Compatible with the following kafka cluster when the Kafka Topic message Key cannot be empty

* fix: Adjust the kafka message key; Improve the comment of field

* fix: Avro schema required for key

Co-authored-by: Cobolbaby <[email protected]>

* refactor(models): remove internal cluster model (#1733)

* refactor(models): remove internal cluster model

Remove internal model which is not used in open source

* build(deps): bump lodash from 4.17.15 to 4.17.19 in /datahub-web (#1738)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.15 to 4.17.19.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.15...4.17.19)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update README.md

* Update README.md

* Update README.md

* Update the roadmap (#1740)

* Update the roadmap

- Make short term more like what we're doing this quarter
- Medium term is next quarter
- Long term is 2 or 3 quarters from now
- Visionary is even beyond that

Making this PR mostly to discuss the roadmap. I've moved a few items down to "unprioritized"; before merging this we should put these in a category. Mostly saving the state of what I've done so far.

* Update roadmap.md

Co-authored-by: Mars Lan <[email protected]>

* Update roadmap.md

* Update README.md

* doc: add a separate doc to keep track of the full list or links (#1744)

* Update README.md

* Create links.md

* Update README.md

* Update links.md

* Update README.md

* Update README.md

* Update features.md

* Update faq.md

* Update README.md

* Update README.md

* feat(gms): add postgres & mariadb supports to GMS (#1742)

* feat(gms): add postgres & mariadb supports to GMS

Also add corresponding docker-compose files

* Update README.md

* build(frontend): Drop unnecessary DB-related dependencies (#1741)

* refactor(frontend): Drop unnecessary DB-related dependencies

* Drop unused dependencies from top-level build script

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update links.md

* Update README.md

* Doc fixes

* Update roadmap.md

* Update faq.md

* Set theme jekyll-theme-cayman

* Create _config.yml

* Delete _config.yml

* Set theme jekyll-theme-cayman

* Update _config.yml

* Update _config.yml

* build: build GitHub Page from /docs directory (#1750)

- Move top-level MD files to /docs and symlink them back
- Update all absolute links to files in /docs to relative links

* Revert "build: build GitHub Page from /docs directory (#1750)" (#1751)

This reverts commit b0f56de.

* build: build GitHub Pages from /docs directory (#1752)

- Move non-README top-level MD files to /docs
- Update all absolute links to files in /docs to relative links
- Add a placeholder front page for GitHub Pages

* Update README.md

* Update README.md

* Update README.md

* feat(kafka-config): Add ability to configure other Kafka props (#1745)

* Integarte spring-kafka & spring-boot for security props

- Upgrade spring-kafka to 2.1.14
- Use KafkaListener and KafkaTemplates to enable KafkaAutoConfiguration
- Integrates spring-boot's KafkaProperties into spring-kafka's config

* Cleanup imports

* Add DataHub kafka env vars

* Remove kafka-streams dependency

* Add KafkaProperties to gms; Add docs

* Add to Adoption

* Remove KAFKA_BOOTSTRAP_SERVER default

Co-authored-by: jsotelo <[email protected]>
Co-authored-by: Kerem Sahin <[email protected]>

* Agenda for next town hall

* Update townhalls.md

* Update README.md

* Update README.md

* Add documentation around the DataHub RFC process. (#1754)

Other repos have similar RFC processes (though they seem to have a separate repo for their RFC docs).

This provides a more structured way for contributors to make siginficant design contributions.

#1692

* metadata-models 72.0.8 -> 80.0.0 (#1756)

* <refactor>[ingestions]: align the default kafka topics with PR #1756 (#1758)

* docs: add a sequence diagram and a description (#1757)

* add a sequence diagram and a description

* update descrpition based on feedback

* Update README.md

* Update README.md

Co-authored-by: Mars Lan <[email protected]>

* Update README.md

* Fix reflinks in PR template (#1764)

* Update kafka-config.md (#1763)

Fix name of spring-kafka property to pass SASL_JAAS config

* Update entity.md

* Update README.md

* Update faq.md

* Update townhalls.md

* Update README.md

* Update townhalls.md

* Update townhalls.md

* docs: move quickstart guide to a separate file under docs (#1765)

docs: move quickstart guide to a separate doc under docs directory

* Update slack.md

* Update README.md

* Update slack.md

* Update metadata-ingestion.md

* Add workflow to check build and tests on PRs + releases. (#1769)

PRs are setup to skip docs.

Also, only run docker actions on linkedin/datahub (i.e. disable on forks; makes forks nicer since you don't have failing actions).

* Update developers.md

* Update developers.md

* Update README.md

* fix(models): remove unused model (#1748)

* fix(models): remove unused model

Fixes #1719

* Drop DeploymentInfo from Dataset's value model & rebuild snapshot

* Update README.md

* Add a separate page for previous townhalls

* Update for August invite; link to history

* Update README.md

* build: remove travis (we're using GitHub actions). (#1770)

Remove travis (we're using GitHub actions).

Also ignore markdown in our current workflows.

Also update the README.md badge.

* update townhall date

* Update README.md

* Update townhalls.md

* build(docker): build & publish GitHub Package (#1771)

* build(docker): build & publish docker images to GitHub Packages

Will kepp publishing to Docker Hub meanwhile until all Dockerfiles have been updated to point to GitHub.
Fixes #1548

* Rebase & fix dockerfile locations

* Update README.md

* Fix README.md

* docs: add placeholders for advanced topics (#1780)

* Create high-cardinality.md

* Create pdl-best-practices

* Create partial-update.md

* Rename pdl-best-practices to pdl-best-practices.md

* Create entity-hierarchy.md

* docs: more placeholders for advance topics (#1781)

* Create aspect-versioning.md

* Create derived-aspects.md

* Create backfilling.md

* Update README.md

* Update aspect-versioning.md

* Update aspect.md

* Update README.md

* Update townhall-history.md

* Update townhall-history.md

* Update rfc.md

* refactor(docker): make docker files easier to use during development. (#1777)

* Make docker files easier to use during development.

During development it quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support.

Changes made to docker files:
- Removed all redundant docker-compose files. We now have 1 giant file, and smaller files to use as overrides.
- Remove redundant README files that provided little information.
- Rename docker/<dir> to match the service name in the docker-compose file for clarity.
- Move environment variables to .env files. We only provide dev / the default environment for quickstart.
- Add debug options to docker files using multistage build to build minimal images with the idea that built files will be mounted instead.
- Add a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries to image).
- Added docs/docker documentation for this.

* build: fix docker actions. (#1787)

* bug: Fix docker actions.

We renamed directories in docker/ which broke the actions.

Also try to refactor the action files a little so that we can run (but not publish) these images on pull requests that change the docker/ dir as an extra check. Note this only seems to be supported by the dockerhub plugin; the github plugin doesn't support this (so that will be an issue when we move to it only).

* Drop extra pipes

* Update README.md

* refactor: remove unused model (#1788)

* refactor: remove unused internal models (#1789)

* docs: create search-over-new-field.md (#1790)

Add a doc on searching over a new field

* Update search-onboarding.md

* add description field for dataset index mapping (#1791)

* docs: how to customize the search experience (#1795)

* add description field for dataset index mapping

* documentation on how to customize the search experience

* feat(ingest): add example crawler for MS SQL (#1803)

Also fix the incorrect assumption on column comments & add sample docker-compose file

* Add log documentation

we didn't end up mounting logs; docker desktop is a better experience

* Update townhall-history.md

* Update quickstart.md

* fix(search): clear description from dataset index when it's cleared (#1808)

Fixes #1798

* Update README.md

* Revert "Update README.md"

This reverts commit 74a0d7b.

* Update README.md

* Update README.md

* Update high-cardinality.md

* Update README.md

* Update relationship.md

* Update high-cardinality.md

* Update metadata-models to head! (#1811)

metadata-models 80.0.0 -> 90.0.13:

   90.0.13: Roll forward: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    90.0.2: Refactor listUrnsFromIndex method
    90.0.0: Start distinguishing between [] aspects vs null aspects input param
    89.0.4: Fix the open source build by avoiding URN method that isn't part of the open source URN.
    89.0.2: fix some test case name
    89.0.0: META-12686: Made the MXE_v5 topics become strictly ACL'ed to avoid the wildcard write ACL as "MetadataXEvent.+"
    88.0.6: change DAO to take Storage Config as input
    88.0.3: Add a comment on lack of avro generation for MXEv5 + add MXEv5 to the pegasus validation task.
   87.0.15: META-12651: Integrate the metadata-models-ext with metadata-models
   87.0.13: add StorageConfig to Local DAO
    87.0.3: Treat empty aspect vs optional aspect same until all clients are migrated
    87.0.2: Treat empty aspect vs optional aspect differently
    87.0.1: META-12533: Skip processing unregistered aspect specific MAE.
    83.0.6: action method to return list of urns from strong consistent index
    83.0.4: Change input param type for batch backfill
    83.0.3: Implement batch backfill
    83.0.1: Implement support for OR filter in browse query
   82.0.10: Throw UnsupportedOperationException for unsupported condition types in search filter
    82.0.6: Implement local secondary backfilling index as part of backfill method
    82.0.5: [strongly consistent index] implement getUrns method
    82.0.4: Add indexing urn fields to the local secondary index
    82.0.0: Render Delta fiels in the MCE_v5.
    81.0.1: Add pegasus to avro conversion for FMCE
    80.0.4: add get all support for BaseSingleAspectEntitySimpleKeyResource
    80.0.2: Add a BaseSearchWriterDAO with an ESBulkWriterDAO implementation.
    80.0.1: META-12254: Produce aspect specific MAE with always emit option
    80.0.0: Convert getNodesInTraversedPath to getSubgraph to return complete view of the subgraph (nodes+edges)

* Update townhalls.md

* Update townhalls.md

* fix: drop the commits badge as it's flakey

* Update README.md

* fix: update defaults of aspectNames params (#1815)

fix: Update defaults of aspectNames params.

The last PR to sync internal code broke the external GMS, as code was now expected aspectNames to be null rather than empty by default. This preventing me logging into DataHub as the corp user request would fail (it assumed I asked for no aspects rather than all aspects).

TESTED: Built locally, launched with docker/dev.sh (so used latest frontend, but whatever). Verified I can now log into DataHub, browse and search for datasets, and view my profile.

* Update README.md

* Update README.md

* feat(kubernetes): Improve the security of the kubernetes/helm charts (#1782)

* 1747 | remove obsolete yaml files

* 1747 | remove configmap and its hardcoded references

* 1747 | add missing input parameter of neo4j.host

* 1747 | remove obsolete secrets and parameterize the rest

* 1747 | auto-generate gms secret

* 1747 | remove fullName overrides

* 1747 | fix parameters in subchart's values.yaml

* 1747 | remove hardcoding from parameters for gms host and port

* 1747 | upgrade chart version

* 1747 | update helm docs

* 1747 | add extraEnv, extraVolume and extraMounts

* 1747 | Alters pull policy of images to 'always' for ldh

Co-authored-by: shakti-garg <[email protected]>

* Update README.md

* feat(data-platforms): adding rest resource for /dataPlatforms and mid-tier support (#1817)

* feat(data-platforms): Adding rest resource for /dataPlatforms and mid-tier support

* Removed data platforms which are Linkedin internal

* docs: add NOTICE (#1810)

* Copy NOTICE from wherehows

Copies the file from the wherehows branch.

* Update notice.

* Update links.md

* Update links.md

* Update README.md

* feat(dashboards): RFC for dashboards (#1778)

* feature(dashboards): RFC for dashboards

* Change directory structure

* Create goals & non-goals sections

* Removing alternatives section

* Update README.md

* Update links.md

* Update townhalls.md

* Update notice to include embedded licenses

Also list apache projects specifically.

* feat(frontend): update datahub-web client UI code (#1806)

* Releases updated version of datahub-web client UI code

* Fix typo in yarn lock

* Change yarn lock to match yarn registry directories

* Previous commit missed some paths

* Even more changes to yarnlock missing in previous commit

* Include codegen file for typings

* Add files to get parity for datahub-web and current OS datahub-midtier

* Add in typo fix from previous commit - change to proper license

* Implement proper OS fix for person entity picture url

* Workarounds for open source DH issues

* Fixes institutional memory api and removes unopensourced tabs for datasets

* Fixes search dataset deprecation and user search issue as a result of changes

* Remove internal only options in the avatar menu

* Update search-over-new-field.md

* docs: add external link (#1828)

* Update README.md

* Update links.md

* Review comments based on RFC

Co-authored-by: cobolbaby <[email protected]>
Co-authored-by: Cobolbaby <[email protected]>
Co-authored-by: Harsh Shah <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mars Lan <[email protected]>
Co-authored-by: John Plaisted <[email protected]>
Co-authored-by: Kerem Sahin <[email protected]>
Co-authored-by: Javier Sotelo <[email protected]>
Co-authored-by: jsotelo <[email protected]>
Co-authored-by: Jyoti Wadhwani <[email protected]>
Co-authored-by: Chris Lee <[email protected]>
Co-authored-by: Liangjun Jiang <[email protected]>
Co-authored-by: shakti-garg-saxo <[email protected]>
Co-authored-by: na zhang <[email protected]>
Co-authored-by: shakti-garg <[email protected]>
Co-authored-by: Charlie Tran <[email protected]>
@jplaisted jplaisted deleted the dockerRefactor branch February 26, 2021 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants