doc: start adding the roadmap to 1.0. (linkedin#37)
I tried adding some points as "required" and others as "potential" (up for discussion). Because of the latter category, we need some extensive discussion around what we want in 1.0 since it isn't solid yet.
John Plaisted authored and jywadhwani committed Nov 10, 2020
1 parent 7cc3c4f commit d87bb96
Showing 1 changed file with 83 additions and 2 deletions.
85 changes: 83 additions & 2 deletions docs/roadmap.md
@@ -1,4 +1,85 @@
# GMA Roadmap

TODO! We just move this to its own repo, so our first order of business is cleaning up this repo! Then we can lay out
our longer term roadmap.
Status: **Draft**

GMA is currently on major version 0. This roadmap defines the requirements for a 1.0 release. It is still a draft, as
we need some major discussions around what we want 1.0 to look like.

There are a few options for what we can do with a `1.0` release:

1. `1.0` is a ground-up rewrite of our code, developed in a fork and completely unrelated to our existing code.
2. `1.0` is still a rewrite, but is developed in the same branch. The last release of `0.x` contains the old and new
code, making a transition to `1.0` easier. The primary issue here is naming; we can't reuse any names in `1.0` that
exist today.
3. `1.0` is just a small cleanup of our existing APIs, where the migration isn't too onerous. Maybe we also add some
more core features.

So really we have two questions to solve, regarding existing code and new features:

1. What APIs need fixing, if any, and do they warrant complete rewrites? If so, how do we handle those rewrites?
2. What additional features, if any, would we want to consider for a `1.0` release?

---

## Required `1.0` Features

List of features we know we want in a `1.0` release.

Status: **Draft**

- [ ] Metadata Events v5 support.
- [ ] Auto-generate event definitions from PDL annotations at build time with a Gradle plugin.
- [ ] Support for DAOs to emit MAE v5.
- [ ] Jobs (or job libraries) to consume MCE v5 and update GMSes.
- [ ] Jobs (or job libraries) to consume MAE v5 and update Elasticsearch.
- [ ] Jobs (or job libraries) to consume MAE v5 and update Graph.
- [ ] Kafka topic auto-generation Gradle plugin or script.
- [ ] Migration playbook for users to get off v4.
- [ ] Enable `-Werror` for all Java code.
- [ ] Remove use of tuples library (not idiomatic Java - replace with helper classes / POJOs).
- [ ] Elasticsearch 7 support.
- [ ] Java 11 support.
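
As a rough illustration of the "remove tuples" item above, here is a minimal sketch of what replacing a generic pair with
a small helper class could look like. The `AspectVersion` class and its fields are hypothetical and chosen only for
illustration; they are not existing GMA types.

```java
import java.util.Objects;

// Hypothetical example: a small value class standing in for a generic
// Pair<String, Long>. The class and field names are illustrative only.
final class AspectVersion {
    private final String aspectName;
    private final long version;

    AspectVersion(String aspectName, long version) {
        this.aspectName = Objects.requireNonNull(aspectName, "aspectName");
        this.version = version;
    }

    // Named accessors replace positional getters such as getValue0().
    String getAspectName() {
        return aspectName;
    }

    long getVersion() {
        return version;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof AspectVersion)) {
            return false;
        }
        AspectVersion that = (AspectVersion) other;
        return version == that.version && aspectName.equals(that.aspectName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(aspectName, version);
    }

    @Override
    public String toString() {
        return "AspectVersion{aspectName=" + aspectName + ", version=" + version + "}";
    }
}
```

The named accessors are the main benefit: call sites read as `getAspectName()` rather than a positional getter, and the
compiler rejects accidental swaps between unrelated pairs that happen to share element types.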

---

## Potential `1.0` Rewrites / Improvements / Features

Features / improvements we need to discuss further to see if they make sense for a `1.0` release.

- Kafka jobs or libraries live in GMA.
  - The Kafka jobs did not move from the DataHub repo. They're still tightly coupled to MCE and MAE v4, which are in
    turn coupled to the models that live in DataHub.
  - We should add the jobs (or very easy to use libraries that can be invoked via a job) for MCE and MAE v5 for `1.0`.
  - Jobs include the MCE consumer (consumes MCEs and updates GMSes) and MAE consumers for Elasticsearch and Graph.
- Index builder API should be type safe.
  - Index builders also did not move from the DataHub repo. They are needed for the jobs.
  - Today you give the super constructor the list of snapshots you want to transform, but the transformation method
    then gives you a generic Snapshot. Nothing enforces that you handle the types in the list you said you wanted to
    handle. We saw this lead to some confusion, and a user did get it wrong once.
  - This can be fixed by making each index builder listen to one type only, and having the transformation method's
    argument be that type.
- Remove extensive use of abstract "Base" classes. Prefer interfaces.
  - Abstract classes are still okay, just not as the root of the inheritance tree.
- Split classes / interfaces where it makes sense, especially to avoid `UnsupportedOperationException`.
- Improve / publish code coverage.
- Ensure all public methods and classes are documented with Javadoc.
  - Add a check to prevent regression.
- Remove JSON "search templates" and instead use something in code.
  - JSON is brittle and hard to get right. Java types make this easier. Elasticsearch already provides APIs for these.
  - Regex replacement is not correct unless we're very careful with escaping the string.
- Search type safety.
  - The input today is a string that has meaning to Elasticsearch. It can be unclear to the client how to build it.
  - The search model is not the same as the GMA model, which can lead to confusion.
- Rewrite DAOs for extensibility.
  - SCSI muddled the interface quite a bit, as not all DAOs support SCSI. Part of the issue is that the DAO design was
    not extensible enough to allow SCSI without extending the interfaces.
- Codegen for Rest.li GMSes.
  - Rest.li relies on Java annotations to build IDL files and does not look at super class annotations, so many
    methods must be copied / pasted / overloaded just to get annotations. Good opportunity for some code gen.
- Codegen for URN.
  - The URN classes in DataHub are pretty much copy and paste. We could easily generate these classes.
- Deletion of references to snapshots and aspect unions.
  - MXE v5 helps to enable this. It allows more modular GMSes, and also allows models to no longer be in a single repo.
- Rename test models for a more grounded example.
  - "Foo" and "Bar" have no meaning, can be hard to read sometimes, and are just poor examples. Naga has had some fun
    "foodie" examples in past presentations; maybe we just use that kind of thing for our test models?
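
To make the index builder point above more concrete, here is a minimal sketch of a builder that listens to exactly one
snapshot type, so the transformation method receives that type rather than a generic Snapshot. Every name in it
(`TypedIndexBuilder`, `DatasetSnapshot`, `SearchDocument`, `IndexBuilderDispatcher`) is a hypothetical stand-in and not
part of the current API; treat it as a starting point for discussion, not a proposed design.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for a snapshot model and a search document.
final class DatasetSnapshot {
    final String urn;

    DatasetSnapshot(String urn) {
        this.urn = urn;
    }
}

final class SearchDocument {
    final String id;

    SearchDocument(String id) {
        this.id = id;
    }
}

// Each builder is bound to exactly one snapshot type S, so the compiler
// guarantees the transformation method only ever sees that type.
interface TypedIndexBuilder<S> {
    Class<S> snapshotType();

    List<SearchDocument> toDocuments(S snapshot);
}

final class DatasetIndexBuilder implements TypedIndexBuilder<DatasetSnapshot> {
    @Override
    public Class<DatasetSnapshot> snapshotType() {
        return DatasetSnapshot.class;
    }

    @Override
    public List<SearchDocument> toDocuments(DatasetSnapshot snapshot) {
        return Collections.singletonList(new SearchDocument(snapshot.urn));
    }
}

final class IndexBuilderDispatcher {
    private IndexBuilderDispatcher() {
    }

    // Routes an untyped snapshot to the builder only if it matches the
    // builder's declared type; anything else yields no documents.
    static <S> List<SearchDocument> dispatch(TypedIndexBuilder<S> builder, Object snapshot) {
        Class<S> type = builder.snapshotType();
        if (!type.isInstance(snapshot)) {
            return Collections.emptyList();
        }
        return builder.toDocuments(type.cast(snapshot));
    }
}
```

With this shape, the single runtime type check lives in the dispatcher, and a builder author cannot declare interest in
one snapshot type while forgetting to handle it in the transformation method.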
