Commit
Add doc about search document & some cleanup
Kerem Sahin committed Dec 19, 2019
1 parent 8f120f1, commit 3ba1492
Showing 19 changed files with 126 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,11 @@ | ||
# DataHub Architecture | ||
![datahub-architecture](../imgs/datahub-architecture.png) | ||
|
||
## Generalized Metadata Architecture (GMA) | ||
Refer to [GMA](../what/gma.md). | ||
|
||
## Metadata Serving | ||
Refer to [metadata-serving](metadata-serving.md). | ||
|
||
## Metadata Ingestion | ||
Refer to [metadata-ingestion](metadata-ingestion.md). | ||
|
||
## What is Generalized Metadata Architecture (GMA)? | ||
Refer to [GMA](../what/gma.md). | ||
Refer to [metadata-ingestion](metadata-ingestion.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Metadata Ingestion Architecture | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Metadata Serving Architecture | ||
|
||
![metadata-serving](../imgs/metadata-serving.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# How to onboard an entity? | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# How to onboard to GMA graph? | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# How to onboard to GMA search? | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# What is Generalized Metadata Architecture (GMA)? | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# What is Generalized Metadata Store (GMS)? | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# What is GMA graph? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
# What is a search document? | ||
|
||
[Search documents](https://en.wikipedia.org/wiki/Search_engine_indexing) are also modeled using [PDSC](https://linkedin.github.io/rest.li/DATA-Data-Schema-and-Templates) explicitly. | ||
In many ways, the model for a Document is very similar to an [Entity](entity.md) and [Relationship](relationship.md) model, | ||
where each attribute/field contains a value that’s derived from various metadata aspects. | ||
However, a search document may also have array-typed attributes, provided the array contains only primitives or enum items. | ||
This is because most full-text search engines support membership testing against an array field, e.g. an array field containing all the terms used in a document. | ||
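
To make the membership test concrete, here is a minimal Python sketch that is not tied to any particular search engine. It builds an inverted index over an array field so that "does this field contain X?" becomes a dictionary lookup. The function name `build_array_index`, the sample documents, and the URN strings are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def build_array_index(docs, field):
    """Map each item of an array field to the set of document URNs containing it."""
    index = defaultdict(set)
    for doc in docs:
        for item in doc.get(field, []):
            index[item].add(doc["urn"])
    return index


# Hypothetical documents; the URN strings are made up for illustration.
docs = [
    {"urn": "urn:li:corpuser:alice",
     "management": ["urn:li:corpuser:bob", "urn:li:corpuser:ceo"]},
    {"urn": "urn:li:corpuser:bob", "management": ["urn:li:corpuser:ceo"]},
    {"urn": "urn:li:corpuser:ceo", "management": []},
]

management_index = build_array_index(docs, "management")

# Membership test: which documents list bob in their array field?
reports_to_bob = management_index.get("urn:li:corpuser:bob", set())
```

Restricting arrays to primitives and enum items keeps every element usable as an index key, which is what makes this kind of lookup possible.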
|
||
One obvious use of the attributes is search filtering, e.g. finding every `User` whose first or last name is similar to “Joe” and who reports up to `userFoo`. | ||
Since the document also serves as the main interface for the search API, the attributes can also be used to format the search snippet. | ||
As a result, one may be tempted to add as many attributes as needed. This is acceptable as the underlying search engine is designed to index a large number of fields. | ||
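
The filtering and snippet-formatting uses described above can be sketched in plain Python. This is an illustration under stated assumptions, not the search API itself: the helpers `matches_name`, `search_users`, and `format_snippet` are hypothetical, the sample URNs are invented, and a case-insensitive substring match stands in for whatever similarity matching the real search engine performs.

```python
def matches_name(doc, query):
    """Case-insensitive substring match; a stand-in for engine-side fuzzy matching."""
    q = query.lower()
    return any(
        q in (doc.get(field) or "").lower()
        for field in ("firstName", "lastName")
    )


def search_users(docs, name_query, reports_up_to):
    """Keep users whose name matches and whose management chain contains the URN."""
    return [
        doc for doc in docs
        if matches_name(doc, name_query)
        and reports_up_to in doc.get("management", [])
    ]


def format_snippet(doc):
    """Build a display snippet from the same attributes used for filtering."""
    name = " ".join(p for p in (doc.get("firstName"), doc.get("lastName")) if p)
    return f"{name} ({doc['urn']})"


# Hypothetical documents for illustration only.
users = [
    {"urn": "urn:li:corpuser:joe1", "firstName": "Joe", "lastName": "Smith",
     "management": ["urn:li:corpuser:userFoo"]},
    {"urn": "urn:li:corpuser:joe2", "firstName": "Joey", "lastName": "Lee",
     "management": ["urn:li:corpuser:other"]},
    {"urn": "urn:li:corpuser:ann", "firstName": "Ann", "lastName": "Joelson",
     "management": ["urn:li:corpuser:mid", "urn:li:corpuser:userFoo"]},
]

hits = search_users(users, "joe", "urn:li:corpuser:userFoo")
snippets = [format_snippet(doc) for doc in hits]
```

Because filtering and snippet formatting read from the same document, adding an attribute for one purpose makes it available for the other at no extra cost.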
|
||
Below is an example schema for the `User` search document. Note that: | ||
1. Each search document is required to have a type-specific `urn` field, which generally maps to an entity in the [graph](graph.md). | ||
2. Similar to `Entity`, each document has an optional `removed` field for "soft deletion". | ||
This is captured in [BaseDocument](../../metadata-models/src/main/pegasus/com/linkedin/metadata/search/BaseDocument.pdsc), which is expected to be included by all documents. | ||
3. Similar to `Entity`, all remaining fields are made `optional` to support partial updates. | ||
4. `management` shows an example of a string array field. | ||
5. `ownedDatasets` shows an example of how a field can be derived from metadata [aspects](aspect.md) associated with other types of entity (in this case, `Dataset`). | ||
|
||
```json | ||
{ | ||
"type": "record", | ||
"name": "BaseDocument", | ||
"namespace": "com.linkedin.metadata.search", | ||
"doc": "Common fields that apply to all documents", | ||
"fields": [ | ||
{ | ||
"name": "removed", | ||
"type": "boolean", | ||
"doc": "Whether the entity has been removed or not", | ||
"optional": true, | ||
"default": false | ||
} | ||
] | ||
} | ||
``` | ||
|
||
```json | ||
{ | ||
"type": "record", | ||
"name": "UserDocument", | ||
"namespace": "com.linkedin.metadata.search", | ||
"doc": "Data model for user entity search", | ||
"include": [ | ||
"BaseDocument" | ||
], | ||
"fields": [ | ||
{ | ||
"name": "urn", | ||
"type": "com.linkedin.common.CorpuserUrn", | ||
"doc": "Urn for the user" | ||
}, | ||
{ | ||
"name": "firstName", | ||
"type": "string", | ||
"doc": "First name of the user", | ||
"optional": true | ||
}, | ||
{ | ||
"name": "lastName", | ||
"type": "string", | ||
"doc": "Last name of the user", | ||
"optional": true | ||
}, | ||
{ | ||
"name": "management", | ||
"type": { | ||
"type": "array", | ||
"items": "com.linkedin.common.CorpuserUrn" | ||
}, | ||
"doc": "The chain of management all the way to CEO", | ||
"default": [], | ||
"optional": true | ||
}, | ||
{ | ||
"name": "costCenter", | ||
"type": "int", | ||
"doc": "Code for the cost center", | ||
"optional": true | ||
}, | ||
{ | ||
"name": "ownedDatasets", | ||
"type": { | ||
"type": "array", | ||
"items": "com.linkedin.common.DatasetUrn" | ||
}, | ||
"doc": "The list of datasets the user owns", | ||
"default": [], | ||
"optional": true | ||
} | ||
] | ||
} | ||
``` |
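
The schema notes on the required `urn`, optional fields, and soft deletion can be illustrated with a small Python sketch. It assumes documents are plain dictionaries and that a partial update simply overwrites the fields it carries. The source does not describe how updates are actually applied, so `apply_partial_update`, `is_visible`, and the sample values are hypothetical.

```python
REQUIRED_FIELDS = ("urn",)  # per the schema, only `urn` is non-optional


def apply_partial_update(current, update):
    """Merge a partial document into the stored one (assumed merge semantics)."""
    for field in REQUIRED_FIELDS:
        if field not in update:
            raise ValueError(f"missing required field: {field}")
    if current is not None and current["urn"] != update["urn"]:
        raise ValueError("urn mismatch between stored document and update")
    merged = dict(current or {})
    merged.update(update)  # fields absent from the update keep their stored value
    return merged


def is_visible(doc):
    """Soft-deleted documents are hidden; `removed` defaults to False as in BaseDocument."""
    return not doc.get("removed", False)


# Hypothetical user document for illustration only.
urn = "urn:li:corpuser:jdoe"
stored = {"urn": urn, "firstName": "Jane", "costCenter": 42}

# A partial update carries only the urn and the fields that changed.
stored = apply_partial_update(stored, {"urn": urn, "lastName": "Doe"})

# Soft deletion flips `removed` without discarding the other fields.
deleted = apply_partial_update(stored, {"urn": urn, "removed": True})
```

Making every field except `urn` optional is what lets an update carry only the changed attributes, while the `removed` flag hides a document without losing its data.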
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# What is GMA search index? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# What is URN? | ||
|