Is There an Existing Issue for This?
- I have searched the existing issues
Where do you intend to apply this feature?
Instill Core, Instill Cloud
Is your Proposal Related to a Problem?
Background
When a company has multiple data sources, its data engineers need to migrate data from one source to another.
The data is scattered across applications, and it is time-consuming for a company to write several tools to collect data from applications such as Gmail / Slack / ….
Describe Your Proposed Solution
User stories
Story 1
- As a data engineer, I want to transform raw data into analysable data and load it into another data source.
e.g. raw transaction data is not analysable, but the weekly transaction amount and transaction count are.
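A minimal sketch of the transformation in Story 1, assuming each raw transaction is a (date, amount) pair. The sample rows and the `weekly_summary` helper are hypothetical and only illustrate the idea; they are not part of any existing component.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions as (ISO date, amount). Illustrative data only.
transactions = [
    ("2024-01-01", 120.0),
    ("2024-01-03", 80.0),
    ("2024-01-08", 50.0),
]


def weekly_summary(rows):
    """Group transactions by ISO (year, week) and total the amount and count."""
    totals = defaultdict(lambda: {"amount": 0.0, "count": 0})
    for day, amount in rows:
        iso = date.fromisoformat(day).isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        totals[key]["amount"] += amount
        totals[key]["count"] += 1
    return dict(totals)


summary = weekly_summary(transactions)
for (year, week), stats in sorted(summary.items()):
    print(year, week, stats["amount"], stats["count"])
```

The per-week rows are what would be loaded into the destination data source instead of the raw transactions.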
Story 2
As a data engineer, I want to transform unstructured data into analysable data and load it into another data source.
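A minimal sketch of Story 2, assuming the unstructured input is free-text messages and the analysable output is a list of typed records. The message format, the regular expression, and the `extract_records` helper are all hypothetical; the regex only stands in for whatever extraction step the designed pipeline ends up using.

```python
import re

# Hypothetical free-text messages, e.g. exported from a chat tool. Illustrative only.
messages = [
    "Order 1042 refunded: 19.99 USD",
    "lunch at noon?",
    "Order 1043 refunded: 5.00 USD",
]

# Assumed message shape; a real pipeline would need a more robust extractor.
PATTERN = re.compile(
    r"Order (?P<order_id>\d+) refunded: (?P<amount>\d+\.\d{2}) (?P<currency>[A-Z]{3})"
)


def extract_records(texts):
    """Turn matching messages into structured records; skip everything else."""
    records = []
    for text in texts:
        match = PATTERN.search(text)
        if match is None:
            continue  # not a refund message, nothing to extract
        records.append(
            {
                "order_id": int(match["order_id"]),
                "amount": float(match["amount"]),
                "currency": match["currency"],
            }
        )
    return records


records = extract_records(messages)
```

Each record has a fixed set of typed fields, so it can be written to a table in the destination data source.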
Highlight the Benefits
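It addresses a real-world problem: data engineers would no longer need to write separate tools to collect data from each application and migrate it between data sources.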
Anything Else?
Possible components
- Note: Components are listed in priority order.
Data components
RDBMS
- AWS
- RDS
- GCP
- Cloud SQL / BigQuery
- Postgres
- MySQL
- MSSQL
- Oracle DB
- …
NoSQL
- AWS
- NoSQL (DynamoDB / MongoDB)
- GCP
- Datastore
- MongoDB
- Elasticsearch
- Cassandra
- …
Vector DB
- Weaviate
- Qdrant
- Chroma
- Zilliz
- Milvus
Others
- AWS
- S3
- GCP
- Google Cloud Storage
- AWS Datalake
- Google Sheets
- …
Application components
- Discord / X / Slack / … are expected to be built from other tools. However, you may need to build a specific TASK for an application component, depending on your use case.
- Please let us know in Slack if you have concrete ideas for specific application components that you want to build. We can discuss them in detail.
Reference tools
- Airbyte
- Data source -> Data destination
- …
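The "data source -> data destination" pattern can be sketched as two small interfaces and a function that connects them. This is only an illustration of the pattern under assumed names: `Source`, `Destination`, `ListSource`, `ListDestination`, and `sync` are hypothetical and do not reflect Airbyte's actual API or any existing component.

```python
from typing import Iterable, Iterator, Protocol

Record = dict  # assumed record shape: one dict per row


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Destination(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListSource:
    """In-memory stand-in for a real data source (database, sheet, bucket)."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        yield from self.rows


class ListDestination:
    """In-memory stand-in for a real data destination."""

    def __init__(self):
        self.rows = []

    def write(self, records):
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before  # number of records written


def sync(source: Source, destination: Destination) -> int:
    """Stream every record from the source into the destination."""
    return destination.write(source.read())


destination = ListDestination()
written = sync(ListSource([{"id": 1}, {"id": 2}]), destination)
```

Keeping the two sides behind separate interfaces means a new data component only has to implement one side to be combined with every existing component on the other side.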
Milestones
- Read the current pipelines
- https://instill.tech/abrc/pipelines/stomavision
- https://instill.tech/pinglin/pipelines/contract-reviewer
- https://instill.tech/gstrong/pipelines/quickstart
- Design the pipeline according to user stories.
- Please draw the concrete pipelines first and ask us to review them before delving into development.
- Timeline: 5 working days
- Check which components we are missing according to the designed pipeline.
- Please create the skeleton PR first for the upcoming components.
- Timeline: 2~3 working days
- Connect those components.
- Timeline: 10 working days
- Build the designed pipeline after you connect those components.
- Timeline: 1 working day
Note
- About the timeline: let's adjust it dynamically if the issues turn out to be much more complicated than we expect.
- Milestones 2~5 form a cycle. Let's finish one complete user story first and then iterate on it.