kafka ingestion source ingests schemas for all topics and ignores topic_patterns #11907

Closed
guentherhackl-wgs opened this issue Nov 20, 2024 · 6 comments
Labels
bug Bug report ingestion PR or Issue related to the ingestion of metadata

Comments

@guentherhackl-wgs

Describe the bug
When I load Kafka topics and use topic_patterns, the topics themselves are filtered correctly, but events for the schemas of all topics on the cluster are still produced.

To Reproduce

  1. have a broker with 2 topics, both having a schema in confluent schema registry
  2. make a recipe that excludes one of the topics
  3. see all the schemas ingested anyway
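
The reproduction steps above could be sketched as a recipe like the one below, written as a Python dict. This is only an illustration: the topic names (`orders`, `payments`), the broker and registry addresses, and the sink are made up, and the config keys reflect my understanding of the kafka source and should be checked against the docs for the CLI version in use.

```python
# Hypothetical recipe for reproducing the issue: two topics exist on the
# broker ("orders" and "payments"), and the recipe denies one of them.
recipe = {
    "source": {
        "type": "kafka",
        "config": {
            "connection": {
                # Assumed local addresses, replace with your own.
                "bootstrap": "localhost:9092",
                "schema_registry_url": "http://localhost:8081",
            },
            # Exclude the "payments" topic; "orders" stays allowed.
            "topic_patterns": {
                "deny": ["^payments$"],
            },
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},  # assumed address
    },
}
```

With a recipe like this, the report is that the `payments` topic is skipped but its schema from the registry is still ingested.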

Expected behavior
The schemas should also be filtered.

Desktop (please complete the following information):

  • OS: Any
  • Browser: Any
  • DataHub CLI Version: 0.13.3
@guentherhackl-wgs guentherhackl-wgs added the bug Bug report label Nov 20, 2024
@RyanHolstien RyanHolstien added the ingestion PR or Issue related to the ingestion of metadata label Nov 20, 2024
@guentherhackl-wgs
Author

@hsheth2, if you want, I could prepare a PR. From what I can see, this could be solved by introducing a new config option, schema_pattern, similar to topic_pattern. Simply reusing topic_pattern might be problematic for some topic-to-schema mappings. Ingesting all schemas separately from the topics, where they are already available, also doesn't give us any benefit, I think, and the new option would make it possible to effectively disable the separate ingestion. What do you think?
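
To make the proposal concrete, here is a minimal sketch of how an allow/deny filter for schema subjects could behave. It is not DataHub's implementation: the `AllowDeny` class, the `schema_pattern` name as used here, and the subject names are all hypothetical and only mirror the allow/deny idea behind topic_pattern.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDeny:
    """Hypothetical stand-in for an allow/deny regex filter."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # A deny match always wins; otherwise at least one allow must match.
        if any(re.match(p, name) for p in self.deny):
            return False
        return any(re.match(p, name) for p in self.allow)


# Made-up schema registry subjects for two topics.
subjects = ["orders-value", "payments-value", "payments-key"]

# Hypothetical schema_pattern that drops every subject of the denied topic.
schema_pattern = AllowDeny(deny=["^payments-.*"])
kept = [s for s in subjects if schema_pattern.allowed(s)]

# Denying everything would effectively switch off separate schema ingestion.
disabled = AllowDeny(deny=[".*"])
kept_when_disabled = [s for s in subjects if disabled.allowed(s)]
```

In this sketch `kept` contains only `orders-value` and `kept_when_disabled` is empty. Since the patterns are treated as Python regular expressions here, the deny-all pattern is written as `.*`; a bare `*` is not a valid regex.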

@hsheth2
Collaborator

hsheth2 commented Dec 9, 2024

@guentherhackl-wgs that sounds like it would be super helpful - schema_pattern seems like a reasonable name for it as well

Also note that we're planning on adding a flag to disable the ingestion of all schemas separately - we'll probably make a change so that feature is not enabled by default

@guentherhackl-wgs
Author

@hsheth2 this flag would also do what I'm looking for. Is it already planned for a specific timeframe? I could also incorporate it. With my suggestion I was mainly thinking of replicating what already exists in topic_pattern, where "deny: '*'" would also disable the feature completely and could of course be set as the default as well. But nothing speaks against a separate flag to disable it.

@skrydal
Collaborator

skrydal commented Dec 10, 2024

@guentherhackl-wgs we are already working on the flag; it should be merged shortly

@skrydal
Collaborator

skrydal commented Dec 11, 2024

@guentherhackl-wgs here is the PR, would you be able to try it before we merge it?

@hsheth2
Collaborator

hsheth2 commented Dec 11, 2024

Closing for now since #12077 was merged

@hsheth2 hsheth2 closed this as completed Dec 11, 2024