Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(send_kcidb.py): Set processed_by_kcidb_bridge flag after processing node #919

Merged
merged 1 commit into from
Dec 2, 2024

Conversation

nuclearcat
Copy link
Member

Requires: kernelci/kernelci-core#2746

After node is being processed, set flag to indicate we sent data to kcidb.

@nuclearcat nuclearcat added the staging-skip Don't test automatically on staging.kernelci.org label Nov 29, 2024
# TBD: job nodes might have child nodes, mark them as processed as well
node['sent_kcidb'] = True
try:
self._api.node.update(node)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the safe place to update the node is in _send_revision method after data validation is done with kcidb.io.SCHEMA.is_valid(revision). As for some reason, if data is invalid, the service would still set the kcidb_sent flag to True.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might move there, but may data become valid later?
Because if i will not flag node as processed, it will keep trying to resend it (and fail, because data is invalid).
Maybe better we just report invalid data somewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Data becomes invalid if incorrect type of data is stored in node or there is some parsing issue while generating KCIDB-compatible data dictionary in the bridge service.
That's why the same data won't be valid again without any changes to service or node model.
Maybe we can have an error log for invalid data. Also, we can have a dict field in Node model for KCIDB related processing information with 2 flags i.e. kcidb_sent and processed_by_kcidb_bridge, and use both the fields for a submission retry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe i will change name to processed_by_kcidb_bridge (but it feels a bit long), it will be more correct such way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


# mark the node as processed, sent_kcidb field to True
# TBD: job nodes might have child nodes, mark them as processed as well
node['sent_kcidb'] = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the field name is kcidb_sent from PR kernelci/kernelci-core#2746.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, sorry for mistake, i am still thinking how it should look and drafting where i will put rest of code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, np. I'd be happy to help in case you find something unclear in the bridge service.

@nuclearcat nuclearcat force-pushed the kcidb-flag branch 5 times, most recently from 4369323 to 0d479c1 Compare November 29, 2024 06:21
Requires: kernelci/kernelci-core#2746
After node is being processed, set flag to indicate we sent data to kcidb.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
@nuclearcat nuclearcat changed the title feat(send_kcidb.py): Set sent_kcidb flag after processing node feat(send_kcidb.py): Set processed_by_kcidb_bridge flag after processing node Nov 29, 2024
node = self._find_unprocessed_node()

if not node:
node, is_hierarchy = self._api_helper.receive_event_node(context['sub_id'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will miss PubSub events if this is conditional as this will only process real-time event data if no unprocessed nodes are found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As i recall we tested before together, active pubsub subscription will queue events and wont drop them.
If it doesnt(if i missed something), then it need to be redesigned, because even in current code, it means it might miss event, if other event submission going too slow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. This needs to be verified.

Data in node might be invalid, and not really sent to kcidb
This is workaround, until we improve event handling in kernelci-pipeline/api
"""
node['processed_by_kcidb_bridge'] = True
Copy link
Collaborator

@JenySadadia JenySadadia Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were going to have 2 flags, i.e. one is for successful submission and another one is for processing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for submission not yet, as it will be next step of handling errors, which is unclear if happening at all

Comment on lines +508 to +511
child_nodes = self._api.node.find({'parent': node['id']})
if child_nodes:
for child in child_nodes:
self._node_processed_recursively(child)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of querying API, we should be getting node IDs from revision data as there might be some cases when not all the child nodes are submitted to KCIDB.
So, mark sent_kcidb to True only if the node was actually submitted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we mark just "processed by kcidb bridge", to workaround missing events issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So, sent_to_kcidb would be a different flag for successful submission. Got it.

@nuclearcat nuclearcat removed the staging-skip Don't test automatically on staging.kernelci.org label Dec 2, 2024
@nuclearcat nuclearcat added this pull request to the merge queue Dec 2, 2024
Merged via the queue into kernelci:main with commit fea83b9 Dec 2, 2024
3 checks passed
@nuclearcat nuclearcat deleted the kcidb-flag branch December 2, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants