
Tast run failures because of Tast internals #838

Open
musamaanjum opened this issue Oct 15, 2024 · 7 comments

Comments

@musamaanjum
Contributor

musamaanjum commented Oct 15, 2024

I've seen many errors of this kind. Strangely, Tast outputs errors and no test is executed. This needs to be investigated and fixed. It is the number one reason most Tast jobs fail and get marked as infrastructure failures.

2024-10-30T23:29:52.809425Z Fail /var/log/recover_duts/recover_duts.log does not exist; will not retry streaming
2024-10-30T23:29:52.874228Z Form factor not found: 
2024-10-30T23:29:52.968843Z Unable to parse ro_fwid versions: strconv.Atoi: parsing "2024_05_21_1511": invalid syntax
2024-10-30T23:29:53.262392Z Modem not found
2024-10-30T23:29:54.361968Z Failed to get lid microphone: exit status 146
2024-10-30T23:29:55.383786Z Failed to get base microphone: exit status 146
2024-10-30T23:29:55.383906Z Unknown form factor: FORM_FACTOR_UNKNOWN
...
2024-10-30T23:30:05.680286Z [23:30:05.680] Running rpc server: [env /usr/local/libexec/tast/bundles/local/cros -rpc]
...
2024-10-30T23:32:06.860910Z [23:32:06.001] Disconnecting from DUT
2024-10-30T23:32:06.870598Z [23:32:06.870] Got global error: connection to test bundle /usr/local/libexec/tast/bundles/local/cros broken: rpc error: code = Unknown desc = RunTests: failed in run tests recursively: failed to set up test environment: pre-run failed: failed waiting for system-services job after 120.000000 seconds: context deadline exceeded during a poll with timeout 2m0s; last error follows: status stop/waiting (status 1)
2024-10-30T23:32:06.870630Z [23:32:06.870] Disconnecting from DUT
2024-10-30T23:32:06.870639Z [23:32:06.870] Closing DUT SSH connection to 192.168.201.14:22
2024-10-30T23:32:06.872335Z Got global error: connection to test bundle /usr/libexec/tast/bundles/remote/cros broken: rpc error: code = Unknown desc = RunTests: failed in run tests recursively: run failed: no test ran in the last attempt (status 1)
2024-10-30T23:32:06.872363Z Failed to run tests: no test ran in the last attempt

https://lava.collabora.dev/scheduler/job/16060973
https://lava.collabora.dev/scheduler/job/16281873#L9431
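The "Unable to parse ro_fwid versions" line in the log above looks like a plain Go parsing failure: strconv.Atoi only accepts an optional sign followed by decimal digits, so an underscore-separated date string such as "2024_05_21_1511" is rejected. A minimal Go reproduction is below. The helper name `parseVersionComponent` is made up, since we don't know the actual Tast code path. This only explains where the message comes from; it says nothing about whether that warning contributes to the failure.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseVersionComponent is a HYPOTHETICAL stand-in for whatever code parses
// ro_fwid on the DUT; the log only tells us it ends up calling strconv.Atoi
// on the string "2024_05_21_1511".
func parseVersionComponent(s string) (int, error) {
	return strconv.Atoi(s)
}

func main() {
	_, err := parseVersionComponent("2024_05_21_1511")
	// Atoi accepts an optional sign plus decimal digits only, so the
	// underscores produce a *strconv.NumError wrapping ErrSyntax.
	fmt.Println(err)
	// prints: strconv.Atoi: parsing "2024_05_21_1511": invalid syntax
}
```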

It is a known long-standing issue.

b#334788335
cc: @laura-nao @a-wai @padovan @nuclearcat

@musamaanjum musamaanjum self-assigned this Oct 15, 2024
@musamaanjum musamaanjum converted this from a draft issue Oct 15, 2024
@musamaanjum
Contributor Author

I'll try to find time and work on fixing this failure.

@musamaanjum musamaanjum changed the title Tast run failed Tast run faileures because of Tast internals Oct 16, 2024
@musamaanjum musamaanjum changed the title Tast run faileures because of Tast internals Tast run failures because of Tast internals Oct 26, 2024
@musamaanjum
Contributor Author

@a-wai @padovan @nuclearcat @laura-nao I've confirmed that not a single Tast test (from kernel, hardware, platform, power and mm-misc) is able to run, because we get the "Form factor not found" error from the Tast tool.

It is a known problem that we used to hit only on some jobs; other runs still produced results. For the past several days, however, there have been no results at all.

At first I thought we simply weren't getting successful run results in KCIDB because LAVA was unable to submit them, so Grafana/Result-Summary couldn't show them. Now that the LAVA issue has been resolved, I still find no results today in Grafana/Result-Summary. So I looked at the logs of every tast-kernel job on LAVA for 31st October. Each of those jobs has either the "Form factor not found" error or some other, lower-priority issue.

Video/codec Tast tests, on the other hand, are executing without issues. I'm not sure why.

Grafana tast results 31st October

@a-wai
Contributor

a-wai commented Oct 31, 2024

FWIW, while testing ARM64 devices on R130, I noticed that 6.12-rc and -next kernels always seemed to fail, while 6.11 (or staging-stable, which is tracking 6.1) seemed to pass.

Maybe an upstream change in a Kconfig option breaks our setup in a different and creative way?

Disclaimer: my sample size was quite small, so it might just be a coincidence; further analysis is needed.

@musamaanjum
Contributor Author

For stable-rc, from 11 to 20 October we were getting the usual good results. From 21 to 31 October, we only got some results on Qualcomm targets; all other targets hit the "Form factor" error. I haven't looked specifically at kernel versions.

This could be a problem with the kernels or configs. But stable kernels seem affected as well, and they shouldn't have been if the cause were a new change in recent mainline kernels. We can't be sure until we find the root cause and fix it.

@musamaanjum
Contributor Author

I think we should report these errors and ask the original Tast developers about this situation.

Over the next few days, I can collect more results from R130 ChromiumOS. We may find the situation better, or even worse.

@musamaanjum
Contributor Author

The x86 Chromebooks are still failing with the same errors.

ARM64 devices have been updated to R130. They are failing because of the issues mentioned in #868.

@musamaanjum musamaanjum removed their assignment Nov 5, 2024
@musamaanjum
Contributor Author

cc: @laura-nao
