-
Problem report
-
Resolution: Fixed
-
Critical
-
3.0.27, 4.0.7, 4.2.1
-
ZBX server v4.2.0 on (CentOS Linux release 7.6.1810 (Core))
ZBX agent v4.2.1 (previously v3.4.11 or v3.4.14) on (CentOS Linux release 7.5.1804 (Core))
-
Sprint 52 (May 2019)
-
0.125
We have had this issue for quite a long time already (on versions 3.4.11 and 4.2.1) and are reporting it only now because it has become too annoying.
We have 3 servers with 80 CPU cores each, and our zabbix_server.log is filled with messages like these:
10639:20190506:111426.753 item "system_ovn3.domain:system.cpu.util[58,user]" became not supported: Value 307445734561825856.000000 is too small or too large.
10648:20190506:111439.777 item "system_ovn2.domain:system.cpu.util[51,user]" became not supported: Value 308010420332435328.000000 is too small or too large.
10639:20190506:111526.913 item "system_ovn3.domain:system.cpu.util[58,user]" became supported
10643:20190506:111539.947 item "system_ovn2.domain:system.cpu.util[51,user]" became supported
10643:20190506:111606.996 item "system_ovn1.domain:system.cpu.util[58,user]" became not supported: Value 307496984059169024.000000 is too small or too large.
10648:20190506:111706.938 item "system_ovn1.domain:system.cpu.util[58,user]" became supported
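To see which items flap most often, messages like these can be tallied per item. This is a minimal sketch, assuming a POSIX shell with grep, sed, sort and uniq. The function name count_flaps and the log path in the usage note are illustrative assumptions, not part of the report:

```shell
# Hypothetical helper: read a Zabbix server log on stdin and print, per item,
# how many times it became not supported with the "too small or too large" error.
count_flaps() {
    grep 'is too small or too large' |
        sed -n 's/.*item "\([^"]*\)" became not supported.*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Usage (the path is an assumption and depends on the installation): count_flaps < /var/log/zabbix/zabbix_server.log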
Each such host has ~600 items, of which 328 are different "system.cpu.util*" keys (80*4 for idle, iowait, system and user on each core, plus 8 items for the whole CPU).
The update interval is 1m for all the "system.cpu.util" items.
Each host also has 3 "system.cpu.load[percpu,avg5]"-style items (one each for avg1, avg5 and avg15).
We ran the 'system.cpu.util[58,user]' key with the zabbix_get tool in a loop with a 5-second delay, and within 5 minutes received the huge value once on one server:
# while true; do echo "`date` - `zabbix_get -s system_ovn1.domain -k 'system.cpu.util[58,user]'`"; sleep 5; done
Mon May 6 12:58:32 CEST 2019 - 0.083333
Mon May 6 12:58:37 CEST 2019 - 0.083333
Mon May 6 12:58:42 CEST 2019 - 0.083333
Mon May 6 12:58:47 CEST 2019 - 0.083333
Mon May 6 12:58:52 CEST 2019 - 0.083319
Mon May 6 12:58:57 CEST 2019 - 0.016661
Mon May 6 12:59:02 CEST 2019 - 0.016664
Mon May 6 12:59:07 CEST 2019 - 0.033333
Mon May 6 12:59:12 CEST 2019 - 0.016667
Mon May 6 12:59:17 CEST 2019 - 0.016664
Mon May 6 12:59:22 CEST 2019 - 0.000000
Mon May 6 12:59:27 CEST 2019 - 0.000000
Mon May 6 12:59:32 CEST 2019 - 0.000000
Mon May 6 12:59:37 CEST 2019 - 0.000000
Mon May 6 12:59:42 CEST 2019 - 0.000000
Mon May 6 12:59:47 CEST 2019 - 0.000000
Mon May 6 12:59:52 CEST 2019 - 0.000000
Mon May 6 12:59:57 CEST 2019 - 0.000000
Mon May 6 13:00:02 CEST 2019 - 0.000000
Mon May 6 13:00:07 CEST 2019 - 307343286799559360.000000
Mon May 6 13:00:12 CEST 2019 - 0.000000
Mon May 6 13:00:17 CEST 2019 - 0.000000
Mon May 6 13:00:22 CEST 2019 - 0.000000
Mon May 6 13:00:27 CEST 2019 - 0.000000
Mon May 6 13:00:32 CEST 2019 - 0.000000
Mon May 6 13:00:37 CEST 2019 - 0.066678
Mon May 6 13:00:42 CEST 2019 - 0.083319
Mon May 6 13:00:47 CEST 2019 - 0.083333
Mon May 6 13:00:52 CEST 2019 - 0.083333
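A polling loop of this kind can also be made to print only the suspicious samples, which helps when it runs for hours. This is a rough sketch, assuming system.cpu.util returns a percentage, so anything outside 0 to 100 counts as wrong. The helper names is_valid_cpu_util and watch_cpu_util are hypothetical:

```shell
# Hypothetical helper: succeed only if $1 is a plain decimal number in [0, 100].
is_valid_cpu_util() {
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= 0 && v + 0 <= 100) }'
}

# Hypothetical helper: poll host $1 for key $2 every 5 seconds and
# print a timestamped line only when the returned value is out of range.
watch_cpu_util() {
    while true; do
        v=$(zabbix_get -s "$1" -k "$2")
        is_valid_cpu_util "$v" || echo "$(date) - out of range: $v"
        sleep 5
    done
}
```

Usage, with the host and key from the report: watch_cpu_util system_ovn1.domain 'system.cpu.util[58,user]'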
Yes, we have many items, some of which you could say are not very useful or required, but the Zabbix agent should not return a wrong value.
We tried running the "system.cpu.util" key in a similar loop for a longer period on all 3 servers, and did not catch the issue.
Maybe we were just not lucky enough, who knows ...