Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PremDay #2 - Prem'Fail - Help ! Server is frozen !

Avatar for PremDay PremDay
April 08, 2025

PremDay #2 - Prem'Fail - Help ! Server is frozen !

As part of the Prem'Fail track, Adrien Mahieux presents a story about frozen servers linked to some firmware and/or broken CPU cores.

Avatar for PremDay

PremDay

April 08, 2025
Tweet

More Decks by PremDay

Other Decks in Technology

Transcript

  1. Let's automate the updates Around 1500 physical equipment Multiple hardware:

    desktop, embedded, servers, pcie switch, fpga Multiple profiles: low-latency server, high-performance server, VDI “Team” of 2 Late on fixes, big-bang upgrades Let's apply automatic fix spread across the month
  2. Searching, groking, ftracing… Unable to find anything… restarted the automatic

    upgrades WAIT A BIT AND… Another server issued the same behavior. 😥 Checking puppet logs: BMC Firmware updated. NOTE: should the update have been done oob, or not logged in our update system, we would not have found it as the source and just dropped the vendor Please use LVFS !
  3. 2 weeks to find the issue, 1 month to wait

    for the update BMC issued a scsi bus rescan every 1300s Should be fixed in new release… In fact no… and nothing in changelog Had to wait another month Business was… not happy. (Ask your local vendor for more detail) Continuing automated upgrades !
  4. Ok let’s see again… It’s really random this time !

    - 03/09: Case opened + sosreport - 03/10: log check: ntp, lots of “system clock wrong” - 03/11: same issue with live-cd, different kernel - 03/14: works OK with HPET but not TSC - 03/23: checking /proc/timer_list - 03/24: C test with gettimeofday() / rdtsc() : - 03/29: replaced motherboard, replaced CPU0 (usual time source) - 04/04: tried kernel testing - 04/12: tested CPUs swapping
  5. How to wait for 1 second without timer ? loop

    until rdtsc >= 3100000000 (3.1 GHz + constant_tsc) Clock: 1002385767 ( 1649061758.988249409 - 1649061757.985863642 / rdtsc: 3100000230 ( 937247212 - 2132214278) Clock: 1002385985 ( 1649061759.990639368 - 1649061758.988253383 / rdtsc: 3100001020 ( 4037261174 - 937260154) Clock: 1002385906 ( 1649061760.993029153 - 1649061759.990643247 / rdtsc: 3100000532 ( 2842306946 - 4037273710) Clock: 3507565526 ( 16490617 64.500598743 - 1649061760.993033217 / rdtsc: 3100000262 ( 1647353206 - 2842320240) Clock: 0 ( 16490617 64.500598743 - 1649061764.500598743 / rdtsc: 3100000280 ( 452403582 - 1647370598) Clock: 0 ( 16490617 64.500598743 - 1649061764.500598743 / rdtsc: 3100000080 ( 3552413880 - 452413800) Clock: 501988793 ( 1649061765. 2587536 - 1649061764.500598743 / rdtsc: 3100000274 ( 2357456568 - 3552423590) Clock: 1002386017 ( 1649061766. 4977546 - 1649061765. 2591529 / rdtsc: 3100000908 ( 1162502956 - 2357469344) Clock: 3318789279 ( 16490617 69.323770894 - 1649061766. 4981615 / rdtsc: 3100000836 ( 4262516976 - 1162516140) Clock: 0 ( 16490617 69.323770894 - 1649061769.323770894 / rdtsc: 3100000058 ( 3067564460 - 4262531698) Clock: 0 ( 16490617 69.323770894 - 1649061769.323770894 / rdtsc: 3100000234 ( 1872607374 - 3067574436) Clock: 690764252 ( 1649061770. 14535146 - 1649061769.323770894 / rdtsc: 3100000174 ( 677649870 - 1872616992) Clock: 1002386192 ( 1649061771. 16924930 - 1649061770. 14538738 / rdtsc: 3100001050 ( 3777662616 - 677661566) Clock: 1002385678 ( 1649061772. 19320331 - 1649061771. 16934653 / rdtsc: 3100000128 ( 2582726222 - 3777693390)
  6. taskset on faulty core Clock: 2690822050 ( 200219.962214449 - 200217.271392399)

    / rdtsc: 4026737214 ( 3321541418 - 3589771500) loops: 194 Clock: 2690882578 ( 200219.962378700 - 200217.271496122) / rdtsc: 4026918882 ( 3322049152 - 3590097566) loops: 377 Clock: 1604226740 ( 200217.271641676 - 200219.962382232) / rdtsc: 268487372 ( 3590547800 - 3322060428) loops: 50 Clock: 2690898342 ( 200219.962542900 - 200217.271644558) / rdtsc: 4026967796 ( 3322557264 - 3590556764) loops: 425 Clock: 1604227376 ( 200217.271805861 - 200219.962545781) / rdtsc: 268489322 ( 3591055596 - 3322566274) loops: 52 Clock: 2690898682 ( 200219.962707195 - 200217.271808513) / rdtsc: 4026968848 ( 3323065318 - 3591063766) loops: 426 Clock: 1604227739 ( 200217.271970253 - 200219.962709810) / rdtsc: 268490460 ( 3591563966 - 3323073506) loops: 53 Clock: 2690898372 ( 200219.962871169 - 200217.271972797) / rdtsc: 4026968612 ( 3323573204 - 3591571888) loops: 390 Clock: 1604227651 ( 200217.272134648 - 200219.962874293) / rdtsc: 268488396 ( 3592072102 - 3323583706) loops: 51 Clock: 2690897527 ( 200219.963034868 - 200217.272137341) / rdtsc: 4026966832 ( 3324080302 - 3592080766) loops: 424 Clock: 1604228240 ( 200217.272298641 - 200219.963037697) / rdtsc: 268490460 ( 3592579522 - 3324089062) loops: 53 Clock: 2690897515 ( 200219.963199060 - 200217.272301545) / rdtsc: 4026966808 ( 3324588040 - 3592588528) loops: 424 Clock: 1604228280 ( 200217.272462741 - 200219.963201757) / rdtsc: 268490516 ( 3593087014 - 3324596498) loops: 53 Clock: 2690897507 ( 200219.963363306 - 200217.272465799) / rdtsc: 4026966816 ( 3325096028 - 3593096508) loops: 424 Clock: 1604228548 ( 200217.272627129 - 200219.963365877) / rdtsc: 268491402 ( 3593595462 - 3325104060) loops: 54 Clock: 2690897824 ( 200219.963527719 - 200217.272629895) / rdtsc: 4026967758 ( 3325604448 - 3593603986) loops: 425 [ ... ] Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268437096 ( 1262992808 - 994555712) loops: 1 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 4026533522 ( 994563180 - 1262996954) loops: 1 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268881816 ( 1263459006 - 994577190) loops: 435 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 4026548854 ( 995049170 - 1263467612) loops: 16 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268437140 ( 1263494316 - 995057176) loops: 1 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268440200 ( 1263503060 - 995062860) loops: 4 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268437228 ( 1263512538 - 995075310) loops: 1 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 4026534564 ( 995083840 - 1263516572) loops: 2 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268875506 ( 1263966986 - 995091480) loops: 429 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 4026556054 ( 995563652 - 1263974894) loops: 23 Clock: 0 ( 200219.975307367 - 200219.975307367) / rdtsc: 268437140 ( 1264008870 - 995571730) loops: 1
  7. But why the randomness ? Timekeeping was not done only

    by CPU0 anymore… due to realtime kernel we used (previous win) The main path is from: apic_timer()/hrtimer_interrupt() -> __hrtimer_run_queues() -> tick_sched_timer(): 1367 -> tick_sched_do_timer(): 141 -> tick_do_update_jiffies64(): 97 -> update_wall_time(): 2140 -> timekeeping_advance(TK_ADV_TICK): 2124 -> memcpy(). tick_sched_timer(): Calls ktime_get() to get current time. If cpu == tick_do_timer_cpu call tick_do_update_jiffies64() with new time. NOTE: tick_do_timer_cpu is not always the same. tick_do_update_jiffies64(): Will update jiffies and then call update_wall_time(void) which calls timekeeping_advance(). timekeeping_advance(): Updates the shadow_timekeeper. Write_seq_begin’s tk_core. Memcpy’s shadow_timekeeper to tk_core. Write_seq_end’s tk_core. NOTE: any read of tk_core values while write_seq_begin will loop till write_seq_end