BULLETIN NUMBER: RH-SWB-002
DATE ISSUED: October 31, 2007
DATE CLOSED: N/A (this problem will never be solved)
AFFECTED SYSTEMS: iHawk systems using Nvidia chipsets on the motherboard
NOTE: this does not apply to Nvidia graphics controllers
RELEASE LEVEL: RedHawk Linux 4.1 and all 4.1 updates
EXPLANATION: IRQ0 shielding is broken on some Nvidia-based iHawks.
Timer overrides on some Nvidia chipsets do not work. The breakage can
depend on the BIOS version.
The result is that IRQ0 is programmed as an XT-PIC IRQ.
XT-PIC is legacy 8259 (PIC) mode and does not support multiprocessor IRQ
handling.
It has been observed that this prevents shielding IRQ0 on CPU0 (presumably
the boot CPU). There may be other associated problems, such as constant
generation of parallel port interrupts that also cannot be shielded.
The condition is fairly easy to spot in /proc/interrupts:
---------------------------------------------------------
           CPU0       CPU1       CPU2       CPU3
  0:      92639        937        190         69          XT-PIC  timer
This will be accompanied by the following dmesg output (pre-4.1.11):
--------------------------------------------------------------------
Nvidia board detected. Ignoring ACPI timer override.
<snip>
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed
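The /proc/interrupts check described above can be scripted. The following is
a minimal sketch, not part of RedHawk; the function names irq_controller and
irq_is_legacy_pic are hypothetical. It assumes the layout shown in this
bulletin: a header line with one column per CPU, then for each IRQ one count
per CPU followed by the controller type. Newer kernels may lay this column
out differently.

```python
# Sample text copied from this bulletin (affected and correct systems).
SAMPLE_AFFECTED = """\
           CPU0       CPU1       CPU2       CPU3
  0:      92639        937        190         69          XT-PIC  timer
"""

SAMPLE_CORRECT = """\
           CPU0       CPU1       CPU2       CPU3
  0:        137          3         24      37812    IO-APIC-edge  timer
"""


def irq_controller(interrupts_text, irq="0"):
    """Return the controller-type field for the given IRQ, or None.

    Assumes the layout shown in this bulletin: the first line names the
    CPUs, and each IRQ line holds the IRQ label, one count per CPU, then
    the controller type.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # number of CPU columns in the header
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == irq + ":":
            # Skip the IRQ label and the per-CPU counts.
            if len(fields) > 1 + ncpus:
                return fields[1 + ncpus]
            return None
    return None


def irq_is_legacy_pic(interrupts_text, irq="0"):
    """True if the IRQ is programmed as an XT-PIC (legacy 8259) interrupt."""
    return irq_controller(interrupts_text, irq) == "XT-PIC"
```

On a live system the text would come from reading /proc/interrupts, for
example irq_is_legacy_pic(open("/proc/interrupts").read()).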
The 2.6.15.4 Linux kernel assumes _ALL_ Nvidia chipset timer overrides
are bogus and therefore always forces the "acpi_skip_timer_override"
policy whenever the boot code discovers an Nvidia chipset.
This may or may not be the correct action; the correct choice has been
observed to change depending on the BIOS revision.
The acpi_skip_timer_override policy has been relaxed in later kernels,
depending on the presence of an HPET timer and/or the _exact_ chipset
id.
More recent motherboards with Nvidia chipsets likely support ACPI
timer overrides.
RESOLUTION: There are a few workarounds:
1. Update to RedHawk 4.2 or later
RedHawk 4.2 and later kernels are "tickless" and do not use global timer
interrupts (IRQ0), so this problem should not affect RedHawk 4.2 or later,
even though similar boot messages may appear and IRQ0 may still be programmed
as an XT-PIC interrupt.
2. Update to RedHawk 4.1.11 or later
The RedHawk 4.1.11 default behavior is to allow ACPI timer overrides. This
policy should be the most appropriate for more recent motherboards.
The RedHawk 4.1.11 default behavior can be overridden by booting with the kernel
boot parameter:
acpi_skip_timer_override
The correct policy to allow or disallow ACPI timer overrides must be
determined for each individual system.
*** NOTE: The default 4.1.11 Nvidia chipset policy may break some systems!
The breakage can be reversed by using the acpi_skip_timer_override boot
parameter.
The 4.1.11 boot log now contains the following messages:
Default:
--------
Nvidia board detected. Allowing ACPI timer override.
If you have timer trouble try acpi_skip_timer_override.
Using the acpi_skip_timer_override boot parameter:
--------------------------------------------------
Nvidia board detected. Ignoring ACPI timer override.
WARNING: acpi_skip_timer_override may break shielding of IRQ0.
The correct choice can be verified by looking at /proc/interrupts:
------------------------------------------------------------------
           CPU0       CPU1       CPU2       CPU3
  0:        137          3         24      37812    IO-APIC-edge  timer
Notice that the timer interrupt above is correctly programmed as "IO-APIC-edge".
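When comparing the two policies on a given system, it can help to confirm
which one the kernel actually applied on the current boot. The following is a
minimal sketch that matches the boot messages quoted in this bulletin and looks
for the boot parameter on the kernel command line. The function names and the
sample command line are hypothetical. Note that pre-4.1.11 kernels also print
the "Ignoring" message, so that message alone does not show that the boot
parameter was given.

```python
# Message text is taken from this bulletin.
ALLOW_MSG = "Nvidia board detected. Allowing ACPI timer override."
IGNORE_MSG = "Nvidia board detected. Ignoring ACPI timer override."
BOOT_PARAM = "acpi_skip_timer_override"


def timer_override_policy(dmesg_text):
    """Return "allow", "ignore", or None if neither message is found.

    dmesg_text is the saved output of the dmesg command.
    """
    policy = None
    for line in dmesg_text.splitlines():
        if ALLOW_MSG in line:
            policy = "allow"
        elif IGNORE_MSG in line:
            policy = "ignore"
    return policy


def boot_param_present(cmdline_text):
    """True if acpi_skip_timer_override appears as a separate word on the
    kernel command line (the contents of /proc/cmdline)."""
    return BOOT_PARAM in cmdline_text.split()


# Sample boot messages from this bulletin.
DEFAULT_BOOT = (
    "Nvidia board detected. Allowing ACPI timer override.\n"
    "If you have timer trouble try acpi_skip_timer_override.\n"
)
PARAM_BOOT = (
    "Nvidia board detected. Ignoring ACPI timer override.\n"
    "WARNING: acpi_skip_timer_override may break shielding of IRQ0.\n"
)
# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "ro root=/dev/sda1 acpi_skip_timer_override"
```

On a live system, cmdline_text would be read from /proc/cmdline and dmesg_text
captured from the dmesg command shortly after boot.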
3. Upgrading and/or downgrading the BIOS rev may or may not change the
result.
CAUTION: ALWAYS HAVE A BACKUP COPY OF YOUR CURRENT BIOS BEFORE ATTEMPTING
TO CHANGE THE VERSION.