
Possible bug in RISC-V64 leading to no interrupts after MIE and MPIE bits have been cleared before an mret instruction #517

@cpdpls


Describe the bug
In the RISC-V64 port, when returning from the trap handler, the thread being resumed appears to have its interrupts disabled: both the MPIE and MIE bits are cleared in the mstatus register. The resumed thread then receives no interrupts, which can lead to an infinite loop when it polls a variable that only changes inside an IRQ handler.

We are running ThreadX on the PolarFire SoC from Microchip. Using the latest RISC-V64 port, after an initial hardware timer interrupt, calls to both _tx_thread_context_save and _tx_thread_context_restore, and a final mret instruction, we observe that the mstatus register has both the MPIE and MIE bits set to 0, which disables all interrupts. Is this by design?

It happened in the following situation, where only one thread is running at startup:

Init thread

  1. HW timer is set up
  2. An initial I2C request is being prepared
  3. HW Timer interrupt (call to tx context saving mechanisms)
  4. Init thread resumes
  5. I2C request is issued
  6. HW Timer interrupt
  7. Init thread resumes
  8. Dead loop as we are waiting for an interrupt from the I2C driver.

Here is the relevant code, where both MPIE and MIE are cleared:

/* Compose mstatus via read/modify/write to avoid clobbering unrelated bits.
Set MPIE and restore MPP to Machine, preserve other fields. */
csrr t1, mstatus
/* Clear MPP/MPIE/MIE bits in t1 then set desired values. */
li t2, 0x1888 // MPP(0x1800) | MPIE(0x80) | MIE(0x08)
li t3, 0x1800 // Set MPP to Machine mode (bits 12:11)
/* Construct new mstatus in t1: clear mask bits, set MPP/MPIE and optionally FP bit,
preserve everything except the bits we will modify. */
li t4, ~0x1888 // Clear mask for MPP/MPIE/MIE
and t1, t1, t4
or t1, t1, t3
#if defined(__riscv_float_abi_single) || defined(__riscv_float_abi_double)
li t0, 0x2000 // Set FS bits (bits 14:13 to 01) for FP state
or t1, t1, t0
#endif
csrw mstatus, t1 // Update mstatus safely

The bitwise NOT in li t4, ~0x1888 produces a mask in which bit 3 (MIE), bit 7 (MPIE), and bits 12:11 (MPP) are 0. The following and instruction applies that mask to the original mstatus value held in t1, clearing all three fields. The or instruction then sets only bits 12:11 (MPP), so MPIE stays 0 even though the comment says it should be set. Finally, an mret instruction is issued. Since mret copies MPIE into MIE and MPIE is 0, interrupts end up disabled.
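The bit arithmetic above can be checked off-target with a small C model. This is only a sketch: the function names are hypothetical, the FS update is omitted, and model_mret is a simplified reading of the privileged specification (MIE takes the value of MPIE, MPIE becomes 1, MPP is reset to 0 on the assumption that U-mode is implemented). It is not ThreadX code.

```c
#include <assert.h>
#include <stdint.h>

/* mstatus fields discussed in this issue. */
#define MSTATUS_MIE   (UINT64_C(1) << 3)   /* bit 3      = 0x0008 */
#define MSTATUS_MPIE  (UINT64_C(1) << 7)   /* bit 7      = 0x0080 */
#define MSTATUS_MPP   (UINT64_C(3) << 11)  /* bits 12:11 = 0x1800 */

/* Mirrors the quoted sequence: and with ~0x1888, then or with 0x1800.
   The optional FS update is left out (assumption: it does not affect
   the interrupt-enable bits). */
uint64_t compose_mstatus_as_quoted(uint64_t mstatus)
{
    uint64_t t4 = ~UINT64_C(0x1888);  /* clear mask for MPP/MPIE/MIE */
    uint64_t t3 = UINT64_C(0x1800);   /* MPP = Machine */
    return (mstatus & t4) | t3;
}

/* Simplified model of what mret does to mstatus (assumption, see above). */
uint64_t model_mret(uint64_t mstatus)
{
    uint64_t mie = (mstatus & MSTATUS_MPIE) ? MSTATUS_MIE : 0;
    mstatus &= ~(MSTATUS_MIE | MSTATUS_MPP);
    mstatus |= mie | MSTATUS_MPIE;
    return mstatus;
}

/* Hypothetical self-check. The start value models the state inside the
   trap handler for a thread that had interrupts enabled:
   MPIE = 1, MIE = 0, MPP = Machine. */
void check_quoted_sequence(void)
{
    uint64_t in_trap = MSTATUS_MPIE | MSTATUS_MPP;
    uint64_t before_mret = compose_mstatus_as_quoted(in_trap);

    assert((before_mret & MSTATUS_MPIE) == 0);           /* MPIE was cleared */
    assert((before_mret & MSTATUS_MIE) == 0);            /* MIE was cleared */
    assert((before_mret & MSTATUS_MPP) == MSTATUS_MPP);  /* MPP = Machine */

    /* mret copies MPIE (0) into MIE, so interrupts stay off. */
    assert((model_mret(before_mret) & MSTATUS_MIE) == 0);
}
```

Calling check_quoted_sequence() from a host-side test passes under these assumptions, which matches the behaviour observed on the target: the resumed thread runs with MIE = 0.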

Additionally, register t2 is loaded but never used. My guess is that it is now dead code?

To Reproduce
Unfortunately, I can't share screenshots or exact steps to reproduce the issue right now, as I'm writing this from another laptop. I will share them once I'm back at my workplace if this issue catches your attention.

Expected behavior
After the mret instruction, the expectation is that the MIE bit is not cleared when returning to a thread that had its interrupts enabled before it was context switched. Because the MPIE bit is cleared before mret is executed, interrupts cannot be enabled again unless the thread explicitly re-enables them.

Impact
Infinite loop when polling a driver status variable that should change when the IRQ for that driver fires.

Logs and console output
Again, I will provide those as soon as possible

Solution

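Maybe set MPIE explicitly when composing the new value, as the existing comment ("Set MPIE and restore MPP to Machine") suggests. Note that OR-ing ~0x1888 into t1 directly would set every bit outside those fields, so the clear mask still has to be applied with and first.

csrr t1, mstatus
li t4, ~0x1888 // Clear mask for MPP/MPIE/MIE
and t1, t1, t4
li t3, 0x1880 // MPP = Machine (0x1800) | MPIE (0x80)
or t1, t1, t3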
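To sanity-check a candidate fix before trying it on hardware, the composition can be modelled in C. The sketch below assumes the intended behaviour is to clear MPP/MPIE/MIE and then set MPP to Machine together with MPIE (0x1880). That value, the function names, and the simplified mret model are assumptions made for illustration, not a confirmed ThreadX change.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MIE     (UINT64_C(1) << 3)    /* bit 3 */
#define MSTATUS_MPIE    (UINT64_C(1) << 7)    /* bit 7 */
#define MSTATUS_MPP     (UINT64_C(3) << 11)   /* bits 12:11 */
#define MSTATUS_FIELDS  UINT64_C(0x1888)      /* MPP | MPIE | MIE */

/* Candidate composition: clear the three fields, then set
   MPP = Machine and MPIE = 1 so that mret re-enables interrupts. */
uint64_t compose_mstatus_with_mpie(uint64_t mstatus)
{
    return (mstatus & ~MSTATUS_FIELDS) | UINT64_C(0x1880);
}

/* Simplified model of mret (assumption): MIE takes the value of MPIE,
   MPIE becomes 1, MPP is reset to 0 (U-mode assumed to be implemented). */
uint64_t model_mret(uint64_t mstatus)
{
    uint64_t mie = (mstatus & MSTATUS_MPIE) ? MSTATUS_MIE : 0;
    mstatus &= ~(MSTATUS_MIE | MSTATUS_MPP);
    mstatus |= mie | MSTATUS_MPIE;
    return mstatus;
}

/* Hypothetical self-check using an arbitrary unrelated bit (0x2000,
   part of the FS field) to confirm that other fields are preserved. */
void check_candidate_fix(void)
{
    uint64_t in_trap = MSTATUS_MPP | UINT64_C(0x2000);
    uint64_t before_mret = compose_mstatus_with_mpie(in_trap);

    assert((before_mret & MSTATUS_MPIE) != 0);           /* MPIE set */
    assert((before_mret & MSTATUS_MIE) == 0);            /* still masked in the handler */
    assert((before_mret & UINT64_C(0x2000)) != 0);       /* unrelated bit kept */
    assert((model_mret(before_mret) & MSTATUS_MIE) != 0); /* enabled after mret */
}
```

With MPIE set before mret, the model leaves MIE = 1 in the resumed thread. Whether MPIE should always be forced to 1 or restored from the saved thread context is a design question for the port maintainers.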
