My intro to the RISCV LLVM Backend: modifying an existing LLVM pass to merge GPRPair moves

Welcome

February 27, 2026·

For a while now I’ve been interested in the RISCV architecture, and recently I had the opportunity to satiate this interest by working on a pass improving code gen on the RISCV backend for LLVM.

Relevant PRs:

Some context

There exists a pass known as the RISCVMoveMerger. Prior to my work this pass was solely intended to merge particular instructions on RISCV targets with the zcmp or xqccmp extension. An extension can be thought of as an addition to the existing RISCV ISA.

In assembly, we have instructions that move values between registers, referred to as move or copy instructions. In a 64-bit CPU, the registers would be 64 bits wide, and correspondingly, the registers would be 32-bits in a 32-bit CPU. A good chunk of variables in C/C++ are interpreted as 32-bits unless specified (int, float, etc.). So what happens when a code specifies a 64-bit integer on a 32-bit CPU? Well, something akin to the following. mv indicates this is a move instruction, the first operand is the destination, and the second operand is the source.

// Copy the lower 32 bits
mv a0, t0    
// Copy the upper 32 bits
mv a1, t1

In the context of the RISCVMoveMerger, the zcmp extension has two instructions of concern. They are CM.MVSA01 and CM.MVA01S. They use the following syntax.

mv a0, r1
mv a1, r2
// Statement below is equivalent.
// We are moving values in register r1 and r2 into a0 and a1 respectively.
cm.mvsa01 r1, r2

mv r1, a0 
mv r2, a1 
// Statement below is equivalent.
// We are moving values in register a0 and a1 into r1 and r2 respectively.
cm.mva01s r1, r2

a0 is designated as the first argument register. Whenever a function is called, it’s input parameters will be placed in a0. For example, when calling foo(int x)i, x would be stored in a0. These compressed instructions take advantage of this by making the source and destination implicit. The cm* don’t implicitly define the source or destination registers, and instead implicitly use a0/a1.

To get back to RISCVMoveMerger, the goal of this pass was to find instructions like these two 32-bit moves:

mv a0, r1         mv r1, a0
mv a1, r2         mb r2, a1

and merge them to make use of zcmp’s cm.mv* instructions. It can be thought of as providing a “macro” to handle 64-bit logic in a single step.

My PR(s)

Hopefully that laid the ground work for what the RISCVMoveMerger was doing prior to my work. I added onto this pass by extending to other types of possible 32-bit to 64-bit pairs for other targets. This was a task raised by @lenary, who allowed me to pursue an implementation.

RISCV supports quite a lot of different extension, and two extensions that concerns this pass is the experimental-p and zdinx target. They respectively support the instructions addd, and fsgnj.d which both can support GPRPairs as operands. A GPRPair is a pair of registers which can be treated as one unit. We can take a look at RISCVRegisterInfo.td - a tablegen file that describes the register info the RISCV architecture - to find out the exact details of GPRPairs.

def GPRPair : RISCVRegisterClass<[XLenPairVT, XLenPairFVT], 64, (add
    X10_X11, X12_X13, X14_X15, X16_X17,
    X6_X7,
    X28_X29, X30_X31,
    X8_X9,
    X18_X19, X20_X21, X22_X23, X24_X25, X26_X27,
    X0_Pair, X2_X3, X4_X5
)>;

I hope this is starting to come together. RISCVMoveMerger can be additionally used to find two move instructions such that they can be merged into one instruction.

mv a0, s0 
mv a1, s1

// or 
mv a1, s1
mv a0, s0 

// This is the equivalent for zdinx.
fmv.d a0, s0

// This is the equivalent for experimental-p. Note that a third operand is 
// used but this just adds zero to s0, so it's still a move.
addd a0, s0, zero

Honestly at this point, I basically recounted the major points of my work. Reading further will go into some more minute details which I found interesting.

Current structure & New Functions

This was structure of the pass, and it can be noticed that the current function signatures are tied with finding and merging cm.mv* instructions. For example, MoveFromSToA is a helpful boolean parameter useful for cm.mv* instructions, but useless for GPRPairs.

  bool isCandidateToMergeMVA01S(const DestSourcePair &RegPair);
  bool isCandidateToMergeMVSA01(const DestSourcePair &RegPair);
  // Merge the two instructions indicated into a single pair instruction.
  MachineBasicBlock::iterator
  mergePairedInsns(MachineBasicBlock::iterator I,
                   MachineBasicBlock::iterator Paired, bool MoveFromSToA);
  // Look for C.MV instruction that can be combined with
  // the given instruction into CM.MVA01S or CM.MVSA01. Return the matching
  // instruction if one exists.
  MachineBasicBlock::iterator
  findMatchingInst(MachineBasicBlock::iterator &MBBI, bool MoveFromSToA,
                   const DestSourcePair &RegPair);
  bool mergeMoveSARegPair(MachineBasicBlock &MBB);
  bool runOnMachineFunction(MachineFunction &Fn) override;

The first two isCandidate* bool helpers are used to indicate whether an instruction can be merged into a possible cm.mv instruction. mergePairedInsns is used when two potential instructions are found and the merged. findMatchingInst is used when the first instruction is already found, and it looks for the second pair. mergeMoveSARegPair is used to iterate through instructions in a basic block to initially find the instructions, and call the find and merge functions.

mergeMoveSARegPair can definitely be reused, since it’s goal is to iterate through instructions. However, findMatchingInst and mergeMoveSARegPair are inherently tied to the cm.mv instructions. For GPRPair, there needs to be very similar functions but with a different boolean parameter, EvenRegPair, to indicate if a GPRPair is even or odd, since the logic to make the pair would be slightly different if the pair is even or odd. The snippet below from mergeRegPairInsns may illuminate what I’m trying to convey.

  unsigned GPRPairIdx =
      RegPairIsEven ? RISCV::sub_gpr_even : RISCV::sub_gpr_odd;
  SrcReg1 = TRI->getMatchingSuperReg(FirstPair.Source->getReg(), GPRPairIdx,
                                     &RISCV::GPRPairRegClass);
  SrcReg2 = ST->hasStdExtZdinx() ? SrcReg1 : Register(RISCV::X0_Pair);
  DestReg = TRI->getMatchingSuperReg(FirstPair.Destination->getReg(),
                                     GPRPairIdx, &RISCV::GPRPairRegClass);

Some DFA

Dataflow analysis allows compilers to determine how data flows through the program. It allows compilers to make effective and safe decisions about optimizations. It’s essential to understand DFA to work with middle-end and back-end optimizations, and we get to see DFA in action at least within a basic block in this pass. Keep in mind that I did not write the code you’ll see in this section.

Lets think up of a scenario where we found the first potential instruction pair, and the findMatchingInst/Pair is called to iterate through the rest of the basic block to find the second.

mv a0, s0 
ld a2, a1
mv a1, s1

We can see from the snippet above that there could be a potential match here with a0, a1 and s0, s1. However, notice that before the second mv instruction is a ld instruction. This instruction loads the value of a1 into a2. If we were to merge the two move instructions, a1 would get overwritten and the data loaded into a2 would be invalid!

// Say we did merge it so now it's mv a0_a1, s0_s1
mv a0, s0 
// a2 would get the value of s1 instead of the previous (correct) value of a1!
ld a2, a1

The if statement with early return has a pretty good comment explaining it’s purpose and it solves this problem. If there are any instructions between the first instruction, and the second instruction which modifies or uses the destination register of the second instruction, return early.

  for (MachineBasicBlock::iterator I = next_nodbg(MBBI, E); I != E;
       I = next_nodbg(I, E)) {

    MachineInstr &MI = *I;

    if (auto SecondPair = TII->isCopyInstrImpl(MI)) {
      Register SourceReg = SecondPair->Source->getReg();
      Register DestReg = SecondPair->Destination->getReg();

      bool IsCandidate = MoveFromSToA ? isCandidateToMergeMVA01S(*SecondPair)
                                      : isCandidateToMergeMVSA01(*SecondPair);
      if (IsCandidate) {

        ...

        // If paired destination register was modified or used, the source reg
        // was modified, there is no possibility of finding matching
        // instruction so exit early.
        if (!ModifiedRegUnits.available(DestReg) ||
            !UsedRegUnits.available(DestReg) ||
            !ModifiedRegUnits.available(SourceReg))
          return E;

        return I;
      }
    }
    // Update modified / used register units.
    LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits, TRI);
  }

Another DFA we can observe is ensuring the liveness or kill status of registers when moving the second register pair to the first pair’s location. The snippet below demonstrates the problem if the kill status is not considered. Note that the snippet below is an mir file, a lower-level intermediate representation right before the final assembly.

$x10 = COPY $x12               ; (I) The "Target" location for the merge
$x5  = ADD $x13, $x3           ; (It) Middle instruction READS $x13!
$x11 = COPY killed $x13        ; (Paired) The move we want to bring up to Line 1

The third instruction, a move/copy instruction, can be merged with the first instruction. But note that the there is a keyword, killed $x13. It indicates that this register, $x13 is not used after that instruction. If the compiler just moves this to the location of the first instruction and merges it, this would cause an error because it’s still killed but the second instruction reads the value from $x13.

// PairSource is a copy of the second pair. 
// This for loop iterates from the first istruction to the second instruction.
// It checks if any other register reads the register of the second instruction.
// If it does, it sets the kill status to false.
for (auto It = std::next(I); It != Paired && PairedSource.isKill(); ++It)
  if (It->readsRegister(PairedSource.getReg(), TRI))
    PairedSource.setIsKill(false);

Conclusion

Overall, this was a pretty fun PR. It was pretty exciting to work on a PR that was not labelled a ‘Good First Issue’. Some things I want to keep pursuing are issues that in some manner require DFA.

My First LLVM PR