Spring 2025: CS 6950 Compiler Independent Study

Last semester, I started contributing to LLVM by adding a missing InstCombine optimization and fixing several Undefined Behavior tests. As I became more familiar with LLVM, I wanted to contribute more impactful patches. Therefore, I asked John to advise me on doing an independent study on compilers this semester.

Planning

Here are the initial project options:

I chose to work on Constraint Elimination because I was more familiar with the development process for the LLVM middle-end. The initial goal was to merge four PRs by the end of the semester.

Timeline

Constraint Elimination

I wrote a blog post that introduces Constraint Elimination.

Testing and Fuzzing

With so many patches committed to LLVM every day, it’s crucial to verify that my implementation does the right thing and doesn’t introduce new bugs. During this project, I learned several methods to help verify the correctness of my implementation.

Alive2 for mathematical proof

Alive2 provides a fast and robust way to verify the correctness of LLVM transformations. Ever since it became available on Compiler Explorer, it has become an essential tool for LLVM development. Having an Alive2 proof link is a requirement for submitting InstCombine PRs.

Before implementing a transformation, it's helpful to use Alive2 to verify its correctness. This gives developers confidence in their implementation and provides reviewers with a quick way to understand the proposed change.
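As a toy analogue of what such a proof establishes (this is plain Python, not Alive2), we can exhaustively check a fact of this kind over every 4-bit value: if x & y is negative as an i4, then both x and y must be negative.

```python
# Toy analogue of an Alive2-style proof: exhaustively verify, over all
# i4 values, that (x & y) <s 0 implies x <s 0 and y <s 0.

def to_signed_i4(bits: int) -> int:
    """Interpret the low 4 bits as a two's-complement i4 value."""
    bits &= 0xF
    return bits - 16 if bits >= 8 else bits

def check_and_decomposition() -> bool:
    for xb in range(16):
        for yb in range(16):
            if to_signed_i4(xb & yb) < 0:
                if not (to_signed_i4(xb) < 0 and to_signed_i4(yb) < 0):
                    return False  # a counterexample would refute the fact
    return True

print(check_and_decomposition())  # True: the fact holds for all i4 pairs
```

This works only because i4 has 256 input pairs; Alive2 uses an SMT solver to prove the same kind of fact for arbitrary bit widths.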

LLVM regression tests

LLVM regression tests are located in the llvm/test directory. A regression test is a small piece of LLVM IR code that demonstrates a transformation. Before changing any tests, it’s recommended to run the existing tests first to ensure there are no failures.

$ pwd
/Users/lee/dev/llvm-project/build
$ ninja -j 8 check-llvm
[841/842] Running the LLVM regression tests

Testing Time: 11154.24s

Total Discovered Tests: 63868
  Skipped          :   359 (0.56%)
  Unsupported      : 30198 (47.28%)
  Passed           : 33238 (52.04%)
  Expectedly Failed:    73 (0.11%)

We then add our LLVM IR snippet to either an existing or a new .ll test file.

; before
define void @test_decompose_bitwise_and(i4 %x, i4 %y) {
entry:
  %and.1 = and i4 %y, %x
  %c.1 = icmp slt i4 %and.1, 0
  br i1 %c.1, label %then, label %end

then:
  ; fact: %and.1 < 0
  %t.1 = icmp slt i4 %x, 0
  %t.2 = icmp slt i4 %y, 0
  call void @use(i1 %t.1)
  call void @use(i1 %t.2)
  ret void

end:
  ret void
}

Then, we run the following command; update_test_checks.py runs opt on our target file and.ll and regenerates its CHECK lines from the output:

$ pwd
/Users/lee/dev/llvm-project/build
$ ../llvm/utils/update_test_checks.py --opt-binary=./bin/opt ../llvm/test/Transforms/ConstraintElimination/and.ll

We then inspect the generated CHECK- lines to verify the transformation in the LLVM IR and confirm the result matches what Alive2 proved.

; after
define void @test_decompose_bitwise_and(i4 %x, i4 %y) {
; CHECK-LABEL: @test_decompose_bitwise_and(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[TMP0:%.*]] = and i4 [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT:    [[AND:%.*]] = icmp slt i4 [[TMP0]], 0
; CHECK-NEXT:    br i1 [[AND]], label [[THEN:%.*]], label [[END:%.*]]
; CHECK:       then:
; CHECK-NEXT:    call void @use(i1 true)
; CHECK-NEXT:    call void @use(i1 true)
; CHECK-NEXT:    ret void
; CHECK:       end:
; CHECK-NEXT:    ret void
;
...
}

Our tests should cover both positive and negative cases—including simple and complex scenarios. Designing strong test cases ensures the implementation behaves as expected and avoids regressions. Also, we should rerun the entire LLVM regression test suite after adding our tests.
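Negative cases are about what a fact does not imply. For instance, in the end branch we only know (x & y) >= 0, and that says nothing about the sign of x alone, so the optimizer must not fold a comparison on x there. A quick exhaustive check (my own sketch, not LLVM code) confirms the implication fails:

```python
# Negative-test design: show that (x & y) >=s 0 does NOT imply x >=s 0,
# so no fold is justified in the branch where the fact is inverted.

def to_signed_i4(bits: int) -> int:
    """Interpret the low 4 bits as a two's-complement i4 value."""
    bits &= 0xF
    return bits - 16 if bits >= 8 else bits

def nonnegative_and_implies_nonnegative_x() -> bool:
    """Does (x & y) >=s 0 imply x >=s 0 for all i4 values?"""
    for xb in range(16):
        for yb in range(16):
            if to_signed_i4(xb & yb) >= 0 and to_signed_i4(xb) < 0:
                return False  # counterexample: implication does not hold
    return True

print(nonnegative_and_implies_nonnegative_x())  # False, e.g. x = -8, y = 0
```

A good negative regression test encodes exactly this: an icmp that the pass must leave untouched.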

YARPGen for fuzzing

YARPGen is a fuzzer that randomly generates valid C/C++ programs to test compilers. The setup is straightforward—change the clang path in test_sets.txt and run ./run_gen.py. I tested both yarpgen v1 and yarpgen v2 (branch stable) to see which one triggered my patch more often. Version 1 triggered it significantly more.

# Number of times yarpgen triggered my patch
lee@Lee ~/d/test> cat yarpgen_v1.log | grep "additional" | wc -l
182
lee@Lee ~/d/test> cat yarpgen_v2.log | grep "additional" | wc -l
0

I ran two experiments:

  1. I ran yarpgen v1 against my patch for several days on a 12-core machine. The goal was to ensure that no miscompilations occurred (i.e., zero different_output in the stats below).
##########################
YARPGEN runs stat:
Time: 2025/03/13 08:44:06
duration: 4 d 18:27:25
testing speed: 3.72 seed/min

##########################
generator stat:
cpu time: 0 d 5:10:16
        total : 25574
        ok : 25574
        runfail_timeout : 0
        runfail : 0

##########################
clang_no_opt stat:
        cpu time: 19 d 3:2:24
        total : 25572
        ok : 25552
        compfail_timeout : 20
        compfail : 0
        runfail_timeout : 0
        runfail : 0
        different_output : 0

##########################
clang_opt stat:
        cpu time: 30 d 17:48:39
        total : 25562
        ok : 25485
        compfail_timeout : 77
        compfail : 0
        runfail_timeout : 0
        runfail : 0
        different_output : 0

=================================
  2. I purposely introduced a bug in my patch and reran yarpgen v1. It successfully found a miscompilation. However, the generated program was quite large, so I used C-Reduce to shrink it into a minimal test case.
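In miniature, the differential-testing loop these experiments drive looks like the following hypothetical Python sketch, where a deliberately unsound "optimized" evaluator stands in for a miscompiling patch:

```python
# Differential testing in miniature: run a reference evaluator and a
# buggy "optimized" evaluator on random inputs and count mismatches.
import random

def to_signed_i4(bits: int) -> int:
    """Interpret the low 4 bits as a two's-complement i4 value."""
    bits &= 0xF
    return bits - 16 if bits >= 8 else bits

def reference(x: int, y: int) -> int:
    # "Unoptimized" semantics: evaluate (x | y) <s 0 directly.
    return int(to_signed_i4(x | y) < 0)

def buggy_optimized(x: int, y: int) -> int:
    # Deliberately wrong "optimization": treats OR like AND and folds
    # the comparison to (x <s 0 and y <s 0), which is unsound for OR.
    return int(to_signed_i4(x) < 0 and to_signed_i4(y) < 0)

def fuzz(trials: int = 1000) -> int:
    """Compare both evaluators on random inputs; each mismatch models
    a different_output result in the YARPGen stats."""
    random.seed(0)
    mismatches = 0
    for _ in range(trials):
        x, y = random.randrange(16), random.randrange(16)
        if reference(x, y) != buggy_optimized(x, y):
            mismatches += 1
    return mismatches

print(fuzz() > 0)  # True: the random search exposes the unsound fold
```

The real pipeline does the same comparison with clang at -O0 versus -O3 on generated C/C++ programs rather than with two Python functions.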

C-Reduce for program minimization

C-Reduce reduces a large C/C++ file into a smaller test case while preserving the bug. It does this using an interestingness test (a shell script). As long as the test returns 0, the reduced program is still interesting.
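The core loop behind such reducers can be sketched as greedy delta debugging: repeatedly try deleting pieces of the program and keep any deletion after which the interestingness test still passes. This hypothetical sketch works on lines and a Python predicate instead of C++ files and a shell script, but the shape is the same:

```python
def reduce_lines(lines, interesting):
    """Greedy line-based reduction: drop any line whose removal keeps
    the program 'interesting' (the analogue of the test exiting 0)."""
    current = list(lines)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if interesting(candidate):
                current = candidate  # keep the smaller program
                changed = True
                break
    return current

# Toy example: the "bug" triggers whenever the program still contains
# both "int x = 1;" and "return x;".
program = ["#include <stdio.h>", "int x = 1;", "int unused;", "return x;"]
interesting = lambda p: "int x = 1;" in p and "return x;" in p
print(reduce_lines(program, interesting))  # ['int x = 1;', 'return x;']
```

The real C-Reduce is far smarter: it applies C/C++-aware transformations (inlining, removing parameters, simplifying expressions), not just line deletion.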

For example, I considered the outputs below to be “interesting”:

# correct version
$LLVM_HOME/bin/clang++ -O3 $CXXFLAGS driver.cpp func.cpp init.h && ./a.out
9133052109159997938

# miscompilation
$LLVM_HOME/bin/clang++ -O3 $CXXFLAGS driver.cpp func.cpp init.h && ./a.out
7336395106169927387

For this test case, a reduced program is interesting when the miscompiled output still differs from the correct version's. Therefore, I wrote test1.sh to drive the reduction. A few details in this script are worth noting:

  1. Enabling UBSan during compilation is essential, as we don’t want C-Reduce to minimize a program that exhibits undefined behavior. This requires compiling LLVM with compiler-rt enabled.
  2. Precompile driver.cpp and link it separately to reduce overall compile time during test case reduction.
  3. Set ulimit for both virtual memory and CPU time to prevent resource exhaustion.

Here’s the test1.sh script I used with C-Reduce:

#!/bin/bash
set -x
ulimit -t 800
ulimit -v 2000000

LOGFILE="script.log"
exec > "$LOGFILE" 2>&1  # Redirect both stdout and stderr to the log file

# Compile both versions
/home/lee/dev/llvm-project/build/bin/clang++ -O3 -Xclang -disable-llvm-optzns \
    -fsanitize=undefined -fno-sanitize-recover=undefined func.cpp driver.o -o no_opt || { echo "Compilation of no_opt failed!" >&2; exit 1; }

/home/lee/dev/llvm-project/build/bin/clang++ -O3 \
    -fsanitize=undefined -fno-sanitize-recover=undefined func.cpp driver.o -o opt || { echo "Compilation of opt failed!" >&2; exit 1; }

# Run both versions
./no_opt || { echo "Execution of no_opt failed!" >&2; exit 1; }
./opt || { echo "Execution of opt failed!" >&2; exit 1; }

# Compare outputs
! diff <(./no_opt) <(./opt) || { echo "Outputs are the same, test case reduction is invalid!" >&2; exit 1; }

After C-Reduce, the program was small enough to inspect and reduce by hand. Here’s the final func.cpp. The incorrect patch optimized any bitwise OR compared with 0, regardless of the predicate. Here’s the Alive2 proof comparing correct and incorrect versions.
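The unsoundness of that buggy patch is easy to demonstrate concretely: unlike AND, a negative x | y does not force each operand to be negative. A quick search over i4 values (my own sketch, not the LLVM code) finds a counterexample:

```python
# Show why decomposing OR "regardless of the predicate" is unsound:
# find x, y with (x | y) <s 0 but x >=s 0.

def to_signed_i4(bits: int) -> int:
    """Interpret the low 4 bits as a two's-complement i4 value."""
    bits &= 0xF
    return bits - 16 if bits >= 8 else bits

def find_or_counterexample():
    """Return (x, y) where folding 'x <s 0' to true, as the buggy
    patch effectively did for OR, gives the wrong answer."""
    for xb in range(16):
        for yb in range(16):
            if to_signed_i4(xb | yb) < 0 and to_signed_i4(xb) >= 0:
                return to_signed_i4(xb), to_signed_i4(yb)
    return None

print(find_or_counterexample())  # (0, -8): 0 | -8 is negative, but 0 is not
```

A single set sign bit in either operand makes the OR negative, so the per-operand facts that hold for AND simply don't transfer.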

These testing and fuzzing tools gave me much more confidence in my implementation and helped prepare it for a Pull Request.

Conclusion

Here’s three PRs I submitted:

Even though I didn’t manage to submit all four PRs, I learned a great deal from the development and testing process. My first PR wasn’t implemented in the right place, and I’ve since learned the value of reaching out to LLVM developers earlier.

It was fun doing this project, and it taught me to think like a compiler engineer. Working on compilers isn't easy, especially on an industrial-strength compiler like LLVM, but it sure is rewarding when a PR finally lands. I would like to show my appreciation to Dr. John Regehr for advising me on this project. Thanks, John!