Spring 2025: CS 6950 Compiler Independent Study
Last semester, I started contributing to LLVM by adding a missing InstCombine optimization and fixing several Undefined Behavior tests. As I became more familiar with LLVM, I wanted to contribute more impactful patches. Therefore, I asked John to advise me on doing an independent study on compilers this semester.
Planning
Here are the initial project options:
- LLVM Constraint Elimination missing optimization: support eq/ne and bitwise AND/OR constraints.
  - There are two open issues that I could work on: eq/ne and bitwise AND/OR.
- LLVM backend AArch64 optimization
- MLIR Synthesizer
I chose to work on Constraint Elimination because I was more familiar with the development process for the LLVM middle end. The initial goal was to merge four PRs by the end of the semester.
Timeline
- 4 weeks (1/6 ~ 1/31): discuss project scope, read code, and try out the testing tools
- 11 weeks (2/3 ~ 4/16): implementation and testing
- 1 week (4/16 ~ 4/23): write the final report
Constraint Elimination
I wrote a blog post that introduces Constraint Elimination.
Testing and Fuzzing
With so many patches committed to LLVM every day, it’s crucial to verify that my implementation does the right thing and doesn’t introduce new bugs. During this project, I learned several methods to help verify the correctness of my implementation.
Alive2 for mathematical proof
Alive2 provides a fast and robust way to verify the correctness of LLVM transformations. Ever since it became available on Compiler Explorer, it has become an essential tool for LLVM development. Having an Alive2 proof link is a requirement for submitting InstCombine PRs.
Before implementation, it’s helpful to use Alive2 to verify that the intended transformation is correct. This gives developers confidence in their implementation and provides reviewers with a quick way to understand the proposed change.
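For example, the bitwise-AND fact used later in this post (if x & y is negative, then both x and y are negative) can be written as a src/tgt pair and pasted directly into Alive2 on Compiler Explorer. Here is a minimal sketch of such a proof; the function and value names are only for illustration:

```llvm
; If the sign bit of (x & y) is set, the sign bits of both x and y must be set,
; so inside the "true" branch %x is known to be negative.
define i1 @src(i4 %x, i4 %y) {
entry:
  %and = and i4 %x, %y
  %cmp = icmp slt i4 %and, 0
  br i1 %cmp, label %then, label %else

then:
  %t = icmp slt i4 %x, 0
  ret i1 %t

else:
  ret i1 false
}

define i1 @tgt(i4 %x, i4 %y) {
entry:
  %and = and i4 %x, %y
  %cmp = icmp slt i4 %and, 0
  br i1 %cmp, label %then, label %else

then:
  ret i1 true

else:
  ret i1 false
}
```

If Alive2 reports that the transformation is correct, we can move on to implementing it and writing regression tests.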
LLVM regression tests
LLVM regression tests are located in the llvm/test directory. A regression test is a small piece of LLVM IR code that demonstrates a transformation. Before changing any tests, it’s recommended to run the existing tests first to ensure there are no failures.
$ pwd
/Users/lee/dev/llvm-project/build
$ ninja -j 8 check-llvm
[841/842] Running the LLVM regression tests
Testing Time: 11154.24s
Total Discovered Tests: 63868
Skipped : 359 (0.56%)
Unsupported : 30198 (47.28%)
Passed : 33238 (52.04%)
Expectedly Failed: 73 (0.11%)
We then add our LLVM IR snippet to either an existing or a new .ll test file.
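The RUN line at the top of each .ll test tells lit (and, later, update_test_checks.py) how to invoke opt on the file. For ConstraintElimination tests it typically looks something like this; the exact flags can vary from file to file:

```llvm
; RUN: opt -passes=constraint-elimination -S %s | FileCheck %s
```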
; before
declare void @use(i1)

define void @test_decompose_bitwise_and(i4 %x, i4 %y) {
entry:
  %and.1 = and i4 %y, %x
  %c.1 = icmp slt i4 %and.1, 0
  br i1 %c.1, label %then, label %end

then:
  ; fact: %and.1 < 0
  %t.1 = icmp slt i4 %x, 0
  %t.2 = icmp slt i4 %y, 0
  call void @use(i1 %t.1)
  call void @use(i1 %t.2)
  ret void

end:
  ret void
}
Then we run the following command; update_test_checks.py executes the opt command from the RUN line and writes the resulting IR back into and.ll as CHECK lines:
$ pwd
/Users/lee/dev/llvm-project/build
$ ../llvm/utils/update_test_checks.py --opt-binary=./bin/opt ../llvm/test/Transforms/ConstraintElimination/and.ll
We then inspect the generated CHECK lines to verify the transformation in the LLVM IR and confirm that it matches what Alive2 proved correct.
; after
define void @test_decompose_bitwise_and(i4 %x, i4 %y) {
; CHECK-LABEL: @test_decompose_bitwise_and(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = and i4 [[Y:%.*]], [[X:%.*]]
; CHECK-NEXT: [[AND:%.*]] = icmp slt i4 [[TMP0]], 0
; CHECK-NEXT: br i1 [[AND]], label [[BB1:%.*]], label [[EXIT:%.*]]
; CHECK: then:
; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: ret void
; CHECK: end:
; CHECK-NEXT: ret void
;
...
}
Our tests should cover both positive and negative cases, including simple and complex scenarios. Designing strong test cases ensures the implementation behaves as expected and guards against regressions. We should also rerun the full LLVM regression test suite after adding our tests.
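Running the full check-llvm target takes about three hours on my machine (see the timing above), so while iterating it is faster to run just the ConstraintElimination tests with llvm-lit and save the full run for last. A sketch, assuming the same build directory as before:

```bash
$ pwd
/Users/lee/dev/llvm-project/build
# Run only the ConstraintElimination tests while iterating
$ ./bin/llvm-lit -v ../llvm/test/Transforms/ConstraintElimination/
# Rerun everything before sending the PR
$ ninja -j 8 check-llvm
```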
YARPGen for fuzzing
YARPGen is a fuzzer that randomly generates valid C/C++ programs to test compilers. The setup is straightforward—change the clang path in test_sets.txt and run ./run_gen.py. I tested both yarpgen v1 and yarpgen v2 (branch stable) to see which one triggered my patch more often. Version 1 triggered it significantly more.
# Number of times yarpgen triggered my patch
lee@Lee ~/d/test> cat yarpgen_v1.log | grep "additional" | wc -l
182
lee@Lee ~/d/test> cat yarpgen_v2.log | grep "additional" | wc -l
0
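For reference, the workflow looked roughly like this. The clone URL is the upstream repository; the build steps differ between v1 and v2 (see the YARPGen README), and the log path and redirection are illustrative:

```bash
# Fetch YARPGen and build it following the README for the version under test
$ git clone https://github.com/intel/yarpgen.git && cd yarpgen
# Point the clang entry in test_sets.txt at the clang built from my patch
$ vim test_sets.txt
# Start the driver: it generates programs, compiles them with and without
# optimizations, runs them, and compares the outputs in a loop
$ ./run_gen.py 2>&1 | tee ~/dev/test/yarpgen_v1.log
```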
I ran two experiments:
- I ran yarpgen v1 against my patch for several days on a 12-core machine. The goal was to confirm that the patched compiler neither crashed nor miscompiled anything (i.e., zero compfail and zero different_output in the stats below).
##########################
YARPGEN runs stat:
Time: 2025/03/13 08:44:06
duration: 4 d 18:27:25
testing speed: 3.72 seed/min
##########################
generator stat:
cpu time: 0 d 5:10:16
total : 25574
ok : 25574
runfail_timeout : 0
runfail : 0
##########################
clang_no_opt stat:
cpu time: 19 d 3:2:24
total : 25572
ok : 25552
compfail_timeout : 20
compfail : 0
runfail_timeout : 0
runfail : 0
different_output : 0
##########################
clang_opt stat:
cpu time: 30 d 17:48:39
total : 25562
ok : 25485
compfail_timeout : 77
compfail : 0
runfail_timeout : 0
runfail : 0
different_output : 0
=================================
- I purposely introduced a bug in my patch and reran yarpgen v1. It successfully found a miscompilation. However, the generated program was quite large, so I used C-Reduce to shrink it into a minimal test case.
C-Reduce to minimize the program
C-Reduce reduces a large C/C++ file into a smaller test case while preserving the bug. It does this using an interestingness test (a shell script). As long as the test returns 0, the reduced program is still interesting.
For example, I considered the outputs below to be “interesting”:
# correct version
$LLVM_HOME/bin/clang++ -O3 $CXXFLAGS driver.cpp func.cpp init.h && ./a.out
9133052109159997938
# miscompilation
$LLVM_HOME/bin/clang++ -O3 $CXXFLAGS driver.cpp func.cpp init.h && ./a.out
7336395106169927387
For this test case, a reduced program is interesting as long as the miscompiled output still differs from the correct output. Therefore, I wrote this test1.sh to reduce the generated program. A few details in this shell script are worth noting:
- Enabling UBSan during compilation is essential, as we don’t want C-Reduce to minimize a program that exhibits undefined behavior. This requires compiling LLVM with compiler-rt enabled.
- Precompile driver.cpp and link it separately to reduce overall compile time during test case reduction.
- Set ulimit for both virtual memory and CPU time to prevent resource exhaustion.
Here’s the test1.sh script I used with C-Reduce:
#!/bin/bash
set -x
ulimit -t 800
ulimit -v 2000000
LOGFILE="script.log"
exec > "$LOGFILE" 2>&1 # Redirect both stdout and stderr to the log file
# Compile both versions
/home/lee/dev/llvm-project/build/bin/clang++ -O3 -Xclang -disable-llvm-optzns \
-fsanitize=undefined -fno-sanitize-recover=undefined func.cpp driver.o -o no_opt || { echo "Compilation of no_opt failed!" >&2; exit 1; }
/home/lee/dev/llvm-project/build/bin/clang++ -O3 \
-fsanitize=undefined -fno-sanitize-recover=undefined func.cpp driver.o -o opt || { echo "Compilation of opt failed!" >&2; exit 1; }
# Run both versions
./no_opt || { echo "Execution of no_opt failed!" >&2; exit 1; }
./opt || { echo "Execution of opt failed!" >&2; exit 1; }
# Compare outputs
! diff <(./no_opt) <(./opt) || { echo "Outputs are the same, test case reduction is invalid!" >&2; exit 1; }
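With the interestingness test in place, the reduction itself is a single command. The invocation below is a sketch: driver.o must already exist (precompiled as described above, exact flags illustrative), and since C-Reduce runs the script in temporary working directories, auxiliary files such as driver.o may need to be referenced by absolute paths in practice:

```bash
# Precompile the driver once so each reduction step only rebuilds func.cpp
$ /home/lee/dev/llvm-project/build/bin/clang++ -O3 -c driver.cpp -o driver.o
# Reduce func.cpp; C-Reduce keeps every variant for which test1.sh exits with 0
$ creduce ./test1.sh func.cpp
```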
After C-Reduce, the program was small enough to inspect and reduce by hand. Here’s the final func.cpp. The incorrect patch optimized any bitwise OR compared with 0, regardless of the predicate. Here’s the Alive2 proof comparing correct and incorrect versions.
These testing and fuzzing tools gave me much more confidence in my implementation and helped prepare it for a Pull Request.
Conclusion
Here are the three PRs I submitted:
- (Closed) https://github.com/llvm/llvm-project/pull/126158
- (Merged) https://github.com/llvm/llvm-project/pull/127351
- (Under review) https://github.com/llvm/llvm-project/pull/132124
Even though I didn’t manage to submit all four PRs, I learned a great deal from the development and testing process. My first PR wasn’t implemented in the right place, and I’ve since learned the value of reaching out to LLVM developers earlier.
It was fun doing this project, and it taught me to think like a compiler engineer. It isn’t easy to work on compilers, especially an industrial-strength compiler like LLVM, but it sure is rewarding when you finally submit a PR. I would like to show my appreciation to Dr. John Regehr for advising me on this project. Thanks, John!