Skip to content

Add solve fuzzer and fix the soundness bugs it surfaced#9105

Draft
alexreinking wants to merge 3 commits intomainfrom
alexreinking/solve-fuzzer
Draft

Add solve fuzzer and fix the soundness bugs it surfaced#9105
alexreinking wants to merge 3 commits intomainfrom
alexreinking/solve-fuzzer

Conversation

@alexreinking
Copy link
Copy Markdown
Member

@alexreinking alexreinking commented Apr 23, 2026

Summary

  • New libfuzzer harness at test/fuzz/solve.cpp for solve_expression, solve_for_{inner,outer}_interval, and and_condition_over_domain, running over random expression trees up to depth 6 across Int(8..64), UInt(1..64), Float(32), Float(64), with broadcasts and ramps enabled.
  • Fixes every soundness bug the fuzzer surfaced along the way — five in Solve.cpp, four in Bounds.cpp, and one in Simplify_Cast.cpp — each with a companion regression test in test/correctness/solve.cpp that reproduces the failure without the fix.
  • Clean at 250k+ fuzz iterations on the full type set after the fixes.

Solve.cpp

  • visit_cmp a + b @ c -> a @ c - b flipped ordering under modular wrap; restrict to no_overflow_int, keep EQ/NE.
  • visit_cmp Mul rewrites introduced div-by-zero UB on non-constant multipliers; move EQ/NE cases under the positive-const guard and gate on no_overflow_int.
  • visit(Add) a + a -> a * 2 aborted on UInt(1); switch to Mul::make(a, make_const(a.type(), 2)).
  • visit(Add)/visit(Sub) f(x)/a + g(x) -> (f(x) + g(x)*a)/a wrapped on narrow types; gate on no_overflow_int.
  • Canonicalize two inlined copies of is_int() && bits() >= 32 to no_overflow_int.

Bounds.cpp

  • visit(Cast) shortcut silently wrapped unsigned narrowings to signed (UInt(64) -> Int(32) carried a wrapped-negative as max); restrict to non-unsigned sources.
  • Bounded signed-int sources could still wrap and invert the interval (e.g. int64 [59, 4294967287] -> int32 [59, -9]); further restrict the shortcut to unbounded signed-int sources and let bounded ones fall through to the existing can_prove(cast(from, cast(to, x)) == x) fit check.
  • Float sources hit the same flavor of bug (two float endpoints wrap to opposite sides of zero, inverting the int interval); drop the float shortcut entirely — the bounded-a path already does a sound to.can_represent(*fmin) fit check.
  • visit(Mod) claimed [0, ...] even for floats; fmod takes the sign of the dividend and is NaN on zero divisor, so return bounds_of_type(t) for float types.

Simplify_Cast.cpp

  • Nested-cast rule stripped the inner cast on the "sign/zero extend then truncate" theory, but that only holds when the inner source is int-or-uint. For a float source (e.g. int32(uint64(float64(-21)))) stripping changes the value. Add the is_int_or_uint() guard.

Test plan

  • ctest -R 'correctness_(solve|bounds|simplify)$' passes
  • test/fuzz/fuzz_solve -runs=10000 across multiple seeds with floats enabled runs clean
  • Reverting each fix individually reproduces the corresponding regression test failure

Fixes #9101

🤖 Generated with Claude Code

Adds a libfuzzer harness at test/fuzz/solve.cpp that exercises
solve_expression (semantic equivalence under random substitutions),
solve_for_inner_interval / solve_for_outer_interval (inside-inner
implies true; outside-outer implies false), and
and_condition_over_domain (weakened condition implies input) on
random expression trees up to depth 6 across Int(8..64), UInt(1..64),
Float(32), Float(64), with broadcasts and ramps enabled. The
random_expr_generator now skips bitwise ops on float types, which
would otherwise abort the fuzzer on construction.

Alongside the fuzzer, this fixes every soundness bug the fuzzer
surfaced while building it up to the wide type set:

Solve.cpp (src/Solve.cpp):
  - visit_cmp rewrote `a + b @ c` to `a @ c - b` for every
    comparison, but the subtraction wrap flips ordering (LT/LE/GT/GE)
    under modular arithmetic. Restrict ordering rewrites to
    no_overflow_int; EQ/NE stay (modular equivalence preserves
    equality).
  - visit_cmp's Mul rewrites `a*c @ b <=> a @ b/c && b%c == 0`
    computed b/c and b%c for any shape of c, introducing div-by-zero
    UB when c was a non-constant that evaluated to zero. The code
    comment claimed it only fired on constant positive multipliers;
    move the EQ/NE cases inside that guard and also require
    no_overflow_int. Narrow signed types (where overflow is still UB)
    go through the same gate.
  - visit(Add)'s `a + a -> a * 2` used `Expr * int`, which aborts on
    UInt(1) ("Integer constant 2 ... coerced to type uint1"). Switch
    to Mul::make(a, make_const(a.type(), 2)); make_const truncates
    modulo width, which is the correct modular value.
  - visit(Add)/visit(Sub) `f(x)/a + g(x) -> (f(x) + g(x)*a)/a` only
    holds under non-wrapping arithmetic. Gate on no_overflow_int.
  - Canonicalize two inlined `a.type().is_int() && bits() >= 32`
    predicates to no_overflow_int.

Bounds.cpp (src/Bounds.cpp):
  - visit(Cast) took a blanket "no overflow possible" shortcut for
    signed-int/at-least-int32 destinations regardless of the source,
    which silently wrapped UInt(32)/UInt(64) -> Int(32) narrowings
    and carried a wrapped-negative as "max". Restrict the shortcut
    to non-unsigned sources.
  - After the first restriction, the fuzzer also found that *bounded*
    signed-int sources can exceed the destination and wrap (e.g.
    int64 bounds [59, 4294967287] becoming int32 [59, -9], an
    inverted interval that downstream treats as "always true").
    Further restrict the shortcut to unbounded signed-int sources;
    let bounded ones fall through to the existing
    can_prove(cast(from, cast(to, x)) == x) fit check.
  - Later, the fuzzer (with floats enabled) caught the same flavor
    of bug for float sources: the comment claimed "float-to-int
    casts truncate in place and preserve interval orientation", but
    IntImm::make wraps by sign-extending low bits, and two float
    endpoints can wrap to opposite sides of zero (inverting the
    interval). Drop the float shortcut entirely; the bounded-a path
    already does a sound to.can_represent(*fmin) fit check with a
    fallback to bounds_of_type(to).
  - visit(Mod) unconditionally claimed the result is in [0, ...],
    which matches integer mod but not fmod (sign of the dividend;
    NaN on zero divisor). For float types, return
    bounds_of_type(t) (unbounded).

Simplify_Cast.cpp (src/Simplify_Cast.cpp):
  - A nested-cast simplification stripped the inner cast when the
    outer was narrower, on the grounds that the inner is always a
    sign or zero extend. That assumption only holds when the inner
    cast's source is itself int-or-uint; for a float source, the
    inner is an fp-to-int conversion and stripping it changes the
    value. Concretely, `int32(uint64(float64(-21)))` evaluates to 0
    (fp-to-unsigned of negative is implementation-defined, often 0),
    but the stripped `int32(float64(-21))` evaluates to -21. Add
    the `cast->value.type().is_int_or_uint()` guard, mirroring the
    sibling rule above it.

Each of the solver and bounds fixes has a companion regression test
in test/correctness/solve.cpp that, prior to the fix, reproduces the
specific failure mode (verified by reverting each fix individually).

With all fixes in place, the fuzzer runs clean for 250k+ iterations
across the full type set; the correctness suite (solve, bounds,
simplify, cast) still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexreinking alexreinking marked this pull request as draft April 23, 2026 16:15
The previous commit dropped the float-source shortcut from visit(Cast)
to fix the fuzzer-found unsoundness where constant float bounds outside
int range wrapped via IntImm::make and inverted the resulting int
interval. That was too broad: bilateral_grid (and similar pipelines)
produce SYMBOLIC float bounds like
  select(r_sigma > 0, 0/r_sigma, 1/r_sigma) + 0.5
at bounds-inference time, which can't be proven to fit in int range but
are load-bearing for downstream BoundsInference to resolve Func call
extents -- without them, "histogram calls histogram in an unbounded way
in dimension 2" and the build fails.

Restore the float-source shortcut for bounded-a, but reject it only
when an endpoint is a float *constant* we can prove exceeds the int
range. Symbolic bounds pass through (matching Halide's convention that
out-of-range float-to-int is the programmer's problem); constant bounds
that would wrap do not.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Bounds.cpp Outdated
could_overflow = false;
} else if (from.is_float() && to.is_int() && to.bits() >= 32) {
// Float source to int32+: out-of-range float-to-signed-int
// casts are implementation-defined, so Halide's convention
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly misleading. Float to signed int casts in Halide saturate for bits <= 16.

Comment thread src/Bounds.cpp Outdated

// Mod is always positive
if (t.is_float()) {
// fmod's result takes the sign of the dividend (so it can be
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mod in Halide is supposed to be positive. For integers we have the Euclidean identity:

(a / b) * b + (a % b) == a

The division rounds up when b is negative, so the LHS of the mul is larger than the true answer, which means, when multiplied by the negative b, the LHS of the add is smaller than the true answer, which means mod must be positive.

I think the bug might be in lowering of floating point mod. It's currently lowered as:

a - floor(a / b) * b

but it should be

a - floor((a/b) * b)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, my rationale is this: Floating point mod should coincide with integer mod if the floats happen to be integers. This means if 7 % -2 == 1 in the integers 7.f % -2.f should be 1.f in floats.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Floating point mod should coincide with integer mod if the floats happen to be integers.

I would be wary of deviating from what the hardware does (fmod) for this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already deviated for integer div. We should be consistent.

Comment thread src/Simplify_Cast.cpp Outdated
// a float source makes `cast` an fp-to-int conversion, whose low
// bits are not the same as an fp-to-int conversion of a narrower
// type. For example, int32(uint64(float64(-21))) evaluates to 0
// (undefined-behavior territory for negative float-to-unsigned)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think casting a negative float to uint is UB in Halide. It's supposed to saturate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though again, my understanding may not match the current lowering!

Comment thread src/Bounds.cpp Outdated
// for the fit check below to prove, and the destination is
// wide enough that any concretization will fit.
//
// Unsigned sources are excluded unconditionally: unsigned
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true either. A uint to int cast is assumed to not overflow.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #7814 for what happened when I tried to claim they wrap.

Comment thread src/Solve.cpp Outdated
// Several of these rewrites rearrange the comparison by adding
// or subtracting on both sides. That's only sound under an
// assumption of no integer overflow -- for types that may wrap
// (unsigned, or narrow signed whose overflow is UB and could
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"narrow signed whose overflow is UB" Not entirely sure what narrow means here but our narrow signed ints don't have UB on overflow.

Comment thread src/Solve.cpp Outdated
// uint8 with c = 3, b = 7: the rewrite misses the
// solutions that arise from wrap).
//
// Don't use operator/ and operator % to sneak past the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original comment was stale. Division and mod by zero is now fine (returns zero). For the rewrite I think this means:

a * 0 == b <=> a == b / 0

which is wrong, so it's just the comment that needs correcting.

I'm worried that the earlier comment says we would only use this for positive or negative constants, but there's no guard to check that. It might mean that we don't statically know at this point that it's a non-zero constant, but the usage site knows that for some other reason. That's sort of horrifying so I hope it isn't the case.

Comment thread src/Solve.cpp
@@ -644,16 +672,14 @@ class SolveExpression : public IRMutator {
} else if (is_ne) {
// f(x) * c != b -> f(x) != b/c || b%c != 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see the other rules were guarded by is_positive_const, just not this one, so it's not the horrifying situation. Let's check

f(x) * 0 == b <=> f(x) == b/0 || b%0 == 0

Simplifying both sides

b == 0 <=> true

Nope. The NE case is also busted.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(meaning the code change here is correct)

@@ -0,0 +1,404 @@
#include "Halide.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solver tests are currently in Solve.cpp at the bottom. It would be good to move all of them here, instead of just making a new collection.

Comment thread test/fuzz/solve.cpp Outdated
// Returns true if the expression, under the given substitution, contains a
// division or modulo whose divisor simplifies to zero. Solve is allowed to
// rearrange such expressions in ways that cause the simplifier to fold the UB
// inconsistently between the original and rewritten form -- so samples that
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Division by zero is not UB!

Comment thread test/fuzz/solve.cpp Outdated
val = random_int_val(reg.fuzz, -32, 32);
}

// Skip samples that invoke div/mod-by-zero UB in the input: the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not UB

Comment thread src/Bounds.cpp Outdated
// extends the low bits and can invert the resulting
// interval, which is worse than just using bounds_of_type.
Interval s = a;
if (s.has_lower_bound()) s.min = simplify(s.min);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a method elsewhere in this file that simplifies whole intervals. It contains a critical difference to this: It only calls the simplifier once if max.same_as(min). This guarantees the two don't become two different but equal Expr objects, which would cause is_single_point to fail.

Comment thread src/Bounds.cpp
double lo = -std::ldexp(1.0, to.bits() - 1);
double hi_exclusive = std::ldexp(1.0, to.bits() - 1);
bool const_oob = false;
if (s.has_lower_bound()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path may break pipelines that rely on Bounds inference "knowing" that cast<int>(abs(unbounded_float)) is non-negative. I really think we need to just throw up our hands here if the cast is to an int32 and assume it's not going to overflow. People use all sorts of types cast to int to index luts, and they assume the bounds on the original are still true through the cast.

@abadams
Copy link
Copy Markdown
Member

abadams commented Apr 23, 2026

I think, unfortunately, like other fuzzers this one is going to have to carefully avoid getting itself into a situation where an Int(32) Expr overflows.

Comment thread src/Bounds.cpp Outdated
if (t.is_float()) {
// fmod's result takes the sign of the dividend (so it can be
// negative), its magnitude is bounded by |b| but can reach it,
// and fmod(x, 0) = NaN which is not in any real interval. Don't
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're assuming no-nans here. Otherwise it would be a hypothetical "strict_mod" intrinsic.

Comment thread src/Simplify_Cast.cpp
// eliminated. The inner cast is either a sign extend
// or a zero extend, and the outer cast truncates the extended bits
// or a zero extend, and the outer cast truncates the extended bits.
// The requirement that cast->value is itself int-or-uint is crucial:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should not spend this much space arguing why a hypothetical alternative version of this would be buggy. After the first sentence it's enough to just say that everything involved needs to be an int.

Comment thread src/Bounds.cpp
Interval s = a;
if (s.has_lower_bound()) s.min = simplify(s.min);
if (s.has_upper_bound()) s.max = simplify(s.max);
double lo = -std::ldexp(1.0, to.bits() - 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I think we should consider allowing one-line if statements.

@abadams abadams added the dev_meeting Topic to be discussed at the next dev meeting label Apr 24, 2026
- Revert the Bounds.cpp uint-to-int and bounded-int-to-int shortcut
  restrictions from earlier in the branch. Halide's convention (per
  PR #7814 discussion) is that uint-to-int casts are assumed not to
  overflow; tightening the shortcut broke
  performance_boundary_conditions and bilateral_grid. Keep only the
  float-to-int precision/soundness fix from this branch.
- Drop the two correctness tests that exercised the reverted
  shortcut (narrowing uint and int casts) -- they were
  exercising programmer-level contract violations, not real compiler
  bugs.
- Teach the fuzzer to skip substitutions that violate the
  "assumed not to overflow" narrowing-cast contract, analogous to
  the existing div/mod-by-zero skip.
- Move `solve_test()` and its helpers from the bottom of src/Solve.cpp
  (previously invoked from test/internal.cpp) into
  test/correctness/solve.cpp, as abadams suggested keeping all solver
  tests together.
- Use the existing simplify(Interval) helper in the float cast
  branch so single-point intervals stay same_as-equal after
  simplify.
- Update comments that incorrectly said "UB": narrow signed wraps
  (not UB), div/mod-by-zero returns 0 (not UB), float-to-uint of
  negative saturates (not UB), float-to-int for bits <= 16 saturates.
- Reframe the float-mod bounds_of change: Halide mod is supposed
  to be Euclidean non-negative, but the current CodeGen_LLVM
  lowering `a - b * floor(a/b)` produces negative results when b
  is negative. bounds_of's unbounded fallback is a workaround
  until the lowering is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexreinking
Copy link
Copy Markdown
Member Author

Thanks for the thorough review! Pushed a1cde0a82 with the following changes in response:

Semantics clarifications (thanks for setting me straight):

  • Reverted the Bounds.cpp visit(Cast) shortcut restrictions for uint→int and bounded int→int. I had added those based on fuzzer findings without realizing they conflict with Halide's "assumed not to overflow" convention (per Fix bounds inference for uint -> int casts #7814). Tightening them broke performance_boundary_conditions and bilateral_grid, which both rely on int32(uint32(...)) narrowing carrying bounds through. Kept only the float→int precision/soundness fix.
  • Dropped the two correctness tests that exercised the reverted shortcut — they were locking in programmer-level contract violations, not compiler bugs.
  • Taught the fuzzer to skip substitutions that violate the "assumed not to overflow" narrowing-cast contract, analogous to the existing div/mod-by-zero skip.

Comment corrections:

  • "narrow signed whose overflow is UB" → narrow signed wraps (not UB).
  • Div/mod-by-zero framing in both Solve.cpp and the fuzzer → returns 0 (not UB); the rewrite is unsound because a*0 == b becomes a == b/0 && b%0 == 0 which collapses to a == 0 && b == 0, losing the original semantics.
  • int32(uint64(float64(-21))) in Simplify_Cast.cpp → float-to-uint saturates to 0 in Halide (not UB).
  • Float-to-int implementation-defined comment tightened to note bits ≤ 16 saturates.
  • Switched to the existing simplify(Interval) helper for the new float-cast branch.

Reorganization:

  • Moved solve_test() body and its helpers from the bottom of src/Solve.cpp into test/correctness/solve.cpp, as you suggested. test/internal.cpp no longer calls it.

Float mod:

  • Kept the bounds_of unbounded fallback for float mod, but reframed the comment: Halide mod is supposed to be Euclidean non-negative, but the current CodeGen_LLVM lowering a - b * floor(a/b) produces negative results when b < 0 (e.g. fmod(5, -3) = -1 under that formula). The bounds change is a workaround until the lowering is tightened to enforce the Euclidean invariant. Happy to take a crack at the lowering in a separate PR if you'd like.

Full build + correctness + 200k fuzz iterations still clean.

— Claude

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dev_meeting Topic to be discussed at the next dev meeting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

We need a Solve.cpp fuzzer

2 participants