By jacobi in zk-proofs — 01 Mar 2026

Python to Rust to Proof: Cross-Validating a ZK System

Three implementations of the same physics. Python float. Python scaled-integer. Rust BigInt. They must agree bit-for-bit.

Three implementations of the same physics. Python floating-point. Python scaled-integer. Rust BigInt. They must agree bit-for-bit, because the ZK proof is only as correct as the reference implementation it was validated against.

Here's how we cross-validate a ZK system across languages and number representations.

The Three Implementations

Our zk-physics project maintains three implementations of the Rayleigh-Plesset simulation:

1. Python Float (`reference/simulate.py`)

The ground truth for physics. Uses NumPy float64 — the same representation a physicist would use. Human-readable, easy to modify, trusted.

@dataclass
class SimulationParams:
    R0: float       # Equilibrium radius (m)
    P0: float       # Ambient pressure (Pa)
    Pa: float       # Acoustic amplitude (Pa)
    freq: float     # Frequency (Hz)
    gamma: float    # Polytropic exponent
    sigma: float    # Surface tension (N/m)
    mu: float       # Viscosity (Pa*s)
    rho: float      # Density (kg/m³)
    T0: float       # Initial temperature (K)
    dt: float       # Timestep (s)

This implementation answers the question: "given these physical parameters, what does the simulation produce?"

2. Python Scaled-Integer (`reference/simulate_scaled.py`)

The bridge between float physics and field arithmetic. Uses Python's arbitrary-precision integers with SCALE = 10**30:

SCALE = 10**30

def smul(a, b):
    return (a * b) // SCALE

def sdiv(a, b):
    return (a * SCALE) // b

Same physics, different arithmetic. This implementation answers: "does the scaled-integer representation produce the same results as floating-point?"

3. Rust BigInt (`src/witness.rs`)

The actual witness generator. Uses num-bigint for arbitrary-precision integers:

fn smul(a: &BigInt, b: &BigInt) -> BigInt {
    (a * b) / scale()
}

fn sdiv(a: &BigInt, b: &BigInt) -> BigInt {
    (a * scale()) / b
}

This implementation answers: "does the Rust witness match the Python reference, and do its values satisfy the halo2 circuit constraints?"

The Shared Loop: Deduplication as Validation

A key architectural decision: the Rust witness uses a single simulate_inner() function for both the collapse preset and the acoustic driving preset:

fn simulate_inner(
    r0: &BigInt, p0: &BigInt, p_initial: &BigInt,
    dt: &BigInt, dt2: &BigInt, two_sigma: &BigInt,
    four_s: &BigInt, three_halves_s: &BigInt,
    rho: &BigInt, mu: &BigInt,
    r_start: BigInt, n_steps: usize,
    acoustic_dp: impl Fn(usize) -> BigInt,
) -> Vec<BigInt> {
    // Single Verlet integration loop
    // acoustic_dp returns zero for collapse, Pa*sin(wt) for acoustic
}

Before our code review, collapse and acoustic had separate simulation loops. This is a correctness hazard: fix a bug in one loop, forget the other. Deduplication means every physics improvement applies everywhere.

Cross-Validation: The Test Suite

The Python test suite (reference/test_simulate.py) runs both Python implementations and compares:

def test_collapse_float_vs_scaled():
    """Float and scaled-integer must agree within tolerance."""
    params_float = CollapsePreset.float_params()
    params_scaled = CollapsePreset.scaled_params()

    trace_float = simulate_float(params_float, n_steps=100)
    trace_scaled = simulate_scaled(params_scaled, n_steps=100)

    for i in range(len(trace_float)):
        r_float = trace_float[i]
        r_scaled = trace_scaled[i] / SCALE
        assert abs(r_float - r_scaled) / r_float < 1e-10, \
            f"Step {i}: float={r_float}, scaled={r_scaled}"

The tolerance (10^-10 relative error) accounts for the precision difference between float64 (53-bit mantissa) and exact BigInt arithmetic. Over 100 steps, errors accumulate but stay well within this bound.

The Rust-to-Python comparison uses exported JSON traces:

def test_rust_vs_python():
    """Rust BigInt witness must match Python scaled-integer exactly."""
    rust_trace = json.load(open("output/rust_trace.json"))
    python_trace = simulate_scaled(params, n_steps=len(rust_trace)-1)

    for i in range(len(rust_trace)):
        assert rust_trace[i] == python_trace[i], \
            f"Step {i}: rust={rust_trace[i]}, python={python_trace[i]}"

Note: this comparison is exact (not approximate). Rust BigInt and Python int both do arbitrary-precision integer arithmetic. The same operations on the same inputs must produce identical results.

The Field Replay: Closing the Gap

Even with exact Rust-Python agreement, there's one more gap: the circuit uses field arithmetic (modular inverse), not integer arithmetic (truncating division). These differ whenever a product isn't exactly divisible by SCALE.

The compute_public_inputs<Fr>() function closes this gap by replaying the entire simulation using field operations:

pub fn compute_public_inputs<F: PrimeField>(witness: &SimulationWitness) -> Vec<F> {
    let s = F::from_u128(SCALE);
    let s_inv = s.invert().unwrap();
    // Load initial conditions as field elements
    let mut r_curr = F::from_u128(witness.r0);
    // Replay each step using field mul/div (not integer)
    // Extract final temperature and emission count
    // Return [r0, final_temp, total_emission]
}

This ensures the public inputs — the values the verifier checks — are computed the same way the circuit computes them. Integer arithmetic provides the witness; field arithmetic provides the verification target.

Validation Pyramid

The full validation chain looks like this:

Python float (ground truth physics)
    ↕ agree within 10^-10 relative error
Python scaled-integer (exact BigInt)
    ↕ agree exactly (same arithmetic)
Rust BigInt witness
    ↕ satisfies constraints
halo2 circuit (field arithmetic)
    ↕ matches
compute_public_inputs (field replay)
    ↕ verified by
PLONK verifier (pairing check)

Each layer validates the one below it. If any link breaks, we know exactly where the disagreement is:

Python float ≠ Python scaled: scaling error or precision loss
Python scaled ≠ Rust BigInt: implementation bug in witness generator
Rust witness ≠ circuit constraints: witness computation doesn't match gates
Public inputs ≠ circuit output: field vs. integer arithmetic mismatch
Verifier rejects: proof generation bug or transcript error

Lessons Learned

1. Three implementations is not redundant. Each catches different classes of bugs. The float version catches physics errors. The scaled-integer version catches precision errors. The Rust version catches implementation errors.

2. Exact comparison where possible, bounded comparison where necessary. Float-to-scaled comparisons need a tolerance. Scaled-to-BigInt comparisons should be exact. Mixing these up wastes debugging time.

3. Deduplication is a correctness strategy. The simulate_inner() refactor wasn't about code cleanliness — it was about ensuring both presets use the same physics. Every shared line of code is one fewer place for divergence to hide.

4. The field replay is non-optional. You cannot skip compute_public_inputs. Integer arithmetic and field arithmetic give different results. If your public inputs come from integer computation, your proofs will intermittently fail for no apparent reason.

This is Part 7 of our 8-part series. Part 6: Testing ZK Privacy covers the privacy testing framework. Final article: Part 8: Sonoluminescence — The Physics of Light from Nothing — the full science behind the simulation.

New here? Subscribe free for the full series.

The Three Implementations

1. Python Float (reference/simulate.py)

2. Python Scaled-Integer (reference/simulate_scaled.py)

3. Rust BigInt (src/witness.rs)