Python to Rust to Proof: Cross-Validating a ZK System

Three implementations of the same physics. Python float. Python scaled-integer. Rust BigInt. They must agree bit-for-bit.

Three implementations of the same physics. Python floating-point. Python scaled-integer. Rust BigInt. They must agree bit-for-bit, because the ZK proof is only as correct as the reference implementation it was validated against.

Here's how we cross-validate a ZK system across languages and number representations.

The Three Implementations

Our zk-physics project maintains three implementations of the Rayleigh-Plesset simulation:

1. Python Float (reference/simulate.py)

The ground truth for physics. Uses NumPy float64 — the same representation a physicist would use. Human-readable, easy to modify, trusted.

@dataclass
class SimulationParams:
    R0: float       # Equilibrium radius (m)
    P0: float       # Ambient pressure (Pa)
    Pa: float       # Acoustic amplitude (Pa)
    freq: float     # Frequency (Hz)
    gamma: float    # Polytropic exponent
    sigma: float    # Surface tension (N/m)
    mu: float       # Viscosity (Pa*s)
    rho: float      # Density (kg/m³)
    T0: float       # Initial temperature (K)
    dt: float       # Timestep (s)

This implementation answers the question: "given these physical parameters, what does the simulation produce?"

2. Python Scaled-Integer (reference/simulate_scaled.py)

The bridge between float physics and field arithmetic. Uses Python's arbitrary-precision integers with SCALE = 10**30:

SCALE = 10**30

def smul(a, b):
    return (a * b) // SCALE

def sdiv(a, b):
    return (a * SCALE) // b

Same physics, different arithmetic. This implementation answers: "does the scaled-integer representation produce the same results as floating-point?"

3. Rust BigInt (src/witness.rs)

The actual witness generator. Uses num-bigint for arbitrary-precision integers:

fn smul(a: &BigInt, b: &BigInt) -> BigInt {
    (a * b) / scale()
}

fn sdiv(a: &BigInt, b: &BigInt) -> BigInt {
    (a * scale()) / b
}

This implementation answers: "does the Rust witness match the Python reference, and do its values satisfy the halo2 circuit constraints?"

The Shared Loop: Deduplication as Validation

A key architectural decision: the Rust witness uses a single simulate_inner() function for both the collapse preset and the acoustic driving preset:

fn simulate_inner(
    r0: &BigInt, p0: &BigInt, p_initial: &BigInt,
    dt: &BigInt, dt2: &BigInt, two_sigma: &BigInt,
    four_s: &BigInt, three_halves_s: &BigInt,
    rho: &BigInt, mu: &BigInt,
    r_start: BigInt, n_steps: usize,
    acoustic_dp: impl Fn(usize) -> BigInt,
) -> Vec<BigInt> {
    // Single Verlet integration loop
    // acoustic_dp returns zero for collapse, Pa*sin(wt) for acoustic
}

Before our code review, collapse and acoustic had separate simulation loops. This is a correctness hazard: fix a bug in one loop, forget the other. Deduplication means every physics improvement applies everywhere.

Cross-Validation: The Test Suite

The Python test suite (reference/test_simulate.py) runs both Python implementations and compares:

def test_collapse_float_vs_scaled():
    """Float and scaled-integer must agree within tolerance."""
    params_float = CollapsePreset.float_params()
    params_scaled = CollapsePreset.scaled_params()

    trace_float = simulate_float(params_float, n_steps=100)
    trace_scaled = simulate_scaled(params_scaled, n_steps=100)

    for i in range(len(trace_float)):
        r_float = trace_float[i]
        r_scaled = trace_scaled[i] / SCALE
        assert abs(r_float - r_scaled) / r_float < 1e-10, \
            f"Step {i}: float={r_float}, scaled={r_scaled}"

The tolerance (10^-10 relative error) accounts for the precision difference between float64 (53-bit mantissa) and exact BigInt arithmetic. Over 100 steps, errors accumulate but stay well within this bound.

The Rust-to-Python comparison uses exported JSON traces:

def test_rust_vs_python():
    """Rust BigInt witness must match Python scaled-integer exactly."""
    rust_trace = json.load(open("output/rust_trace.json"))
    python_trace = simulate_scaled(params, n_steps=len(rust_trace)-1)

    for i in range(len(rust_trace)):
        assert rust_trace[i] == python_trace[i], \
            f"Step {i}: rust={rust_trace[i]}, python={python_trace[i]}"

Note: this comparison is exact (not approximate). Rust BigInt and Python int both do arbitrary-precision integer arithmetic. The same operations on the same inputs must produce identical results.

The Field Replay: Closing the Gap

Even with exact Rust-Python agreement, there's one more gap: the circuit uses field arithmetic (modular inverse), not integer arithmetic (truncating division). These differ whenever a product isn't exactly divisible by SCALE.

The compute_public_inputs<Fr>() function closes this gap by replaying the entire simulation using field operations:

pub fn compute_public_inputs<F: PrimeField>(witness: &SimulationWitness) -> Vec<F> {
    let s = F::from_u128(SCALE);
    let s_inv = s.invert().unwrap();
    // Load initial conditions as field elements
    let mut r_curr = F::from_u128(witness.r0);
    // Replay each step using field mul/div (not integer)
    // Extract final temperature and emission count
    // Return [r0, final_temp, total_emission]
}

This ensures the public inputs — the values the verifier checks — are computed the same way the circuit computes them. Integer arithmetic provides the witness; field arithmetic provides the verification target.

Validation Pyramid

The full validation chain looks like this:

Python float (ground truth physics)
    ↕ agree within 10^-10 relative error
Python scaled-integer (exact BigInt)
    ↕ agree exactly (same arithmetic)
Rust BigInt witness
    ↕ satisfies constraints
halo2 circuit (field arithmetic)
    ↕ matches
compute_public_inputs (field replay)
    ↕ verified by
PLONK verifier (pairing check)

Each layer validates the one below it. If any link breaks, we know exactly where the disagreement is:

  • Python float ≠ Python scaled: scaling error or precision loss
  • Python scaled ≠ Rust BigInt: implementation bug in witness generator
  • Rust witness ≠ circuit constraints: witness computation doesn't match gates
  • Public inputs ≠ circuit output: field vs. integer arithmetic mismatch
  • Verifier rejects: proof generation bug or transcript error

Lessons Learned

1. Three implementations is not redundant. Each catches different classes of bugs. The float version catches physics errors. The scaled-integer version catches precision errors. The Rust version catches implementation errors.

2. Exact comparison where possible, bounded comparison where necessary. Float-to-scaled comparisons need a tolerance. Scaled-to-BigInt comparisons should be exact. Mixing these up wastes debugging time.

3. Deduplication is a correctness strategy. The simulate_inner() refactor wasn't about code cleanliness — it was about ensuring both presets use the same physics. Every shared line of code is one fewer place for divergence to hide.

4. The field replay is non-optional. You cannot skip compute_public_inputs. Integer arithmetic and field arithmetic give different results. If your public inputs come from integer computation, your proofs will intermittently fail for no apparent reason.


This is Part 7 of our 8-part series. Part 6: Testing ZK Privacy covers the privacy testing framework. Final article: Part 8: Sonoluminescence — The Physics of Light from Nothing — the full science behind the simulation.

New here? Subscribe free for the full series.

Subscribe to Jacobian

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe