Python to Rust to Proof: Cross-Validating a ZK System
Three implementations of the same physics. Python float. Python scaled-integer. Rust BigInt. They must agree bit-for-bit.
Three implementations of the same physics. Python floating-point. Python scaled-integer. Rust BigInt. They must agree bit-for-bit, because the ZK proof is only as correct as the reference implementation it was validated against.
Here's how we cross-validate a ZK system across languages and number representations.
The Three Implementations
Our zk-physics project maintains three implementations of the Rayleigh-Plesset simulation:
1. Python Float (reference/simulate.py)
The ground truth for physics. Uses NumPy float64 — the same representation a physicist would use. Human-readable, easy to modify, trusted.
@dataclass
class SimulationParams:
R0: float # Equilibrium radius (m)
P0: float # Ambient pressure (Pa)
Pa: float # Acoustic amplitude (Pa)
freq: float # Frequency (Hz)
gamma: float # Polytropic exponent
sigma: float # Surface tension (N/m)
mu: float # Viscosity (Pa*s)
rho: float # Density (kg/m³)
T0: float # Initial temperature (K)
dt: float # Timestep (s)
This implementation answers the question: "given these physical parameters, what does the simulation produce?"
2. Python Scaled-Integer (reference/simulate_scaled.py)
The bridge between float physics and field arithmetic. Uses Python's arbitrary-precision integers with SCALE = 10**30:
SCALE = 10**30
def smul(a, b):
return (a * b) // SCALE
def sdiv(a, b):
return (a * SCALE) // b
Same physics, different arithmetic. This implementation answers: "does the scaled-integer representation produce the same results as floating-point?"
3. Rust BigInt (src/witness.rs)
The actual witness generator. Uses num-bigint for arbitrary-precision integers:
fn smul(a: &BigInt, b: &BigInt) -> BigInt {
(a * b) / scale()
}
fn sdiv(a: &BigInt, b: &BigInt) -> BigInt {
(a * scale()) / b
}
This implementation answers: "does the Rust witness match the Python reference, and do its values satisfy the halo2 circuit constraints?"
The Shared Loop: Deduplication as Validation
A key architectural decision: the Rust witness uses a single simulate_inner() function for both the collapse preset and the acoustic driving preset:
fn simulate_inner(
r0: &BigInt, p0: &BigInt, p_initial: &BigInt,
dt: &BigInt, dt2: &BigInt, two_sigma: &BigInt,
four_s: &BigInt, three_halves_s: &BigInt,
rho: &BigInt, mu: &BigInt,
r_start: BigInt, n_steps: usize,
acoustic_dp: impl Fn(usize) -> BigInt,
) -> Vec<BigInt> {
// Single Verlet integration loop
// acoustic_dp returns zero for collapse, Pa*sin(wt) for acoustic
}
Before our code review, collapse and acoustic had separate simulation loops. This is a correctness hazard: fix a bug in one loop, forget the other. Deduplication means every physics improvement applies everywhere.
Cross-Validation: The Test Suite
The Python test suite (reference/test_simulate.py) runs both Python implementations and compares:
def test_collapse_float_vs_scaled():
"""Float and scaled-integer must agree within tolerance."""
params_float = CollapsePreset.float_params()
params_scaled = CollapsePreset.scaled_params()
trace_float = simulate_float(params_float, n_steps=100)
trace_scaled = simulate_scaled(params_scaled, n_steps=100)
for i in range(len(trace_float)):
r_float = trace_float[i]
r_scaled = trace_scaled[i] / SCALE
assert abs(r_float - r_scaled) / r_float < 1e-10, \
f"Step {i}: float={r_float}, scaled={r_scaled}"
The tolerance (10^-10 relative error) accounts for the precision difference between float64 (53-bit mantissa) and exact BigInt arithmetic. Over 100 steps, errors accumulate but stay well within this bound.
The Rust-to-Python comparison uses exported JSON traces:
def test_rust_vs_python():
"""Rust BigInt witness must match Python scaled-integer exactly."""
rust_trace = json.load(open("output/rust_trace.json"))
python_trace = simulate_scaled(params, n_steps=len(rust_trace)-1)
for i in range(len(rust_trace)):
assert rust_trace[i] == python_trace[i], \
f"Step {i}: rust={rust_trace[i]}, python={python_trace[i]}"
Note: this comparison is exact (not approximate). Rust BigInt and Python int both do arbitrary-precision integer arithmetic. The same operations on the same inputs must produce identical results.
The Field Replay: Closing the Gap
Even with exact Rust-Python agreement, there's one more gap: the circuit uses field arithmetic (modular inverse), not integer arithmetic (truncating division). These differ whenever a product isn't exactly divisible by SCALE.
The compute_public_inputs<Fr>() function closes this gap by replaying the entire simulation using field operations:
pub fn compute_public_inputs<F: PrimeField>(witness: &SimulationWitness) -> Vec<F> {
let s = F::from_u128(SCALE);
let s_inv = s.invert().unwrap();
// Load initial conditions as field elements
let mut r_curr = F::from_u128(witness.r0);
// Replay each step using field mul/div (not integer)
// Extract final temperature and emission count
// Return [r0, final_temp, total_emission]
}
This ensures the public inputs — the values the verifier checks — are computed the same way the circuit computes them. Integer arithmetic provides the witness; field arithmetic provides the verification target.
Validation Pyramid
The full validation chain looks like this:
Python float (ground truth physics)
↕ agree within 10^-10 relative error
Python scaled-integer (exact BigInt)
↕ agree exactly (same arithmetic)
Rust BigInt witness
↕ satisfies constraints
halo2 circuit (field arithmetic)
↕ matches
compute_public_inputs (field replay)
↕ verified by
PLONK verifier (pairing check)
Each layer validates the one below it. If any link breaks, we know exactly where the disagreement is:
- Python float ≠ Python scaled: scaling error or precision loss
- Python scaled ≠ Rust BigInt: implementation bug in witness generator
- Rust witness ≠ circuit constraints: witness computation doesn't match gates
- Public inputs ≠ circuit output: field vs. integer arithmetic mismatch
- Verifier rejects: proof generation bug or transcript error
Lessons Learned
1. Three implementations is not redundant. Each catches different classes of bugs. The float version catches physics errors. The scaled-integer version catches precision errors. The Rust version catches implementation errors.
2. Exact comparison where possible, bounded comparison where necessary. Float-to-scaled comparisons need a tolerance. Scaled-to-BigInt comparisons should be exact. Mixing these up wastes debugging time.
3. Deduplication is a correctness strategy. The simulate_inner() refactor wasn't about code cleanliness — it was about ensuring both presets use the same physics. Every shared line of code is one fewer place for divergence to hide.
4. The field replay is non-optional. You cannot skip compute_public_inputs. Integer arithmetic and field arithmetic give different results. If your public inputs come from integer computation, your proofs will intermittently fail for no apparent reason.
This is Part 7 of our 8-part series. Part 6: Testing ZK Privacy covers the privacy testing framework. Final article: Part 8: Sonoluminescence — The Physics of Light from Nothing — the full science behind the simulation.
New here? Subscribe free for the full series.