The problem
Non-life insurers need to project claim experience forward in time to set technical prices, estimate reserves, and compute regulatory capital (Solvency II SCR). A straightforward point estimate of expected claim frequency λ is enough for single-year pricing, but it breaks down for multi-year projections.
The reason: experience features — like PriorClaims3Y, a rolling three-year claim count — change each year as simulated claims accumulate. Because λ depends on those features, it must be recomputed annually. There is no closed-form solution; Monte Carlo simulation is the only option.
This project answers two questions:
- Inference: for single-year batch scoring, is ONNX Runtime faster than native LightGBM?
- Simulation: how does the multi-year recursive simulation scale with portfolio size and simulation count?
Architecture
The model
A gradient-boosted LightGBM Poisson model trained on freMTPL2freq — 678,013 French Motor Third Party Liability policies.
The model predicts the expected annual claim frequency λ per policy.
log(Exposure) is used as an offset so partial-year policies are handled correctly.
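A minimal training sketch of that offset, with synthetic stand-in data (in the real pipeline X, y, and exposure come from the prepared freMTPL2freq frame):

```python
import lightgbm as lgb
import numpy as np

# Synthetic stand-ins so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 9)).astype(np.float32)   # 9 features, as in the table below
exposure = rng.uniform(0.1, 1.0, size=n)          # partial policy-years
y = rng.poisson(0.1 * exposure)                   # synthetic claim counts

# init_score=log(exposure) is the Poisson offset: the model then fits
# log E[claims] = log(exposure) + f(X), i.e. a frequency per full policy-year.
train = lgb.Dataset(X, label=y, init_score=np.log(exposure))
model = lgb.train({"objective": "poisson", "verbosity": -1}, train, num_boost_round=200)

# predict() excludes init_score and applies LightGBM's internal exp(),
# so this is λ per full policy-year directly.
lam = model.predict(X)
```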
| Feature | Description |
|---|---|
| VehPower, VehAge | Vehicle engine power and age |
| DrivAge | Driver age in years |
| Density | Population density of driver's municipality |
| PriorClaims3Y | Rolling 3-year claim count — updated each simulation year |
| Area, VehBrand, VehGas, Region | Categorical features (label-encoded) |
BonusMalus is excluded: it cannot be projected forward without a separate bonus-malus transition model. PriorClaims3Y serves as a lightweight, simulatable experience feature in its place.
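A sketch of one Monte Carlo path showing how that feature drives the recursion; predict_lambda is a hypothetical stand-in for the model call, and the real engine also switches to full-year exposure after year 0:

```python
import numpy as np

def simulate_path(features, exposure, predict_lambda, years=5, seed=None):
    """One simulation path: score, draw claims, roll PriorClaims3Y, repeat.

    features: per-policy feature table (e.g. a pandas DataFrame or dict of arrays).
    """
    rng = np.random.default_rng(seed)
    yearly = []                                   # per-policy claim counts, one array per year
    totals = []
    for _ in range(years):
        lam = predict_lambda(features)            # λ must be re-scored every year...
        claims = rng.poisson(lam * exposure)      # ...because the features change below
        yearly.append(claims)
        features["PriorClaims3Y"] = sum(yearly[-3:])   # roll the 3-year window forward
        totals.append(int(claims.sum()))
    return totals
```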
Rust simulation engine
Each Rayon worker thread owns one ONNX session, loaded lazily on first use. Once warm, threads run with no lock contention.
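The engine itself is Rust; here is the analogous per-thread lazy-session pattern sketched in Python, with a hypothetical model path:

```python
import threading

import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"   # hypothetical path to the converted LightGBM model
_tls = threading.local()

def _session() -> ort.InferenceSession:
    # Lazily create one session per worker thread, then cache it in
    # thread-local storage: after warmup, scoring touches no shared state.
    if not hasattr(_tls, "sess"):
        _tls.sess = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])
    return _tls.sess

def score(batch: np.ndarray) -> np.ndarray:
    """Return λ for an [N, 9] float32 batch using this thread's session."""
    sess = _session()
    return sess.run(None, {sess.get_inputs()[0].name: batch})[0].ravel()
```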
The ONNX model outputs λ in the original scale — onnxmltools preserves LightGBM's internal exp() transform. Expected claims per policy are therefore μ = λ × exposure, not exp(log_λ + log_exposure).
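Continuing the sketch (batch and exposure assumed from context):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = score(batch)          # λ from the ONNX session, already on the frequency scale
mu = lam * exposure         # expected claims per policy: μ = λ × exposure
claims = rng.poisson(mu)    # one simulated year of claim counts
```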
Study 1 — Inference: LightGBM vs ONNX Runtime
Both engines receive the same [N, 9] float32 feature matrix and return λ per policy. Each measurement is the minimum of 3 repetitions to suppress OS scheduling jitter; a warmup call is made first to exclude library-init overhead.
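A sketch of that harness; model and score are assumed from the earlier sketches, and batch is an [N, 9] float32 matrix:

```python
import time

def bench(fn, batch, reps=3):
    """Minimum-of-N timing with one warmup call, mirroring the study design."""
    fn(batch)                       # warmup: excludes library-init / first-call cost
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(batch)
        best = min(best, time.perf_counter() - t0)   # min suppresses scheduler jitter
    return best

t_lgbm = bench(model.predict, batch)   # native LightGBM
t_onnx = bench(score, batch)           # ONNX Runtime via the thread-local session
```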
Observed results — AWS c6i.4xlarge (16 vCPUs)
| Policies | LightGBM | ONNX Runtime | Speedup |
|---|---|---|---|
| 169,503 (25%) | 399 ms | 482 ms | 0.83× |
Implication: portfolio sharding
If ONNX is the inference backend (as it must be for the Rust engine), splitting a large portfolio into smaller shards keeps each ONNX call in the batch-size range where it is competitive — and delivers parallelism for free. Running 4 × 25% shards concurrently is strictly better than 1 × 100% sequentially.
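A sketch of the pattern, reusing the thread-local score helper from above (in production the shards run inside the Rust engine or on separate instances):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

SHARDS = 4

def score_sharded(features: np.ndarray) -> np.ndarray:
    # Each worker thread gets its own ONNX session via score's thread-local
    # cache, so the four 25% shards run concurrently with no shared state.
    with ThreadPoolExecutor(max_workers=SHARDS) as pool:
        parts = pool.map(score, np.array_split(features, SHARDS))
        return np.concatenate(list(parts))
```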
Study 2 — Simulation: Rust + ONNX scaling
Observed results — AWS c6i.4xlarge (16 vCPUs, 169 K policies)
| Policies | Years | Sims | ms / sim | Wall time |
|---|---|---|---|---|
| 169,503 (25%) | 1 | 2,000 | 2,980 | ~99 min |
| 169,503 (25%) | 5 | 2,000 | 9,879 | ~5.5 hours |
ONNX inference on EC2 is approximately 12× faster than on a 2019 Intel Mac (~8.6 µs/policy vs ~100 µs/policy), consistent with the AVX-512 advantage on AWS Ice Lake instances.
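(The EC2 figure is consistent with the full-portfolio timing used below: 5.8 s / 678,013 policies ≈ 8.6 µs per policy.)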
Simulated claim frequency over time
Output from the macOS calibration run (10 K policies, 500 sims, 5 years). Year 0 uses partial-year exposure from the portfolio; years 1–4 use full-year exposure.
The session-loading constraint
ONNX Runtime holds a global lock during session initialisation, so even with Rayon's work-stealing scheduler, sessions load sequentially.
This means adding more threads eventually hurts: startup cost grows faster than compute shrinks. The optimal thread count balances the two:
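A simple model of the trade-off, assuming session loads are strictly sequential while inference parallelises perfectly: with k threads and total inference work W = S × T_inference across S simulation-years,

T(k) ≈ k × T_session + W / k,

which is minimised at k* = √(W / T_session).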
On EC2, with T_inference(678K) ≈ 5.8 s per simulation-year, T_session ≈ 1–2 s, and runs of around 2,000 simulations, k* lands in the 60–100 thread range — well above the 16 vCPUs on a c6i.4xlarge, meaning linear scaling holds across the full instance for production workloads.
How many simulations are enough?
| Use case | Target statistic | Recommended sims |
|---|---|---|
| Pricing / expected loss | Mean frequency | 500–1,000 |
| Reserving, confidence intervals | P95 | 1,000–2,000 |
| Capital / risk margin | P99 | 2,000–5,000 |
| Solvency II / regulatory capital | P99.5 (SCR) | 5,000–10,000 |
Default recommendation: 2,500 sims. This gives reliable P99 estimates (≈ 0.2% SE in probability space) at a reasonable compute cost for most pricing and reserving tasks.
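(The arithmetic, treating the P99 exceedance as a binomial proportion: SE = √(p(1−p)/n) = √(0.99 × 0.01 / 2500) ≈ 0.2%.)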
Cost estimates (AWS on-demand)
Extrapolating from the 25%-portfolio result (linear in N), a full 678K-policy, 5-year, 2,000-sim run on one c6i.4xlarge takes roughly 22 hours. A sharding strategy across four instances reduces wall time to ~5.5 hours per shard.
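A back-of-envelope check, with the hourly rate as an explicit assumption (verify against current AWS pricing):

```python
HOURLY_USD = 0.68        # c6i.4xlarge on-demand rate, USD/hr — assumed, verify for your region

single_instance = 22 * HOURLY_USD        # one instance, ~22 h wall time
sharded = 4 * 5.5 * HOURLY_USD           # four instances × ~5.5 h: same machine-hours

print(f"single instance: ${single_instance:.0f}")   # ≈ $15
print(f"4-way sharded:   ${sharded:.0f}")           # ≈ $15, but 4× less wall time
```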
Negligible at company scale — the bottleneck is actuary time, not compute cost.
Explore the project
| Resource | Description |
|---|---|
| GitHub repository | Full source: Python pipeline, Rust engine, Terraform |
| BENCHMARK.md | Study design, full results tables, capacity planning formulas |
| README.md | Setup guide, pipeline steps, Rust CLI reference |
| freMTPL2freq dataset | 678,013 French MTPL policies on OpenML |