Recursive Monte Carlo Claim Simulation
for Non-Life Pricing — An Optimized Approach

A non-life actuarial frequency model trained in Python, exported to ONNX, and driven by a high-speed Monte Carlo engine written in Rust.

David Fischer · March 2026

Python · LightGBM · Poisson GLM · ONNX Runtime · Rust · Rayon · 678 K policies · 5-year projection · AWS EC2 · French MTPL dataset

The problem

Non-life insurers need to project claim experience forward in time to set technical prices, estimate reserves, and compute regulatory capital (Solvency II SCR). A straightforward point estimate of expected claim frequency λ is enough for single-year pricing, but it breaks down for multi-year projections.

The reason: experience features — like PriorClaims3Y, a rolling three-year claim count — change each year as simulated claims accumulate. Because λ depends on those features, it must be recomputed annually. There is no closed-form solution; Monte Carlo simulation is the only option.

The feedback loop. Year-0 draws update PriorClaims3Y → year-1 lambdas differ per simulation → year-1 draws update PriorClaims3Y again → and so on for T years. Each simulation follows a distinct path through feature space.

This project answers two questions:

  1. Inference: for single-year batch scoring, is ONNX Runtime faster than native LightGBM?
  2. Simulation: how fast can the multi-year recursive simulation scale with portfolio size and sim count?

Architecture

  1. Python: Train LightGBM
  2. Python: Export ONNX
  3. Rust · ort: Load & Infer
  4. Rust · Rayon: Parallel Sims

The model

A LightGBM model with a Poisson objective (the gradient-boosted analogue of a Poisson GLM), trained on freMTPL2freq — 678,013 French Motor Third Party Liability policies. The model predicts the expected annual claim frequency λ per policy. log(Exposure) is used as an offset so partial-year policies are handled correctly.

Feature | Description
VehPower, VehAge | Vehicle engine power and age
DrivAge | Driver age in years
Density | Population density of driver's municipality
PriorClaims3Y | Rolling 3-year claim count — updated each simulation year
Area, VehBrand, VehGas, Region | Categorical features (label-encoded)

BonusMalus is excluded: it cannot be projected forward without a separate bonus-malus transition model. PriorClaims3Y serves as a lightweight, simulatable experience feature in its place.
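
For reference, a minimal Python sketch of this training setup. The column names follow the freMTPL2freq schema, but the file path, hyperparameters, the pre-computed PriorClaims3Y column, and the export step are illustrative assumptions rather than the project's exact code:

import lightgbm as lgb
import numpy as np
import onnxmltools
import pandas as pd
from onnxmltools.convert.common.data_types import FloatTensorType

df = pd.read_csv("freMTPL2freq.csv")   # hypothetical local copy of the OpenML table
# Assumed preprocessing (not shown): label-encode Area/VehBrand/VehGas/Region
# and add the engineered PriorClaims3Y experience feature.

features = ["VehPower", "VehAge", "DrivAge", "Density", "PriorClaims3Y",
            "Area", "VehBrand", "VehGas", "Region"]

# log(Exposure) enters as init_score, i.e. a Poisson offset: the booster then
# predicts the annual claim frequency lambda for a full policy-year.
train_set = lgb.Dataset(df[features], label=df["ClaimNb"],
                        init_score=np.log(df["Exposure"]))
params = {"objective": "poisson", "learning_rate": 0.05, "num_leaves": 63}
booster = lgb.train(params, train_set, num_boost_round=500)

# Export; per the engine notes below, onnxmltools keeps the exp() link, so the
# ONNX graph outputs lambda on the original scale.
onnx_model = onnxmltools.convert_lightgbm(
    booster, initial_types=[("input", FloatTensorType([None, len(features)]))])
onnxmltools.utils.save_model(onnx_model, "freq_model.onnx")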

Rust simulation engine

Each Rayon worker thread owns one ONNX session loaded lazily on first use. Once warm, threads run with no lock contention:

for sim in 0..n_sims (parallel across threads):
    for year in 0..n_years:
        // 1. Build feature matrix with current state
        build [N, 9] feature matrix          // with current PriorClaims3Y
        // 2. Score all policies in one ONNX call
        λ = onnx_session.run(matrix)         // annual frequency per policy
        // 3. Draw claims
        claims = Poisson(λ × exposure)       // exposure = portfolio value at t=0, 1.0 thereafter
        // 4. Update state for next year
        shift rolling 3-year window          // [w1, w2, w3] → [w2, w3, claims]
        VehAge += 1, DrivAge += 1

The ONNX model outputs λ on the original scale — onnxmltools preserves LightGBM's internal exp() transform. Expected claims per policy are therefore simply μ = λ × exposure; the engine does not need to add log(exposure) to a raw log-score and exponentiate.
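
A single-process Python mirror of this loop is useful as a validation harness against the Rust engine. The column positions, RNG seed, and shapes below are assumptions; the structure follows the pseudocode above:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("freq_model.onnx")
input_name = sess.get_inputs()[0].name       # query the graph input rather than hard-code it
rng = np.random.default_rng(0)

# Assumed column positions in the [N, 9] feature matrix.
VEH_AGE, DRIV_AGE, PRIOR_CLAIMS = 1, 2, 4

def run_one_sim(features: np.ndarray, exposure: np.ndarray, n_years: int,
                window: np.ndarray) -> list[int]:
    """features: [N, 9] float32; window: [N, 3] rolling per-year claim counts."""
    totals = []
    for _ in range(n_years):
        features[:, PRIOR_CLAIMS] = window.sum(axis=1)
        lam = sess.run(None, {input_name: features})[0].ravel()   # annual frequency, original scale
        claims = rng.poisson(lam * exposure)                      # mu = lambda * exposure
        totals.append(int(claims.sum()))
        window = np.column_stack([window[:, 1:], claims])         # [w1, w2, w3] -> [w2, w3, claims]
        features[:, VEH_AGE] += 1
        features[:, DRIV_AGE] += 1
        exposure = np.ones_like(exposure)                         # full-year exposure after year 0
    return totals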

Study 1 — Inference: LightGBM vs ONNX Runtime

Both engines receive the same [N, 9] float32 feature matrix and return λ per policy. Each measurement is the minimum of 3 repetitions to suppress OS scheduling jitter; a warmup call is made first to exclude library-init overhead.
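
A sketch of that measurement protocol. The model and feature-matrix file names are placeholders; the ONNX input name is queried from the session rather than assumed:

import time
import numpy as np
import lightgbm as lgb
import onnxruntime as ort

X = np.load("portfolio_features.npy").astype(np.float32)   # hypothetical [N, 9] matrix

booster = lgb.Booster(model_file="freq_model.txt")          # hypothetical saved booster
sess = ort.InferenceSession("freq_model.onnx")
input_name = sess.get_inputs()[0].name

def bench(fn, reps=3):
    fn()                                    # warmup call: excludes library-init overhead
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)                       # minimum of 3 reps suppresses scheduling jitter

t_lgb = bench(lambda: booster.predict(X))
t_ort = bench(lambda: sess.run(None, {input_name: X}))
print(f"LightGBM {t_lgb * 1e3:.0f} ms | ONNX {t_ort * 1e3:.0f} ms | speedup {t_lgb / t_ort:.2f}x")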

Observed results — AWS c6i.4xlarge (16 cores)

Policies | LightGBM | ONNX Runtime | Speedup
169,503 (25%) | 399 ms | 482 ms | 0.83×

Surprise: ONNX is slower at scale. The conventional expectation is a 1.5–3× ONNX speedup from its compiled tree-ensemble execution plan. On a large batch (~170K policies), LightGBM's native inference appears better optimised — likely due to superior cache utilisation once the feature matrix exceeds L3 cache size. The gap reverses in the ~7K-policy range.

Implication: portfolio sharding

If ONNX is the inference backend (as it must be for the Rust engine), splitting a large portfolio into smaller shards keeps each ONNX call in the batch-size range where it is competitive — and delivers parallelism for free. Running 4 × 25% shards concurrently is strictly better than 1 × 100% sequentially.
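
A sketch of the sharded scoring path in Python. It assumes ONNX Runtime's run() releases the GIL and that one session can serve concurrent calls, so a shared session plus a thread pool is enough for illustration; in practice the shards can just as well go to separate processes or separate instances:

from concurrent.futures import ThreadPoolExecutor
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("freq_model.onnx")
input_name = sess.get_inputs()[0].name

def score(shard: np.ndarray) -> np.ndarray:
    # Session.run is called concurrently from several threads here.
    return sess.run(None, {input_name: shard})[0].ravel()

def score_sharded(X: np.ndarray, n_shards: int = 4) -> np.ndarray:
    shards = np.array_split(X, n_shards)                  # e.g. 4 x ~25% of the portfolio
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        return np.concatenate(list(pool.map(score, shards)))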

Study 2 — Simulation: Rust + ONNX scaling

Observed results — AWS c6i.4xlarge (16 cores, 169 K policies)

Policies | Years | Sims | ms / sim | Wall time
169,503 (25%) | 1 | 2,000 | 2,980 | ~99 min
169,503 (25%) | 5 | 2,000 | 9,879 | ~5.5 hours

ONNX inference on EC2 is approximately 12× faster than on a 2019 Intel Mac (~8.6 µs/policy vs ~100 µs/policy), consistent with the AVX-512 advantage on AWS Ice Lake instances.

Simulated claim frequency over time

Output from the macOS calibration run (10 K policies, 500 sims, 5 years). Year 0 uses partial-year exposure from the portfolio; years 1–4 use full-year exposure.

=== Multi-Year Claim Simulation (500 sims × 5 years × 10,000 policies) ===
Year   Mean claims   Mean freq   P50 freq   P95 freq   P99 freq
--------------------------------------------------------------
t=0    527.7         0.05277     0.05280    0.05650    0.05801   ← partial year
t=1    900.3         0.09003     0.08990    0.09500    0.09730
t=2    866.0         0.08660     0.08680    0.09121    0.09230
t=3    831.8         0.08318     0.08320    0.08800    0.08980
t=4    831.1         0.08311     0.08310    0.08750    0.08920

Mean reversion. Frequency peaks at t=1 (annualised, full exposure) and declines gradually as PriorClaims3Y mean-reverts toward the portfolio average. This is a natural consequence of the rolling window: high-claim policies accumulate history, raising their λ, but regression to the mean pulls aggregate frequency down over subsequent years.

The session-loading constraint

ONNX Runtime holds a global lock during session initialisation. Even with Rayon's work-stealing scheduler, sessions load sequentially:

T_startup ≈ n_threads × T_session (~25 s/session on macOS Intel, ~1–2 s on EC2)

This means adding more threads eventually hurts: startup cost grows linearly with thread count while per-thread compute shrinks only hyperbolically. Total runtime is roughly T(k) ≈ k × T_session + (n_sims × n_years × T_inference) / k for k threads; minimising over k balances the two:

k* ≈ √( n_sims × n_years × T_inference(N) / T_session )

On EC2, with T_inference(678K) ≈ 5.8 s and T_session ≈ 1–2 s, k* is in the range of 60–100 threads — well above the 16 cores on a c6i.4xlarge, meaning linear scaling holds across the full instance for production workloads.

How many simulations are enough?

Use case | Target statistic | Recommended sims
Pricing / expected loss | Mean frequency | 500–1,000
Reserving, confidence intervals | P95 | 1,000–2,000
Capital / risk margin | P99 | 2,000–5,000
Solvency II / regulatory capital | P99.5 (SCR) | 5,000–10,000

Default recommendation: 2,500 sims. This gives reliable P99 estimates (≈ 0.2% SE in probability space) at a reasonable compute cost for most pricing and reserving tasks.
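
The 0.2% figure follows from the standard binomial approximation to the standard error of an empirical quantile in probability space, SE ≈ √(p(1−p)/n); a quick check:

import math

def quantile_se(p: float, n_sims: int) -> float:
    """Standard error, in probability space, of the empirical quantile at level p."""
    return math.sqrt(p * (1.0 - p) / n_sims)

for p, n in [(0.95, 1_000), (0.99, 2_500), (0.995, 10_000)]:
    print(f"P{p * 100:g} with {n:>6,} sims: SE ≈ {quantile_se(p, n):.4f}")
# P95 with  1,000 sims: SE ≈ 0.0069
# P99 with  2,500 sims: SE ≈ 0.0020   (the ~0.2% quoted above)
# P99.5 with 10,000 sims: SE ≈ 0.0007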

Cost estimates (AWS on-demand)

Extrapolating from the 25%-portfolio result (linear in N), a full 678K-policy, 5-year, 2,000-sim run on one c6i.4xlarge takes roughly 22 hours. A sharding strategy across four instances reduces wall time to ~5.5 hours per shard.
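
The arithmetic behind those figures, reproduced from the Study 2 row (linear scaling in portfolio size is the stated assumption):

ms_per_sim = 9_879                      # 25% portfolio, 5 years, from Study 2
n_sims = 2_000
scale = 678_013 / 169_503               # 25% shard -> full portfolio, ~4.0x

shard_hours = ms_per_sim * n_sims / 1_000 / 3_600     # ≈ 5.5 h per 25% shard
full_hours = shard_hours * scale                      # ≈ 22 h for the full portfolio
scenario_days = full_hours * 5 / 24                   # 5 coverages ≈ 4.6 days sequential
print(f"{shard_hours:.1f} h/shard, {full_hours:.0f} h full portfolio, {scenario_days:.1f} days/scenario")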

Cost per scenario (5 coverages × full portfolio × 5 years × 2,000 sims)

Configuration | Cost
1 × c6i.4xlarge · sequential (~4.6 days) | ~$180
4 × c6i.4xlarge · sharded (~5.5 h/shard) | ~$90
Same, Spot instances (~70% discount) | ~$27

Negligible at company scale — the bottleneck is actuary time, not compute cost.

Explore the project

Resource | Description
GitHub repository | Full source: Python pipeline, Rust engine, Terraform
BENCHMARK.md | Study design, full results tables, capacity planning formulas
README.md | Setup guide, pipeline steps, Rust CLI reference
freMTPL2freq dataset | 678,013 French MTPL policies on OpenML