Invoice Compliance Arena

High-fidelity AI agent evaluation for complex back-office workflows.

Scenarios: 54
Difficulties: 3
Policy Gates: 4
Score Cap: 0.99

Core Task Curriculum
exact_match_audit      20 scenarios   EASY
fuzzy_reconciliation   15 scenarios   MEDIUM
discrepancy_fraud      19 scenarios   HARD

Global Reward Function
0.6·correct + 0.2·compliance + 0.2·heuristic - 0.1·trap
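
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the arena's actual scoring code: it assumes each component is a value in [0, 1], and it assumes the 0.99 Score Cap listed above is applied as an upper clamp on the result. The function name `global_reward` and its parameters are hypothetical.

```python
def global_reward(correct: float, compliance: float, heuristic: float,
                  trap: float, cap: float = 0.99) -> float:
    """Weighted reward from the published formula, clamped at `cap`.

    Assumptions (not confirmed by the source): every input lies in [0, 1],
    and the score cap is applied with a simple min().
    """
    raw = 0.6 * correct + 0.2 * compliance + 0.2 * heuristic - 0.1 * trap
    return min(raw, cap)


# A fully correct, compliant episode with no trap triggered hits the cap.
perfect = global_reward(correct=1.0, compliance=1.0, heuristic=1.0, trap=0.0)

# The same episode with the trap triggered loses the 0.1 penalty.
trapped = global_reward(correct=1.0, compliance=1.0, heuristic=1.0, trap=1.0)
```

Under these assumptions the weights on the positive terms sum to 1.0, so the cap only matters for a perfect episode; any triggered trap pulls the score below it.
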
Sample Audit Log
STEP 1 select_po po_id=PO-5001 +0.20
CHECK policy_engine rule=SOC2_CHECK triggered=TRUE
STEP 2 final_decision decision=REJECT +0.79
Cumulative Reward: 0.99
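
To make the arithmetic in the sample log explicit, the sketch below sums the per-step rewards (+0.20 and +0.79) and compares the total with the reported cumulative reward. It assumes the log format is exactly as shown in the sample, with the signed reward as the last token of each STEP line; the helper name `sum_step_rewards` is hypothetical.

```python
import re

SAMPLE_LOG = """\
STEP 1 select_po po_id=PO-5001 +0.20
CHECK policy_engine rule=SOC2_CHECK triggered=TRUE
STEP 2 final_decision decision=REJECT +0.79
Cumulative Reward: 0.99
"""

# Assumed format: "STEP <n> <action> <args...> <signed reward>".
STEP_LINE = re.compile(r"^STEP\s+\d+\s+\S.*\s([+-]\d+(?:\.\d+)?)$")


def sum_step_rewards(log_text: str) -> float:
    """Add up the signed reward at the end of every STEP line."""
    total = 0.0
    for line in log_text.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            total += float(match.group(1))
    # Round to two decimals to avoid floating-point noise in the comparison.
    return round(total, 2)


total = sum_step_rewards(SAMPLE_LOG)
```

CHECK lines carry no reward in the sample, so the pattern skips them and only the two STEP lines contribute to the total.
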
API Endpoints
POST /reset    Start episode
POST /step     Submit action
GET  /tasks    List curriculum
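
A client for these three endpoints might be sketched as follows, using only the Python standard library. The source lists the routes and methods but not the request or response schema, so everything else here is an assumption: the base URL `http://localhost:8000`, JSON request bodies, and the payload fields `action` and `po_id` (borrowed from the sample audit log) are all hypothetical. The sketch only builds the requests; it does not send them.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical; the source gives no host


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request for one arena endpoint."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: the API accepts JSON bodies.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Start an episode, submit one action, and list the curriculum.
reset_req = build_request("POST", "/reset", {})
step_req = build_request("POST", "/step",
                         {"action": "select_po", "po_id": "PO-5001"})  # assumed fields
tasks_req = build_request("GET", "/tasks")
```

Against a running server, each request could be sent with `urllib.request.urlopen(req)`; the response format would need to be checked against the actual service.
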