Invoice Compliance Arena
High-fidelity AI agent evaluation for complex back-office workflows.
54 Scenarios
3 Difficulties
4 Policy Gates
0.99 Score Cap
Core Task Curriculum
exact_match_audit
fuzzy_reconciliation
discrepancy_fraud
Global Reward Function
0.6·correct + 0.2·compliance + 0.2·heuristic - 0.1·trap
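The weighted sum above can be sketched in a few lines. This is only an illustration: the `Signals` container, the `global_reward` name, the assumption that each component lies in [0, 1], and the choice to apply the 0.99 score cap with `min()` are all hypothetical and not confirmed by the page.

```python
from dataclasses import dataclass

# "Score Cap" value from the page; applying it via min() is an assumption.
SCORE_CAP = 0.99


@dataclass
class Signals:
    """Hypothetical container for the four reward components (assumed in [0, 1])."""
    correct: float
    compliance: float
    heuristic: float
    trap: float


def global_reward(s: Signals) -> float:
    """Weighted sum from the page, clipped at the score cap (clipping is assumed)."""
    raw = 0.6 * s.correct + 0.2 * s.compliance + 0.2 * s.heuristic - 0.1 * s.trap
    return min(SCORE_CAP, raw)


if __name__ == "__main__":
    # A fully correct, compliant episode with no traps sums to 1.0 before the cap.
    print(global_reward(Signals(correct=1.0, compliance=1.0, heuristic=1.0, trap=0.0)))
```

Under these assumptions, a perfect episode scores 0.99 rather than 1.0, which is consistent with the stated cap.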
Sample Audit Log
STEP 1 select_po po_id=PO-5001 +0.20
CHECK policy_engine rule=SOC2_CHECK triggered=TRUE
STEP 2 final_decision decision=REJECT +0.79
Cumulative Reward: 0.99
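As a quick sanity check, the per-step rewards in the sample log can be summed and compared with the cumulative figure. The sketch below assumes the log format is exactly as shown (a `STEP` line ending in a signed decimal reward); the regular expression and the `step_rewards` helper are illustrative, not part of the project.

```python
import re

# The three log lines from the sample above, copied verbatim.
LOG = """\
STEP 1 select_po po_id=PO-5001 +0.20
CHECK policy_engine rule=SOC2_CHECK triggered=TRUE
STEP 2 final_decision decision=REJECT +0.79
"""

# Assumed layout: STEP <n> <action> <key=value args> <signed reward>
STEP_RE = re.compile(r"^STEP\s+(\d+)\s+(\S+)\s+(.*?)\s+([+-]\d+\.\d+)$")


def step_rewards(log: str) -> list[float]:
    """Return the reward attached to each STEP line; other lines (e.g. CHECK) are skipped."""
    rewards = []
    for line in log.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            rewards.append(float(match.group(4)))
    return rewards


# Round to two decimals to avoid floating-point noise in the sum.
total = round(sum(step_rewards(LOG)), 2)
print(total)  # 0.99
```

The two step rewards, 0.20 and 0.79, add up to the reported cumulative reward of 0.99.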
API Endpoints
POST /reset: Start episode
POST /step: Submit action
GET /tasks: List curriculum
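A minimal client sketch for the three endpoints is shown below. Only the methods and paths come from the page; the JSON request and response shapes, the `task` field, the action payload keys, the base URL, and the `ArenaClient` and `http_transport` names are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Optional
from urllib import request

# A transport takes (method, path, body) and returns the decoded response.
Transport = Callable[[str, str, Optional[dict]], Any]


def http_transport(base_url: str) -> Transport:
    """Build a transport that sends JSON over HTTP (JSON bodies are an assumption)."""
    def send(method: str, path: str, body: Optional[dict]) -> Any:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


class ArenaClient:
    """Thin wrapper over the three documented endpoints."""

    def __init__(self, send: Transport) -> None:
        self.send = send

    def reset(self, task: Optional[str] = None) -> Any:
        # POST /reset starts an episode; the optional "task" field is hypothetical.
        return self.send("POST", "/reset", {"task": task} if task else {})

    def step(self, action: dict) -> Any:
        # POST /step submits an action; the payload keys are hypothetical.
        return self.send("POST", "/step", action)

    def tasks(self) -> Any:
        # GET /tasks lists the curriculum.
        return self.send("GET", "/tasks", None)
```

Usage against a locally running server might look like `ArenaClient(http_transport("http://localhost:8000")).reset()`, where the host and port are placeholders. Injecting the transport keeps the client testable without a network connection.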