
AVGS Voucher Parsing

Overview

AVGS (Aktivierungs- und Vermittlungsgutschein) vouchers are PDF documents issued by German employment agencies. They authorize funding for coaching and placement services but arrive in inconsistent formats: different layouts, a mix of handwritten and typed text, and varying levels of completeness.

The system needs to extract: client name and ID, voucher validity period, authorized service categories, funding amount or service hour limits, and special conditions or restrictions.
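The target fields can be pictured as a single record type. The sketch below is illustrative only: the class name, field names, and the choice to treat funding amount and hour limit as alternatives are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class VoucherRecord:
    """Hypothetical container for the fields extracted from one voucher."""
    client_name: Optional[str] = None
    client_id: Optional[str] = None
    valid_from: Optional[date] = None
    valid_until: Optional[date] = None
    service_categories: list[str] = field(default_factory=list)
    funding_amount_eur: Optional[float] = None
    service_hour_limit: Optional[float] = None
    special_conditions: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that were not extracted."""
        missing = [
            name
            for name in ("client_name", "client_id", "valid_from", "valid_until")
            if getattr(self, name) is None
        ]
        if not self.service_categories:
            missing.append("service_categories")
        # Assumption: a voucher states either an amount or an hour limit.
        if self.funding_amount_eur is None and self.service_hour_limit is None:
            missing.append("funding_amount_eur or service_hour_limit")
        return missing
```

Special conditions are left optional here because a voucher may legitimately have none.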

Key Design Decision

Decision: Use LLMs for extraction but require structured output with explicit confidence scoring.

Early prototypes used regex and template matching. They worked for ~60% of vouchers and failed silently on the rest. LLMs handled format variation far better but had no notion of “I’m not sure.”

The final approach:

  • LLM extracts fields and assigns a confidence score (0-100) to each
  • Fields below a confidence threshold (currently 85) are flagged for manual review
  • If any critical field (client ID, validity dates) scores low, the entire voucher is routed to human processing
  • Extraction results are validated against known client records where possible
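The routing rules above can be sketched roughly as follows. The threshold of 85 and the critical fields come from the description; the type names, route labels, and the decision to treat a missing critical field like a low-confidence one are assumptions made for illustration.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 85  # current threshold per the description above
# Assumed field names for "client ID" and "validity dates".
CRITICAL_FIELDS = {"client_id", "valid_from", "valid_until"}


@dataclass
class FieldResult:
    value: str
    confidence: int  # 0-100, as reported by the LLM


@dataclass
class RoutingDecision:
    route: str  # "auto", "field_review", or "human_processing"
    flagged_fields: list[str]


def route_voucher(
    fields: dict[str, FieldResult],
    threshold: int = CONFIDENCE_THRESHOLD,
) -> RoutingDecision:
    """Decide how a voucher is handled based on per-field confidence."""
    # Fields scoring strictly below the threshold need review.
    flagged = {name for name, r in fields.items() if r.confidence < threshold}
    # Assumption: a critical field that is absent counts as low confidence.
    missing_critical = CRITICAL_FIELDS - set(fields)

    if missing_critical or flagged & CRITICAL_FIELDS:
        return RoutingDecision("human_processing", sorted(flagged | missing_critical))
    if flagged:
        return RoutingDecision("field_review", sorted(flagged))
    return RoutingDecision("auto", [])
```

Validation against known client records is not shown; it would run as a separate check on the extracted values.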

Constraint and Tradeoff

Constraint: Manual review of ~35% of vouchers instead of full automation.

The confidence threshold means human staff still process a significant portion of vouchers. But this tradeoff eliminated silent failures. Before confidence scoring, incorrect extractions propagated into downstream systems and only surfaced when clients reported billing errors.

With explicit uncertainty flagging, the error rate dropped to near zero, and staff trust in the automated extractions increased because flagged cases actually needed review.

What This Connects To

Parsed voucher data feeds into the report generation system. If voucher details are incomplete or uncertain, reports flag this explicitly rather than proceeding with assumptions.
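One way a report could surface this is to mark each affected value inline instead of printing it as if it were confirmed. This is a minimal sketch; the function name and marker wording are hypothetical and not taken from the report generation system.

```python
from typing import Optional


def report_line(label: str, value: Optional[str], needs_review: bool) -> str:
    """Render one report line, marking missing or unverified voucher data."""
    if value is None:
        return f"{label}: [MISSING: not extracted from voucher]"
    if needs_review:
        return f"{label}: {value} [UNVERIFIED: pending manual review]"
    return f"{label}: {value}"
```

For example, `report_line("Valid until", None, False)` yields a line that names the gap explicitly rather than omitting the field.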
