Jobcenters maintain relationships with hundreds of training providers. Each provider offers courses with varying schedules, locations, prerequisites, and content focus. Client profiles contain career goals, prior experience, language skills, mobility constraints, and learning preferences.
The challenge: keyword matching produces too many irrelevant results. Manual search takes too long and relies on case worker familiarity with the catalog.
Decision: Combine structured filtering with semantic matching.
The system uses a two-stage process:
First, deterministic rules eliminate courses that violate hard constraints: excessive travel time from the client's address, schedule conflicts with existing commitments, language requirements the client does not meet, and prerequisite credentials the client lacks.
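A minimal sketch of this filter stage, assuming travel times are precomputed per client by a routing service; all field and type names here are illustrative, not the actual schema:

```python
from dataclasses import dataclass

@dataclass
class Course:
    course_id: str
    language: str
    prerequisites: frozenset[str]   # required credentials
    weekly_slots: frozenset[str]    # e.g. {"mon_am", "wed_pm"}

@dataclass
class Client:
    languages: frozenset[str]
    credentials: frozenset[str]
    max_travel_minutes: int
    blocked_slots: frozenset[str]   # existing commitments

def passes_hard_filters(course: Course, client: Client,
                        travel_minutes: dict[str, int]) -> bool:
    """Stage one: reject a course if it violates any hard constraint."""
    if travel_minutes[course.course_id] > client.max_travel_minutes:
        return False                                    # too far to travel
    if course.weekly_slots & client.blocked_slots:
        return False                                    # schedule conflict
    if course.language not in client.languages:
        return False                                    # language requirement unmet
    if not course.prerequisites <= client.credentials:
        return False                                    # missing prerequisite
    return True

def filter_courses(courses: list[Course], client: Client,
                   travel_minutes: dict[str, int]) -> list[Course]:
    return [c for c in courses if passes_hard_filters(c, client, travel_minutes)]
```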
Second, for the remaining courses (typically 20-50), an LLM analyzes the fit between client goals and course content. It considers career trajectory alignment, skill gap coverage, learning format preferences, and client-stated interests.
Output is a ranked list with brief reasoning for each match. The LLM does not make final selections—it provides ranked options for case worker review.
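To make the second stage concrete, here is a hedged sketch of the ranking call. `llm_complete`, the prompt wording, and the response schema are placeholders for whatever the production stack actually uses:

```python
import json

def rank_courses(client_profile: str, courses: list[dict], llm_complete) -> list[dict]:
    """Stage two: ask an LLM to rank pre-filtered courses against the client profile.

    Returns ranked options with brief reasoning; selection stays with the case worker.
    """
    course_text = "\n".join(
        f"- {c['course_id']}: {c['title']} | {c['content_summary']}" for c in courses
    )
    prompt = (
        "Rank these courses by fit with the client's career goals, skill gaps, "
        "learning format preferences, and stated interests. Return JSON: "
        '[{"course_id": ..., "rank": ..., "reasoning": "one or two sentences"}]\n\n'
        f"Client profile:\n{client_profile}\n\nCourses:\n{course_text}"
    )
    ranked = json.loads(llm_complete(prompt))
    # Keep only IDs from the candidate set: guards against hallucinated
    # course IDs before anything reaches a case worker.
    valid_ids = {c["course_id"] for c in courses}
    return sorted(
        (m for m in ranked if m["course_id"] in valid_ids),
        key=lambda m: m["rank"],
    )
```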
Constraint: Course catalog data comes from scraped provider websites and third-party APIs. It’s often incomplete or stale.
The system handles this by explicitly flagging data quality issues in match results: “last updated 6 months ago” or “schedule information unavailable.” Rather than pretending the data is complete, the system surfaces uncertainty.
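A sketch of how such flags might be derived from scrape and API metadata; the 90-day staleness threshold and field names are assumptions, not production values:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)   # illustrative threshold

def quality_flags(record: dict, today: date) -> list[str]:
    """Attach human-readable provenance warnings to a match result."""
    flags = []
    last_updated = record.get("last_updated")   # date from scrape/API metadata
    if last_updated is None:
        flags.append("update date unknown")
    elif today - last_updated > STALE_AFTER:
        months = (today - last_updated).days // 30
        flags.append(f"last updated {months} months ago")
    if not record.get("schedule"):
        flags.append("schedule information unavailable")
    return flags
```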
Tradeoff: Match quality depends on external data we don’t control. But surfacing data provenance allows case workers to verify details with providers before recommending courses to clients—slower, but more reliable than assuming scraped data is current.
Recommended course matches are referenced in the report generation system when justifying funding requests. The system links to match reasoning, allowing auditors to trace why a specific course was recommended.
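One way this traceability could be stored, sketched with illustrative field names: each recommendation persists its reasoning and quality flags under a stable ID that funding reports can cite.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import uuid

@dataclass(frozen=True)
class MatchRecord:
    """Persisted per recommendation so a funding report can reference it."""
    match_id: str
    client_id: str
    course_id: str
    rank: int
    reasoning: str                  # LLM's brief justification, stored verbatim
    quality_flags: tuple[str, ...]  # provenance warnings shown to the case worker
    created_at: str

def persist_match(client_id: str, course_id: str, rank: int,
                  reasoning: str, flags: list[str]) -> MatchRecord:
    record = MatchRecord(
        match_id=str(uuid.uuid4()),
        client_id=client_id,
        course_id=course_id,
        rank=rank,
        reasoning=reasoning,
        quality_flags=tuple(flags),
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    # A funding report cites record.match_id, so an auditor can follow the
    # link back to the stored reasoning and data-quality flags.
    return record
```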