I design AI systems for environments where wrong answers have consequences.
Decision-critical systems. Regulated environments. Production-ready AI orchestration with human-in-the-loop design.

// Design Philosophy
How I Design AI Systems
> Deterministic logic before probabilistic models
Rules handle what can be encoded. LLMs handle what cannot. The tradeoff: slower iteration, higher reliability.
> Human review as a system component, not a fallback
Workflows designed for human oversight from the start. Review becomes data that improves retrieval and future outputs.
> When stakes are high, AI explains decisions instead of making them
Generate reasoning, evidence, and recommendations. Let humans make the final call. Reduces both errors and liability.
> Auditability and traceability over model cleverness
Every output must be reproducible and explainable. Sophisticated prompts mean nothing if you cannot debug them in production.
> Constraints create better systems than capabilities
Limiting what the AI can do often produces more reliable outcomes than expanding what it might do.
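The first principle above can be sketched as a simple gate: every encodable check runs as a plain rule before any model is consulted, and the rule path is tagged so the decision source stays auditable. This is a minimal illustration with hypothetical names, not the production implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Decision:
    outcome: str
    source: str   # "rule" or "llm" -- recorded for auditability
    reason: str

# Each rule either returns a Decision or None ("not my case").
Rule = Callable[[dict], Optional[Decision]]

def eligibility_rule(case: dict) -> Optional[Decision]:
    # A hard requirement that can be fully encoded: no model involved.
    if case.get("voucher_expired"):
        return Decision("reject", "rule", "voucher past validity date")
    return None

def route(case: dict, rules: list[Rule],
          llm_fallback: Callable[[dict], Decision]) -> Decision:
    for rule in rules:
        decision = rule(case)
        if decision is not None:
            return decision
    # Only cases no rule could settle reach the probabilistic path.
    return llm_fallback(case)
```

The ordering is the point: the slower-to-iterate rule layer shrinks the surface area the model is trusted with.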
// Primary System
Flagship Production System
Jobcenter Report Generation & Review System
AI-assisted report generation for German public sector funding decisions. Combines deterministic logic, RAG, and structured human review to produce auditable recommendations in regulated environments.
// Leadership
How I Led and Shipped
> Leadership & Ownership
Led a team of 4 engineers building GenAI features across core workflows. Owned roadmap and technical direction, ran weekly release reviews with Quality Management and Operations, and set explicit guardrails for what the model can and cannot do in production.
Worked directly with C-level leadership to prioritize AI features, communicate constraints, and keep stakeholders aligned on risk, rollout, and reliability requirements.
// Tech Stack
Production Stack
> What I used to run these systems in production
// Related Subsystems
Supporting Systems
These are not “projects.” They’re subsystems that support the flagship workflow and share the same constraints: ground outputs, fail closed, and keep decisions accountable.
AVGS Voucher Parsing
Structured extraction from unstructured PDF vouchers using LLMs with validation layers.
Client-to-Course Matching
Semantic matching between client profiles and course catalogs using scraped APIs and LLM reasoning.
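The embedding side of that matching reduces to ranked cosine similarity with a cutoff; this is a minimal sketch under assumed vector shapes (the real system layers LLM reasoning on top), and the point is that an empty result is a valid, explainable answer rather than something to pad.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_courses(client_vec: list[float],
                 courses: dict[str, list[float]],
                 min_score: float = 0.5) -> list[tuple[str, float]]:
    # Courses below the threshold are dropped, not padded in.
    scored = [(cid, cosine(client_vec, vec)) for cid, vec in courses.items()]
    return sorted((s for s in scored if s[1] >= min_score), key=lambda s: -s[1])
```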
Agentic Onboarding Flow
Stateful conversation flow with routing logic and explicit stopping conditions for client intake.
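"Explicit stopping conditions" here means routing is an ordinary state machine with terminal states and a hard turn budget, independent of anything the model says. The states and limit below are illustrative assumptions:

```python
from enum import Enum, auto

class State(Enum):
    GREETING = auto()
    COLLECT_GOALS = auto()
    COLLECT_DOCUMENTS = auto()
    HANDOFF = auto()   # terminal: escalate to a human
    DONE = auto()      # terminal: intake complete

MAX_TURNS = 20  # hard stop, regardless of model behavior

def next_state(state: State, intake: dict, turn: int) -> State:
    if turn >= MAX_TURNS:
        return State.HANDOFF            # fail closed on runaway conversations
    if state is State.GREETING:
        return State.COLLECT_GOALS
    if state is State.COLLECT_GOALS:
        return State.COLLECT_DOCUMENTS if intake.get("goals") else State.COLLECT_GOALS
    if state is State.COLLECT_DOCUMENTS:
        return State.DONE if intake.get("documents") else State.HANDOFF
    return state  # terminal states never transition
```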
// Product Experiment
AI Sprint Planning & Pushback Engine
Separate prototype built to explore how AI can turn vague features into task graphs, estimates, and explicit tradeoffs, without pretending uncertainty doesn’t exist.
Converts features into task graphs with estimates and multi-owner assignment, and turns AI pushback into Decisions, Questions, and Guardrails that can block work until resolved.
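The blocking semantics can be sketched as follows (field names are hypothetical): a task cannot start while any linked Decision, Question, or Guardrail remains unresolved, so pushback is enforced by the data model rather than by convention.

```python
from dataclasses import dataclass, field

@dataclass
class Blocker:
    kind: str          # "decision" | "question" | "guardrail"
    text: str
    resolved: bool = False

@dataclass
class Task:
    name: str
    blockers: list[Blocker] = field(default_factory=list)

    def can_start(self) -> bool:
        # All pushback must be resolved before work begins.
        return all(b.resolved for b in self.blockers)

    def open_items(self) -> list[str]:
        return [f"{b.kind}: {b.text}" for b in self.blockers if not b.resolved]
```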
// Restraint
Where I Did Not Use AI
This is where systems fail in the real world. I treat “what not to automate” as a design decision, not a limitation. Knowing when to stop is harder, and usually more valuable, than knowing what’s technically possible.
Final funding decisions
The AI generates recommendations and supporting evidence, but never makes the approval or rejection decision. That authority stays with human case workers who understand context the system cannot encode.
Regulatory interpretation
When guidelines changed, I rejected an LLM-based interpretation layer. Instead, explicit business rules were updated manually. This traded development speed for legal defensibility.
Production failure: hallucinated course details
Early versions generated plausible but incorrect course information when retrieval returned no matches. Fixed by adding explicit “no match found” logic and constraining the LLM to only reference retrieved data.
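The shape of that fix is simple and worth showing; this sketch assumes a generic retrieval result and LLM callable, not the production interfaces:

```python
def build_course_answer(query: str, retrieved: list[dict], llm) -> str:
    if not retrieved:
        # Fail closed: a truthful non-answer beats a plausible hallucination.
        return "No matching course found for this request."
    context = "\n".join(doc["text"] for doc in retrieved)
    prompt = (
        "Answer using ONLY the course data below. "
        "If the data does not contain the answer, say so.\n\n"
        f"Course data:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)
```

The empty-retrieval branch runs before the model is ever called, so the constraint holds even when the prompt is ignored.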
Production failure: review fatigue
Initial reports asked reviewers only to approve or reject, and they stopped reading carefully. Redesigned to generate section drafts requiring active editing. Engagement went up, and error detection improved.
// Background
Context & Environment
Applied AI Engineer and Technical Product Owner with experience designing and shipping GenAI systems in production.
I specialize in environments where system failures have real consequences: public sector decision support, funding allocation workflows, and regulated domains where auditability is not optional. This means collaborating with operations and compliance-minded stakeholders who need to understand what the system does, not just what it outputs.
My work sits at the intersection of LLM capabilities and deterministic logic: knowing when to use each, and how to make them behave predictably under real constraints.