Applied AI Engineer

I design AI systems for environments where wrong answers have consequences.

Decision-critical systems. Regulated environments. Production-ready AI orchestration with human-in-the-loop design.


// Design Philosophy

How I Design AI Systems

> Deterministic logic before probabilistic models

Rules handle what can be encoded. LLMs handle what cannot. The tradeoff: slower iteration, higher reliability.
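A minimal sketch of the pattern, with a hypothetical rule and a pluggable model call (not the production code):

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    source: str  # "rule" or "llm"

def rule_based_answer(applicant: dict) -> Answer | None:
    # Hypothetical rule: encode exactly what the guidelines state.
    if applicant.get("age") is not None and applicant["age"] < 18:
        return Answer("Not eligible: applicant is under 18.", source="rule")
    return None  # no rule applies -> fall through to the model

def answer(applicant: dict, llm_call) -> Answer:
    # Deterministic logic first; the LLM only handles what rules cannot encode.
    ruled = rule_based_answer(applicant)
    return ruled if ruled is not None else Answer(llm_call(applicant), source="llm")
```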

> Human review as a system component, not a fallback

Workflows designed for human oversight from the start. Review becomes data that improves retrieval and future outputs.
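What "review as data" means in practice, sketched with assumed field names:

```python
from dataclasses import dataclass, asdict
import json, time

@dataclass
class ReviewRecord:
    draft_id: str
    section: str
    ai_text: str
    reviewer_text: str   # what the human actually approved
    verdict: str         # "accepted" | "edited" | "rejected"

def log_review(record: ReviewRecord, path: str = "reviews.jsonl") -> None:
    # Append-only log; these records later feed retrieval examples and eval sets.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({**asdict(record), "ts": time.time()}) + "\n")
```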

> AI explains decisions instead of making them when stakes are high

Generate reasoning, evidence, and recommendations. Let humans make the final call. Reduces both errors and liability.
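Roughly, the output contract looks like this (illustrative schema, not the production one):

```python
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    # What the system is allowed to produce: reasoning and evidence, not verdicts.
    recommendation: str                 # advisory only, e.g. "approve" / "reject"
    reasoning: str
    evidence: list[str] = field(default_factory=list)  # citations to retrieved sources

@dataclass
class Decision:
    case_id: str
    recommendation: Recommendation
    decided_by: str                     # always a human case worker
    decision: str                       # the actual approval or rejection
```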

> Auditability and traceability over model cleverness

Every output must be reproducible and explainable. Sophisticated prompts mean nothing if you cannot debug them in production.
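The minimum I log per generation so an output can be replayed and explained later (illustrative fields):

```python
import hashlib, uuid

def audit_record(prompt: str, model: str, retrieved_ids: list[str], output: str) -> dict:
    # One structured log line per generation: enough to reproduce and debug it.
    return {
        "trace_id": str(uuid.uuid4()),
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "retrieved_ids": retrieved_ids,   # which documents grounded this output
        "output": output,
    }
```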

> Constraints create better systems than capabilities

Limiting what the AI can do often produces more reliable outcomes than expanding what it might do.

Next: Flagship System → See the architecture + constraints

// Primary System

Flagship Production System

Jobcenter Report Generation & Review System

AI-assisted report generation for German public sector funding decisions. Combines deterministic logic, RAG, and structured human review to produce auditable recommendations in regulated environments.

LLM Orchestration · RAG · Human-in-the-loop · Public Sector
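A simplified sketch of the report pipeline; the rule, retrieval, and model hooks here are placeholders, not the production implementation:

```python
def run_eligibility_rules(case: dict) -> list[str]:
    # Placeholder rule engine: returns human-readable violations, if any.
    return ["missing_income_statement"] if "income_statement" not in case else []

def generate_report(case: dict, retrieve, llm_call) -> dict:
    violations = run_eligibility_rules(case)              # deterministic checks first
    passages = retrieve(case)                             # RAG: cited guideline chunks
    draft = llm_call(case, passages) if passages else "No supporting sources found."
    return {
        "draft": draft,
        "citations": [p["id"] for p in passages],
        "rule_violations": violations,
        "status": "pending_review",                        # a case worker signs off
    }
```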
Next: Leadership → How it shipped in a startup

// Leadership

How I Led and Shipped

> Leadership & Ownership

Led a team of 4 engineers building GenAI features across core workflows. Owned roadmap and technical direction, ran weekly release reviews with Quality Management and Operations, and set explicit guardrails for what the model can and cannot do in production.

Worked directly with C-level leadership to prioritize AI features, communicate constraints, and keep stakeholders aligned on risk, rollout, and reliability requirements.

Tradeoff: Speed vs. Compliance
Chose rule-based approval workflows over free-form generation to preserve audit trails. Slower iteration, defensible outputs.
Decision: Evidence-first RAG
Delayed launch to add retrieval + citation rules so outputs had to be grounded in sourced data. The goal: stop “plausible text” and force “verifiable evidence.”
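The citation rule, reduced to its core; the `[doc:...]` marker format is illustrative:

```python
import re

def enforce_citations(draft: str, retrieved_ids: set[str]) -> str:
    # Every claim must cite a retrieved source; uncited or unknown citations fail closed.
    cited = set(re.findall(r"\[doc:(\w+)\]", draft))
    if not cited:
        raise ValueError("Draft contains no citations; rejected.")
    unknown = cited - retrieved_ids
    if unknown:
        raise ValueError(f"Draft cites sources that were not retrieved: {unknown}")
    return draft
```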
Rollout: Staged Human Review
Shipped in phases: side-by-side comparison (AI vs. manual), then AI-assisted drafts, then AI-first with human approval. This reduced distrust and made failure cases visible early.
Next: Tech Stack → What I used to run this in production

// Tech Stack

Production Stack

> What I used to run these systems in production

Orchestration
LangChain · OpenAI Agents SDK · OpenAI API
Retrieval
ChromaDB · PostgreSQL + pgvector
Backend
Python · Flask · Celery
Infrastructure
AWS · Docker · Redis
Evaluation
OpenAI Platform · Eval harness · Regression checks
Observability
Structured logs · Failure triage · Review analytics
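As one example of how these pieces fit together, retrieval over PostgreSQL + pgvector reduces to a single nearest-neighbour query (table name and connection details are assumptions):

```python
import psycopg2

def retrieve(query_embedding: list[float], k: int = 5):
    # Nearest-neighbour lookup over guideline chunks stored in Postgres + pgvector.
    conn = psycopg2.connect("dbname=reports")             # connection details assumed
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, chunk_text
            FROM guideline_chunks
            ORDER BY embedding <=> %s::vector              -- cosine distance
            LIMIT %s
            """,
            (vec, k),
        )
        return cur.fetchall()
```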
Next: Supporting Systems

// Related Subsystems

Supporting Systems

These are not “projects.” They’re subsystems that support the flagship workflow and share the same constraints: ground outputs, fail closed, and keep decisions accountable.

// Product Experiment

AI Sprint Planning & Pushback Engine

A separate prototype exploring how AI can turn vague features into task graphs, estimates, and explicit tradeoffs without pretending uncertainty doesn’t exist.

It converts features into task graphs with estimates and multi-owner assignment, and turns AI pushback into Decisions, Questions, and Guardrails that can block work until resolved.
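A sketch of the underlying data model (field names are illustrative):

```python
from dataclasses import dataclass, field
from enum import Enum

class PushbackKind(Enum):
    DECISION = "decision"      # a tradeoff someone has to own
    QUESTION = "question"      # missing information that changes the plan
    GUARDRAIL = "guardrail"    # a constraint the plan must not violate

@dataclass
class Pushback:
    kind: PushbackKind
    text: str
    blocking: bool = False
    resolved: bool = False

@dataclass
class Task:
    name: str
    estimate_days: float
    owners: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    pushback: list[Pushback] = field(default_factory=list)

def can_start(task: Task) -> bool:
    # Work stays blocked until every blocking pushback item is resolved.
    return not any(p.blocking and not p.resolved for p in task.pushback)
```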

Next: Restraint → Where AI was blocked (on purpose)

// Restraint

Where I Did Not Use AI

This is where systems fail in the real world. I treat “what not to automate” as a design decision, not a limitation. Knowing when to stop is harder, and usually more valuable, than knowing what’s technically possible.

Pattern: Keep sovereignty human

Final funding decisions

The AI generates recommendations and supporting evidence, but never makes the approval or rejection decision. That authority stays with human case workers who understand context the system cannot encode.

Pattern: Rules > model interpretation

Regulatory interpretation

When guidelines changed, I rejected an LLM-based interpretation layer and instead updated the explicit business rules by hand. This traded development speed for legal defensibility.

Pattern: Fail closed

Production failure: hallucinated course details

Early versions generated plausible but incorrect course information when retrieval returned no matches. Fixed by adding explicit “no match found” logic and constraining the LLM to only reference retrieved data.
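The fix, in essence (prompt wording and helpers are illustrative):

```python
def grounded_answer(query: str, retrieve, llm_call) -> str:
    passages = retrieve(query)
    if not passages:
        # Fail closed: an explicit "no match" beats a plausible-sounding guess.
        return "No matching course found in the source data."
    context = "\n\n".join(p["text"] for p in passages)
    return llm_call(
        "Answer using ONLY the sources below. "
        "If the answer is not in them, say 'No match found.'\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
```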

Pattern: Design for active oversight

Production failure: review fatigue

Initial reports were “approve or reject,” and reviewers stopped reading carefully. Redesigned to generate section drafts requiring active editing. Engagement went up, and error detection improved.

Next: Context

// Background

Context & Environment

// Operating context
Startup pace. Public-sector constraints. Stakeholders with low risk tolerance.

Applied AI Engineer and Technical Product Owner with experience designing and shipping GenAI systems in production.

I specialize in environments where system failures have real consequences: public sector decision support, funding allocation workflows, and regulated domains where auditability is not optional. This means collaborating with operations and compliance-minded stakeholders who need to understand what the system does, not just what it outputs.

My work sits at the intersection of LLM capabilities and deterministic logic: knowing when to use each, and how to make them behave predictably under real constraints.

Next: Contact

// Get in Touch

If you’re building AI systems under real constraints, I’m open to a conversation.