Applied AI Engineer

I design AI systems for environments where wrong answers have consequences.

Decision-critical workflows. Regulated constraints. Production AI with human oversight designed in, not patched on.


// Design Philosophy

How I Design AI Systems

> Deterministic logic before probabilistic models

Rules handle what can be encoded. LLMs handle what cannot. The tradeoff: slower iteration, higher reliability.
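A minimal sketch of this split, with hypothetical names (`route`, `Rule`, the stubbed fallback are illustrative, not the production code): deterministic rules get first pass, and only inputs they cannot encode fall through to the model.

```python
from typing import Callable, Optional

# Sketch only: rules run first; unmatched inputs fall through to the LLM.
Rule = Callable[[str], Optional[str]]

def route(text: str, rules: list[Rule], llm_fallback: Callable[[str], str]) -> tuple[str, str]:
    """Return (source, answer). Rules win; the model only sees what they can't encode."""
    for rule in rules:
        answer = rule(text)
        if answer is not None:
            return ("rule", answer)       # deterministic, auditable path
    return ("llm", llm_fallback(text))    # probabilistic path, labeled as such

# Usage with a toy rule and a stubbed model call.
rules = [lambda t: "rejected: missing ID" if "ID:" not in t else None]
print(route("free-text request", rules, lambda t: "<model draft>"))
```

Tagging each answer with its source (`"rule"` vs `"llm"`) is what makes the reliability tradeoff visible downstream.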

> Human review as a system component, not a fallback

Workflows are designed for oversight from day one. Review output becomes structured data that improves retrieval and future drafts.
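One way to make review output machine-usable, sketched with hypothetical field names (`ReviewEvent`, `reason_code`, etc. are illustrative): capture each review as a structured record with a closed vocabulary instead of a free-text comment.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Sketch only: reviewer feedback as structured data, so it can feed
# analytics, retrieval signals, and future drafts.
@dataclass
class ReviewEvent:
    draft_id: str
    section: str
    verdict: str                            # e.g. "edited", "approved", "rejected"
    edited_spans: list[str] = field(default_factory=list)
    reason_code: str = "unspecified"        # closed vocabulary beats prose for analytics
    reviewed_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

event = ReviewEvent("draft-001", "risk-summary", "edited",
                    edited_spans=["paragraph 2"], reason_code="missing-citation")
print(asdict(event))  # plain dict: ready for logs and aggregation
```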

> Evidence-first outputs when stakes are high

The system produces recommendations plus evidence. Humans make the final call. Less liability, fewer silent failures.

> Auditability and traceability over “clever prompts”

Every output must be reproducible and debuggable. If you can’t trace it, you can’t ship it.

> Fail closed, not confidently wrong

“No match found” is a feature. Guardrails beat improvisation when the cost of being wrong is real.

Next: Flagship Case Study → Architecture + constraints

// Flagship Case Study

Public-Sector Decision Support

// NDA note

Some implementation details are intentionally generalized due to NDA and public-sector confidentiality. What’s shown here is the architecture, guardrails, and evaluation approach, without customer data or proprietary workflows.

Evidence-Grounded Report Drafting + Human Review

A production workflow that drafts decision-support reports under strict constraints: evidence-first retrieval, deterministic rules, and structured human review. Designed to be auditable, explainable, and safe under regulatory scrutiny.

LLM orchestration · Retrieval + citations · Human-in-the-loop · Regulated domain
Next: Leadership → How it shipped under real constraints

// Leadership

How I Led and Shipped

> Ownership snapshot

Led a team of 4 engineers delivering GenAI features across core workflows in a startup environment. Owned roadmap and technical direction, ran weekly release reviews with Quality Management and Operations, and set guardrails for what the system can and cannot do in production.

Worked directly with C-level leadership to prioritize AI work, communicate constraints, and align stakeholders on risk, rollout, and reliability requirements.

Tradeoff: Speed vs. Auditability
Chose rule-based approvals over free-form generation to preserve traceability. Slower iteration, defensible outputs.
Decision: Evidence-first retrieval
Delayed launch to enforce retrieval + citation rules so outputs had to be grounded in sourced data. The goal: stop “plausible text” and force “verifiable evidence.”
Rollout: Staged human review
Shipped in phases: side-by-side comparison, then AI-assisted drafts, then AI-first with human approval. Made failures visible early, before scale.
Next: Tech Stack → What I used in production

// Tech Stack

Production Stack

> What I used to run these systems in production

Orchestration
LangChain · OpenAI Agents SDK · OpenAI API
Retrieval
ChromaDB · PostgreSQL + pgvector
Backend
Python · Flask · Celery
Infrastructure
AWS · Docker · Redis
Evaluation
Eval harness · Regression checks · Failure taxonomy
Observability
Structured logs · Failure triage · Review analytics
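The eval-harness idea can be sketched in a few lines, under assumptions (`check_grounded`, the case shape, and the grounding rule are illustrative, not the production harness): frozen cases with expected behavior are replayed before each release, and any divergence blocks the ship.

```python
# Sketch only: a regression check that every claim carries a citation.
def check_grounded(output: dict) -> bool:
    """Every claim must cite at least one piece of retrieved evidence."""
    return all(claim.get("citations") for claim in output["claims"])

REGRESSION_CASES = [
    {"name": "no-evidence-query", "output": {"claims": []}, "expect_grounded": True},
    {"name": "cited-claim",
     "output": {"claims": [{"text": "...", "citations": ["doc-7"]}]},
     "expect_grounded": True},
    {"name": "uncited-claim",
     "output": {"claims": [{"text": "...", "citations": []}]},
     "expect_grounded": False},
]

def run_regression(cases):
    failures = [c["name"] for c in cases
                if check_grounded(c["output"]) != c["expect_grounded"]]
    return failures  # empty list == safe to ship

print(run_regression(REGRESSION_CASES))  # []
```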
Next: Supporting Systems

// Supporting Systems

Subsystems (Case Studies)

Next: Restraint → Where AI was blocked (on purpose)

// Restraint

Where I Did Not Use AI

Knowing what not to automate is a design decision. In regulated environments, “don’t guess” beats “sounds smart.”

Pattern: Keep sovereignty human

Final decisions

The system drafts recommendations and evidence, but never makes the final call. Humans retain authority and context.

Pattern: Rules over model interpretation

Policy interpretation

When guidelines changed, I rejected “LLM decides what the rule means.” We updated explicit rules manually to keep outputs defensible.

Pattern: Fail closed

Failure mode: plausible but wrong details

Early versions produced plausible text when retrieval returned no matches. Fixed with explicit “no match found” logic and a strict constraint: drafts may only reference retrieved evidence.
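The fail-closed pattern can be sketched like this, with hypothetical names (`draft_or_refuse`, the score threshold, and the stubs are illustrative): if retrieval returns nothing above threshold, the system refuses instead of letting the model improvise.

```python
NO_MATCH = "No match found"

# Sketch only: refusal is a first-class output, not an error.
def draft_or_refuse(query: str, retrieve, generate, min_score: float = 0.75) -> str:
    hits = [h for h in retrieve(query) if h["score"] >= min_score]
    if not hits:
        return NO_MATCH                  # fail closed: no evidence, no draft
    evidence = [h["text"] for h in hits]
    return generate(query, evidence)     # model may only cite retrieved evidence

# Usage with stubbed retrieval/generation: weak hits trigger the refusal.
weak = lambda q: [{"text": "irrelevant", "score": 0.2}]
print(draft_or_refuse("obscure case", weak, lambda q, ev: "draft"))  # No match found
```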

Pattern: Design for active oversight

Failure mode: review fatigue

“Approve or reject” reports trained reviewers to skim. Redesigned to draft sections that require active editing, which improved attention and error detection.

Next: Side Projects → Building in public

// Side Projects

Explorations & Systems

POCO Steering OS

A Product Intelligence OS that turns chaotic feature requests into traceable, capacity-aware decisions. Deterministic prioritization + AI-assisted capacity planning + explicit tradeoffs for teams drowning in scope creep.

Product OS · Prioritization · Roadmap · Next.js
Next: Background → Who I am + context

// Background

Context & Environment

// Operating context
Startup pace. Regulated workflows. Stakeholders with low risk tolerance.

Applied AI Engineer and Technical Product Owner with experience designing and shipping GenAI systems in production. Worked at Taleroo (Germany). Details are generalized where needed due to NDA.

I build systems where correctness, traceability, and operational adoption matter as much as model quality. That usually means collaborating closely with operations and compliance-minded stakeholders, not just engineers.

My work sits at the intersection of LLM capabilities and deterministic logic: knowing when to use each, and how to make them behave predictably under real constraints.

Next: Contact

// Get in Touch

If you’re building AI systems under real constraints, I’m open to a conversation.