LLM Engineering
Building an AI Log Diagnosis Assistant: Architecture and Lessons
A reference architecture for AI-assisted workflow failure diagnosis using logs, metadata, retrieval, evaluation, and feedback loops.
The architecture
An AI log diagnosis assistant should be built around evidence: task logs, workflow metadata, platform context, historical incidents, and known remediation examples. Each diagnosis it produces should point back to at least one of these sources.

Core components
The minimum useful architecture includes log ingestion, preprocessing, error signature extraction, a classification taxonomy, retrieval over historical cases, LLM explanation generation, and a feedback loop.
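Two of these components, error signature extraction and the classification taxonomy, can be illustrated with a small sketch. The masking rules, the taxonomy labels, and the function names `error_signature` and `classify` are assumptions made for illustration. A real system would derive its rules from its own logs.

```python
import hashlib
import re

# Hypothetical masking rules: volatile tokens are replaced with placeholders
# so that repeated occurrences of the same error collapse to one signature.
_MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    (re.compile(
        r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
        re.IGNORECASE), "<UUID>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+"), "<N>"),
]

# Hypothetical taxonomy: ordered (label, pattern) pairs, first match wins.
_TAXONOMY = [
    ("out_of_memory", re.compile(r"out of memory|oomkilled|memoryerror", re.IGNORECASE)),
    ("timeout", re.compile(r"timed out|timeout", re.IGNORECASE)),
    ("permission", re.compile(r"permission denied|access denied", re.IGNORECASE)),
]


def error_signature(line: str) -> str:
    """Return a short, stable hash of a log line with volatile tokens masked."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    normalized = " ".join(line.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def classify(line: str) -> str:
    """Map a log line to a taxonomy label, or 'unknown' if nothing matches."""
    for label, pattern in _TAXONOMY:
        if pattern.search(line):
            return label
    return "unknown"
```

Two lines that differ only in timestamp, task number, or duration produce the same signature, which makes it possible to group repeated failures before any retrieval or LLM call.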
Evaluation matters
Evaluation should use real historical failures. A good test set includes obvious repeated errors, ambiguous failures, noisy logs, missing context, and cases where the assistant should say it is uncertain.
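One way to score such a test set is to treat abstention as a first-class outcome rather than as a wrong answer. The sketch below is a minimal harness under assumptions: `EvalCase`, `evaluate`, and the five outcome names are hypothetical, and `diagnose` stands in for whatever callable wraps the assistant, returning a label or `None` when it is uncertain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class EvalCase:
    case_id: str
    log_excerpt: str
    # None means the correct behaviour is to say "uncertain".
    expected_label: Optional[str]


def evaluate(
    cases: Iterable[EvalCase],
    diagnose: Callable[[str], Optional[str]],
) -> Dict[str, int]:
    """Count outcomes, keeping abstentions separate from right/wrong answers."""
    counts = {
        "correct": 0,           # answered, and the label matched
        "wrong": 0,             # answered with the wrong label
        "correct_abstain": 0,   # abstained when abstaining was expected
        "false_confidence": 0,  # answered when it should have abstained
        "unneeded_abstain": 0,  # abstained although a label was expected
    }
    for case in cases:
        predicted = diagnose(case.log_excerpt)
        if case.expected_label is None:
            key = "correct_abstain" if predicted is None else "false_confidence"
        elif predicted is None:
            key = "unneeded_abstain"
        elif predicted == case.expected_label:
            key = "correct"
        else:
            key = "wrong"
        counts[key] += 1
    return counts
```

Reporting `false_confidence` separately from `wrong` makes it visible when the assistant answers ambiguous or context-poor cases that it should have flagged as uncertain.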
Lessons
Teams often underestimate normalization. Before the LLM can help, logs and metadata need consistent workflow IDs, task names, runtime environments, owners, and error boundaries.
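As a rough illustration of what normalization involves, the sketch below maps records with inconsistent field names onto one canonical shape. The `NormalizedEvent` type, the alias table, and the choice to lowercase the environment name are all assumptions. Real pipelines will have their own field names and rules, and error boundaries are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class NormalizedEvent:
    workflow_id: str
    task_name: str
    runtime_env: str
    owner: str
    message: str


# Hypothetical aliases: each canonical field lists the raw keys that
# different upstream systems might use for the same piece of information.
_ALIASES: Dict[str, Tuple[str, ...]] = {
    "workflow_id": ("workflow_id", "wf_id", "dag_id"),
    "task_name": ("task_name", "task", "step"),
    "runtime_env": ("runtime_env", "env", "environment"),
    "owner": ("owner", "team"),
    "message": ("message", "msg", "log"),
}


def normalize(raw: Dict[str, Any]) -> NormalizedEvent:
    """Build a canonical event, using 'unknown' for fields that are absent."""
    values = {}
    for canonical, aliases in _ALIASES.items():
        found = next(
            (raw[a] for a in aliases if raw.get(a) not in (None, "")),
            "unknown",
        )
        values[canonical] = str(found).strip()
    # Assumption: environment names are case-insensitive ("PROD" == "prod").
    values["runtime_env"] = values["runtime_env"].lower()
    return NormalizedEvent(**values)
```

Filling gaps with an explicit `"unknown"` rather than dropping the record keeps missing context visible to later stages, which matters for the cases where the assistant should report uncertainty.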
The highest-value output is not a long answer. It is a short diagnosis with evidence, risk level, likely component, and next action.
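That output shape can be made explicit as a small structured type. The sketch below is one possible layout, not a prescribed schema: the `Diagnosis` and `Risk` names, the three risk levels, and the rule that evidence must be non-empty are assumptions chosen to match the fields listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Diagnosis:
    summary: str
    evidence: List[str]
    risk: Risk
    likely_component: str
    next_action: str

    def __post_init__(self) -> None:
        # Assumption: a diagnosis without supporting evidence is rejected.
        if not self.evidence:
            raise ValueError("a diagnosis must cite at least one piece of evidence")

    def render(self) -> str:
        """Format the diagnosis as a short, plain-text report."""
        lines = [
            f"Diagnosis: {self.summary}",
            f"Risk: {self.risk.value}",
            f"Likely component: {self.likely_component}",
            "Evidence:",
        ]
        lines += [f"  - {item}" for item in self.evidence]
        lines.append(f"Next action: {self.next_action}")
        return "\n".join(lines)
```

Constraining the LLM's answer to a fixed set of fields like this keeps responses short and makes them easier to score in evaluation and to collect feedback on.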