Common mistakes
- No acceptance criteria or output schema.
- Prompts that hide requirements instead of stating them.
- No retrieval grounding for factual answers.
- Missing refusal rules or safety rails.
- Unversioned prompts; changes ship without tests.
- No evals or only manual spot checks.
- Ignoring latency/cost by overusing heavy models.
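The first mistake on this list, shipping with no output schema, is cheap to fix. Below is a minimal sketch of an output-schema check; the field names (`answer`, `citations`, `confidence`) and quality bars are illustrative assumptions, not a fixed standard.

```python
import json

# Illustrative schema: required fields and their expected types.
REQUIRED_FIELDS = {"answer": str, "citations": list, "confidence": float}

def validate_output(raw: str) -> dict:
    """Parse a model response and enforce the output schema and quality bars."""
    data = json.loads(raw)  # raises on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return data

good = '{"answer": "42", "citations": ["doc-1"], "confidence": 0.9}'
print(validate_output(good)["answer"])  # prints "42"
```

A check like this runs after every model call, so malformed responses fail loudly at the boundary instead of leaking into downstream code.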
Production patterns to use
- Schema + criteria: JSON outputs with explicit fields and quality bars.
- Retrieval: provide sources, require citations, and refuse when evidence is insufficient.
- Tooling: use function calls for structured actions instead of parsing free text.
- Safety: refusal rules, PII stripping, and approvals for risky actions.
- Evals: golden + synthetic tests before every release.
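The retrieval pattern above can be sketched as a grounding-plus-refusal wrapper. This is a deliberately naive keyword match standing in for a real retriever; the function name and return shape are assumptions for illustration.

```python
def answer_with_sources(query: str, sources: dict[str, str]) -> dict:
    """Cite only sources that plausibly support the query; refuse otherwise."""
    # Naive grounding check: a real system would use a retriever/reranker here.
    hits = [sid for sid, text in sources.items()
            if any(word in text.lower() for word in query.lower().split())]
    if not hits:
        # Refusal rule: no grounded sources means no answer.
        return {"refused": True, "reason": "insufficient sources"}
    return {"refused": False, "citations": hits}

docs = {"doc-1": "The capital of France is Paris."}
print(answer_with_sources("capital of France", docs))
```

The important part is the shape, not the matching logic: every answer carries citations, and the refusal branch is explicit rather than left to the model's discretion.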
How we ship safely
- Define success: schema, constraints, and edge cases.
- Add retrieval, citations, and refusal logic.
- Version prompts + track lineage in Git.
- Run evals and regression tests; gate releases.
- Monitor logs, alerts, and cost; iterate quickly.
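The "run evals and gate releases" step can be as small as a golden-case gate. In this sketch, `run_prompt` is a stand-in for a real model call (here it just uppercases input so the example is runnable); the case data and function names are hypothetical.

```python
# Golden cases: (input, expected output) pairs curated from real traffic.
GOLDEN_CASES = [("hello", "HELLO"), ("ship it", "SHIP IT")]

def run_prompt(prompt: str, case_input: str) -> str:
    # Stand-in for a model call; the "task" here is uppercasing.
    return case_input.upper()

def gate_release(prompt: str) -> bool:
    """Allow a release only if every golden case still passes."""
    return all(run_prompt(prompt, inp) == want for inp, want in GOLDEN_CASES)

print(gate_release("candidate prompt v2"))  # prints True
```

Wired into CI, a failing gate blocks the prompt change the same way a failing unit test blocks a code change.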
Prompts are product code. Treat them with specs, tests, and monitoring—or expect surprises in production.
FAQ
Do small prompts need tests?
Yes—lightweight evals catch regressions and cost spikes early.
Which models to target?
Use small/fast models for simple tasks; reserve heavy models for reasoning.
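One way to apply that rule is a routing function in front of the model call. The heuristics and model names below (`small-fast-model`, `heavy-model`, the keyword list) are placeholder assumptions; real routing criteria depend on your task mix.

```python
def pick_model(task: str) -> str:
    """Route simple tasks to a small model; reserve the heavy model for reasoning."""
    # Hypothetical heuristics: long inputs or reasoning keywords go heavy.
    HEAVY_HINTS = ("analyze", "prove", "plan", "debug")
    if len(task.split()) > 50 or any(h in task.lower() for h in HEAVY_HINTS):
        return "heavy-model"       # placeholder model name
    return "small-fast-model"      # placeholder model name

print(pick_model("Summarize this line"))          # prints small-fast-model
print(pick_model("Analyze the failure modes"))    # prints heavy-model
```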
How to manage versions?
Keep prompts in Git with changelog and linked eval results.
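Alongside Git history, a content hash makes a handy version id to stamp on logs and eval results, so every model output can be traced to the exact prompt text that produced it. A minimal sketch:

```python
import hashlib

def prompt_version(prompt_text: str) -> str:
    """Content hash used as the prompt's version id in changelogs and eval logs."""
    return hashlib.sha256(prompt_text.encode()).hexdigest()[:12]

print(prompt_version("You are a helpful assistant."))
```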
What about multi-turn?
Constrain memory, reset state intentionally, and test flows end-to-end.
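"Constrain memory" can be as simple as trimming the conversation before each call. This sketch assumes the common role/content message format; the turn limit is an arbitrary example value.

```python
def trim_history(history: list[dict], max_turns: int = 4) -> list[dict]:
    """Keep system messages plus only the most recent user/assistant messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    # Each turn is roughly one user + one assistant message.
    return system + rest[-max_turns * 2:]

history = [{"role": "system", "content": "Be concise."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(10)]
print(len(trim_history(history, max_turns=2)))  # prints 5
```

Deliberate trimming like this doubles as the "reset state intentionally" step: the cutoff is explicit and testable rather than an accident of context-window limits.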
