Evaluation

Core Questions

What criteria determine whether the output is acceptable?
What scoring scale and thresholds apply?
What happens if the output falls below the threshold?
How many revision cycles are permitted?

Micro-Template

 Evaluate against: (1) [Criterion] — Score 1-5. (2) [Criterion] — Score 1-5. If any score < [threshold], revise [component] and regenerate. Max [N] cycles. 
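The template above describes a score, check, and revise loop. A minimal Python sketch of that control flow follows, assuming a 1-5 scale and caller-supplied `score` and `revise` functions. All names here (`Criterion`, `evaluate_and_revise`, the `object` and `method` component labels) are hypothetical and chosen only for illustration; in practice the scorer and reviser would be a model call or a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Criterion:
    name: str       # what is being scored, e.g. "Completeness"
    component: str  # hypothetical label of the prompt component to revise on failure


def evaluate_and_revise(
    output: str,
    criteria: List[Criterion],
    score: Callable[[str, Criterion], int],
    revise: Callable[[str, List[Criterion]], str],
    threshold: int = 4,
    max_cycles: int = 2,
) -> Tuple[str, Dict[str, int], int]:
    """Score the output per criterion; regenerate while any score is below threshold.

    Returns the final output, its scores, and the number of revision cycles used.
    """
    cycles = 0
    while True:
        scores = {c.name: score(output, c) for c in criteria}
        failing = [c for c in criteria if scores[c.name] < threshold]
        # Stop when every criterion passes or the cycle budget is spent.
        if not failing or cycles >= max_cycles:
            return output, scores, cycles
        output = revise(output, failing)
        cycles += 1


# Toy stand-ins (assumptions): a criterion passes once its component label
# appears in the text, and revising simply appends the failing labels.
def toy_score(text: str, criterion: Criterion) -> int:
    return 5 if criterion.component in text else 3


def toy_revise(text: str, failing: List[Criterion]) -> str:
    return text + " " + " ".join(c.component for c in failing)


criteria = [Criterion("Completeness", "object"), Criterion("Rigor", "method")]
final, scores, cycles = evaluate_and_revise("draft", criteria, toy_score, toy_revise)
```

Passing the failing criteria to `revise` keeps the "revise [component]" step targeted: only the components tied to a failed criterion need to change between cycles.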

Anti-Patterns

Error	Correction
No evaluation criteria defined	Always include at least 2-3 measurable criteria for professional outputs.
Vague quality language ('make it good')	Replace subjective terms with specific, scorable criteria tied to the Object.
No threshold for acceptable quality	Set a minimum score threshold that triggers revision if not met.
No iteration mechanism	Define what happens when output fails evaluation: which component to revise, how many cycles.
Evaluating against criteria not reflected in the prompt	Ensure every evaluation criterion maps to a component specified earlier in the prompt.
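Several of the anti-patterns in the table can be checked mechanically before a prompt is used. The sketch below is one possible linter, assuming the evaluation block is represented as a mapping from criterion name to the component it targets. The function name `lint_evaluation` and the component labels are hypothetical. Vague quality language is not covered, since judging it needs a human or a model.

```python
from typing import Dict, List, Optional


def lint_evaluation(
    criteria: Dict[str, str],        # criterion name -> component it maps to
    prompt_components: List[str],    # components actually specified in the prompt
    threshold: Optional[int],
    max_cycles: Optional[int],
    scale: range = range(1, 6),      # assumed 1-5 scoring scale
) -> List[str]:
    """Return a list of anti-pattern findings; an empty list means none were detected."""
    problems: List[str] = []
    if len(criteria) < 2:
        problems.append("fewer than 2 criteria defined")
    if threshold is None or threshold not in scale:
        problems.append("no threshold within the scoring scale")
    if max_cycles is None or max_cycles < 1:
        problems.append("no iteration mechanism (max cycles missing)")
    for name, component in criteria.items():
        if component not in prompt_components:
            problems.append(
                f"criterion '{name}' maps to unspecified component '{component}'"
            )
    return problems


ok = lint_evaluation({"Accuracy": "object", "Rigor": "method"}, ["object", "method"], 4, 2)
bad = lint_evaluation({"Tone": "style"}, ["object"], None, None)
```

Here `ok` comes back empty, while `bad` reports one finding for each triggered rule: too few criteria, a missing threshold, a missing cycle limit, and a criterion that maps to a component the prompt never specified.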

Self-Check

✔ Does each evaluation criterion trace back to a specific component?
✔ Is the scoring scale clear and actionable?
✔ Would a human reviewer agree these are the right criteria for this task?

Interaction Note

Evaluation closes the loop: it turns prompt engineering from a one-shot activity into a systematic, iterative process. Evaluation (E) criteria should map directly to M (goal alignment), O (deliverable quality), and T (methodological rigor).
