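| Budget | Generator   | Steps | Verifier     | Best-of-N  | Single | Two  | Counting | Color | Position | Attribution | Overall |
|--------|-------------|-------|--------------|------------|--------|------|----------|-------|----------|-------------|---------|
| 200ms  | SANA-Sprint | 1     | -            | Best-of-1  | 99.3   | 88.1 | 56.0     | 87.6  | 54.1     | 47.8        | 71.6    |
| 550ms  | SANA-1.5    | 4     | -            | Best-of-1  | 98.8   | 78.2 | 66.5     | 71.1  | 50.6     | 20.8        | 63.0    |
| 550ms  | SANA-Sprint | 8     | -            | Best-of-1  | 99.5   | 91.9 | 59.3     | 86.0  | 57.8     | 52.4        | 74.0    |
| 550ms  | SANA-Sprint | 1     | MLLM w/ CLIP | Best-of-2  | 100.0  | 91.3 | 59.5     | 88.0  | 61.0     | 55.4        | 75.4    |
| 550ms  | SANA-Sprint | 1     | MLLM w/ AE   | Best-of-3  | 100.0  | 90.9 | 59.0     | 89.6  | 55.8     | 50.6        | 73.1    |
| 550ms  | SANA-Sprint | 1     | VHS (Ours)   | Best-of-4  | 100.0  | 93.9 | 61.5     | 90.6  | 66.2     | 58.4        | 78.1    |
| 1100ms | SANA-1.5    | 12    | -            | Best-of-1  | 100.0  | 92.7 | 74.8     | 88.3  | 61.4     | 59.6        | 78.8    |
| 1100ms | SANA-Sprint | 20    | -            | Best-of-1  | 100.0  | 88.5 | 59.8     | 89.6  | 48.6     | 51.0        | 72.2    |
| 1100ms | SANA-Sprint | 1     | MLLM w/ CLIP | Best-of-4  | 100.0  | 92.7 | 66.0     | 88.9  | 65.9     | 61.6        | 78.8    |
| 1100ms | SANA-Sprint | 1     | MLLM w/ AE   | Best-of-7  | 99.7   | 90.7 | 61.3     | 90.8  | 59.6     | 49.3        | 74.7    |
| 1100ms | SANA-Sprint | 1     | VHS (Ours)   | Best-of-9  | 100.0  | 95.7 | 66.5     | 88.9  | 69.8     | 63.8        | 80.5    |
| 1650ms | SANA-1.5    | 16    | -            | Best-of-1  | 99.7   | 93.5 | 77.3     | 89.1  | 60.2     | 60.8        | 79.4    |
| 1650ms | SANA-Sprint | 30    | -            | Best-of-1  | 100.0  | 90.5 | 57.3     | 85.1  | 49.3     | 50.2        | 71.4    |
| 1650ms | SANA-Sprint | 1     | MLLM w/ CLIP | Best-of-6  | 100.0  | 93.9 | 68.2     | 88.7  | 69.8     | 64.2        | 80.4    |
| 1650ms | SANA-Sprint | 1     | MLLM w/ AE   | Best-of-11 | 99.7   | 90.5 | 59.3     | 89.8  | 58.4     | 49.0        | 73.9    |
| 1650ms | SANA-Sprint | 1     | VHS (Ours)   | Best-of-15 | 100.0  | 96.0 | 67.3     | 89.1  | 70.4     | 64.6        | 80.9    |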
Table 1. Accuracy (%) on the GenEval benchmark across computational budgets, generator backbones, and verifier configurations (all verifiers use Qwen2.5-0.5B as the LLM). SANA-1.5 and SANA-Sprint are compared under matched wall-clock budgets (in milliseconds). Each verifier operates under the same time constraint via adaptive Best-of-N, so the number of candidates N differs across verifiers within a budget.
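
To illustrate how a fixed wall-clock budget can translate into a verifier-dependent N, the following is a minimal sketch, not the procedure used to produce Table 1. It assumes candidates are generated and scored sequentially, with constant per-candidate costs. The function names, the `gen_ms` and `verify_ms` parameters, and the latency values in the usage lines are all hypothetical and are not measurements from this work.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def max_candidates(budget_ms: float, gen_ms: float, verify_ms: float) -> int:
    """Largest N with N * (gen_ms + verify_ms) <= budget_ms, but never below 1.

    Assumes sequential generation and scoring with constant per-candidate cost
    (an illustrative simplification).
    """
    per_candidate_ms = gen_ms + verify_ms
    return max(1, int(budget_ms // per_candidate_ms))


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int) -> T:
    """Draw n candidates and return the one with the highest verifier score."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Hypothetical latencies: a cheaper verifier affords more candidates
# under the same budget.
n_fast = max_candidates(budget_ms=550, gen_ms=50, verify_ms=50)
n_slow = max_candidates(budget_ms=550, gen_ms=50, verify_ms=200)
```

Under these assumed costs, the cheaper verifier evaluates more candidates within the same budget, which is the qualitative pattern in the Best-of-N column of Table 1.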