GenEval Benchmark Table

Budget (ms)	Generator	Steps	Verifier	Best-of-N	Single Object	Two Object	Counting	Color	Position	Color Attribution	Overall
200	SANA-Sprint	1	–	Best-of-1	99.3	88.1	56.0	87.6	54.1	47.8	71.6
550	SANA-1.5	4	–	Best-of-1	98.8	78.2	66.5	71.1	50.6	20.8	63.0
550	SANA-Sprint	8	–	Best-of-1	99.5	91.9	59.3	86.0	57.8	52.4	74.0
550	SANA-Sprint	1	MLLM w/ CLIP	Best-of-2	100.0	91.3	59.5	88.0	61.0	55.4	75.4
550	SANA-Sprint	1	MLLM w/ AE	Best-of-3	100.0	90.9	59.0	89.6	55.8	50.6	73.1
550	SANA-Sprint	1	VHS (Ours)	Best-of-4	100.0	93.9	61.5	90.6	66.2	58.4	78.1
1100	SANA-1.5	12	–	Best-of-1	100.0	92.7	74.8	88.3	61.4	59.6	78.8
1100	SANA-Sprint	20	–	Best-of-1	100.0	88.5	59.8	89.6	48.6	51.0	72.2
1100	SANA-Sprint	1	MLLM w/ CLIP	Best-of-4	100.0	92.7	66.0	88.9	65.9	61.6	78.8
1100	SANA-Sprint	1	MLLM w/ AE	Best-of-7	99.7	90.7	61.3	90.8	59.6	49.3	74.7
1100	SANA-Sprint	1	VHS (Ours)	Best-of-9	100.0	95.7	66.5	88.9	69.8	63.8	80.5
1650	SANA-1.5	16	–	Best-of-1	99.7	93.5	77.3	89.1	60.2	60.8	79.4
1650	SANA-Sprint	30	–	Best-of-1	100.0	90.5	57.3	85.1	49.3	50.2	71.4
1650	SANA-Sprint	1	MLLM w/ CLIP	Best-of-6	100.0	93.9	68.2	88.7	69.8	64.2	80.4
1650	SANA-Sprint	1	MLLM w/ AE	Best-of-11	99.7	90.5	59.3	89.8	58.4	49.0	73.9
1650	SANA-Sprint	1	VHS (Ours)	Best-of-15	100.0	96.0	67.3	89.1	70.4	64.6	80.9

Table 1. Accuracy (%) on the GenEval benchmark across computational budgets, generator backbones, and verifier configurations (verifiers built on the Qwen2.5-0.5B LLM). SANA-1.5 and SANA-Sprint are compared under matched wall-clock budgets (in milliseconds). Each verifier operates under the same time constraint, with the number of candidates N in Best-of-N chosen adaptively to fit the budget.
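The adaptive Best-of-N setting in the caption can be illustrated with a small sketch. This is not the authors' implementation. It assumes that candidates are generated and verified sequentially, so each candidate costs one generator call plus one verifier call, and that N is the largest count that fits in the wall-clock budget (never less than 1). The functions `adaptive_n` and `best_of_n`, the toy generator and scorer, and the per-call costs in the usage example are all hypothetical.

```python
import math
from typing import Callable, TypeVar

T = TypeVar("T")


def adaptive_n(budget_ms: float, gen_ms: float, verify_ms: float) -> int:
    """Hypothetical helper: largest N whose N generations + N verifications fit the budget.

    Assumes sequential execution and fixed per-call costs; always returns at least 1.
    """
    per_candidate_ms = gen_ms + verify_ms
    if per_candidate_ms <= 0:
        raise ValueError("per-candidate cost must be positive")
    return max(1, math.floor(budget_ms / per_candidate_ms))


def best_of_n(generate: Callable[[int], T], score: Callable[[T], float], n: int) -> T:
    """Generate n candidates (one per seed) and return the one the verifier scores highest."""
    candidates = [generate(seed) for seed in range(n)]
    return max(candidates, key=score)


if __name__ == "__main__":
    # Hypothetical costs: 100 ms per one-step generation, 30 ms per verifier call.
    n = adaptive_n(budget_ms=550, gen_ms=100, verify_ms=30)
    print(n)  # 4, since 4 * 130 = 520 <= 550 < 650 = 5 * 130
    # Toy generator and verifier standing in for the image model and the scorer.
    best = best_of_n(lambda seed: (seed * 3) % 7, lambda x: float(x), n)
    print(best)  # candidates are 0, 3, 6, 2, so the best is 6
```

Under these assumptions, a verifier with a lower per-call cost buys more candidates within the same budget. This is consistent with the table assigning different N to different verifiers at each budget (for example, Best-of-2, Best-of-3, and Best-of-4 at 550 ms). The actual per-call costs behind those N values are not given in the table.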