Preprint / 2026
Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers
A sparse subspace of carrier channels that organizes, localizes, and transports semantic information inside modern DiT models: functionally critical, spatially organized, and transferable.
Diffusion Transformers and related flow-based architectures are now among the strongest text-to-image generators, yet the internal mechanisms through which prompts shape image semantics remain poorly understood. We study massive activations: a small subset of hidden-state channels whose responses are consistently much larger than the rest.
Despite their sparsity, these few channels effectively draw the whole picture. They are functionally critical, spatially organized, and transferable across prompt-conditioned trajectories.
Together, the results recast massive activations not as activation anomalies, but as a sparse prompt-conditioned carrier subspace that organizes and controls semantic information in modern DiT models.
The analysis combines disruption, clustering, and activation transport to expose where semantic information concentrates inside DiT hidden states.
Zeroing the top-k massive channels sharply degrades generation quality, while disrupting low-statistic channels has a much smaller effect.
Clustering image-stream tokens restricted to massive channels yields foreground/background masks aligned with salient image regions.
Transporting massive activations from one trajectory into another shifts the output toward source semantics while preserving target structure.
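The clustering finding above can be illustrated in miniature. The sketch below is not the paper's implementation: it is a minimal NumPy two-means over image tokens restricted to the massive channels, with hypothetical function and argument names, and it assumes the massive-channel indices have already been selected.

```python
import numpy as np

def cluster_tokens(tokens, massive_idx, iters=20):
    """Two-means clustering of image tokens restricted to massive channels.

    tokens: (num_tokens, dim) image-stream hidden states at one layer.
    massive_idx: indices of the previously selected massive channels.
    Returns a boolean mask over tokens; the cluster whose center has
    the larger norm is read as 'foreground'.
    """
    x = tokens[:, massive_idx]               # restrict to massive channels
    norms = np.linalg.norm(x, axis=1)
    # deterministic init: lowest- and highest-magnitude tokens
    centers = np.stack([x[norms.argmin()], x[norms.argmax()]])
    for _ in range(iters):
        # assign each token to its nearest center
        d = np.linalg.norm(x[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    # the higher-magnitude cluster is treated as foreground
    fg = np.linalg.norm(centers, axis=1).argmax()
    return labels == fg
```

On well-separated activations this recovers a foreground/background split from the massive channels alone, which is the point of the finding: a handful of channels already carry the spatial layout.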
At each layer, massive activations are selected as top-k channels by absolute channel mean across image tokens, then combined with a spatial mask for targeted activation replacement.
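A minimal NumPy sketch of this selection step, under the stated criterion (top-k channels by absolute channel mean across image tokens; function and argument names are hypothetical):

```python
import numpy as np

def select_massive_channels(hidden, k=8):
    """Top-k massive channels by absolute channel mean across image tokens.

    hidden: (num_tokens, dim) image-stream hidden states at one layer.
    Returns indices of the k channels whose per-channel mean over
    tokens has the largest absolute value.
    """
    channel_means = hidden.mean(axis=0)          # (dim,) mean over tokens
    return np.argsort(np.abs(channel_means))[-k:]
```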
Compute absolute channel means and select the top-k massive channels per layer.
Restrict image tokens to the selected channels and cluster them into foreground/background regions.
Inject source activations into the target stream under the joint channel and spatial mask.
Generate the final image with source semantics blended into the target composition.
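The injection step of the pipeline above can be sketched as a masked replacement: only entries lying in both a massive channel and a masked token are overwritten with source activations. This is a hedged sketch with hypothetical names, not the paper's released code.

```python
import numpy as np

def inject_activations(target, source, massive_idx, spatial_mask):
    """Replace target activations with source ones under the joint mask.

    target, source: (num_tokens, dim) hidden states of two
    prompt-conditioned trajectories at the same layer and timestep.
    massive_idx: indices of the selected massive channels.
    spatial_mask: boolean (num_tokens,) token mask (e.g. the
    foreground cluster). Everything outside the joint mask keeps
    the target value, preserving target structure.
    """
    out = target.copy()
    rows = np.where(spatial_mask)[0]
    # np.ix_ builds the (masked tokens) x (massive channels) grid
    out[np.ix_(rows, massive_idx)] = source[np.ix_(rows, massive_idx)]
    return out
```

Running denoising on the edited hidden states then yields the blend described in the final step: source semantics inside the target composition.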
Use the BibTeX entry below to cite the arXiv preprint.
@article{turri2026channelsdrawpicturerevealing,
  title={Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers},
  author={Evelyn Turri and Davide Bucciarelli and Sara Sarto and Lorenzo Baraldi and Marcella Cornia},
  journal={arXiv preprint arXiv:2605.13974},
  year={2026},
}