Preprint / 2026
Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers
A sparse subspace of carrier channels that organizes, localizes, and transports semantic information inside modern DiT models: functionally critical, spatially organized, and transferable.
Diffusion Transformers and related flow-based architectures are now among the strongest text-to-image generators, yet the internal mechanisms through which prompts shape image semantics remain poorly understood. We study massive activations: a small subset of hidden-state channels whose responses are consistently much larger than the rest.
Despite their sparsity, these few channels effectively draw the whole picture. They are functionally critical, spatially organized, and transferable across prompt-conditioned trajectories.
Together, the results recast massive activations not as activation anomalies, but as a sparse prompt-conditioned carrier subspace that organizes and controls semantic information in modern DiT models.
The analysis combines disruption, clustering, and activation transport to expose where semantic information concentrates inside DiT hidden states.
Zeroing the top-k massive channels sharply degrades generation quality, while disrupting low-statistic channels has a much smaller effect.
Clustering image-stream tokens restricted to massive channels yields foreground/background masks aligned with salient image regions.
Transporting massive activations from one trajectory into another shifts the output toward source semantics while preserving target structure.
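The clustering finding above can be illustrated in miniature. The sketch below is not the paper's implementation: it is a minimal NumPy two-means over image tokens restricted to the massive channels, with hypothetical function and argument names, and it assumes the massive-channel indices have already been selected.

```python
import numpy as np

def cluster_tokens(tokens, massive_idx, iters=20):
    """Two-means clustering of image tokens restricted to massive channels.

    tokens: (num_tokens, dim) image-stream hidden states at one layer.
    massive_idx: indices of the previously selected massive channels.
    Returns a boolean mask over tokens; the cluster whose center has
    the larger norm is read as 'foreground'.
    """
    x = tokens[:, massive_idx]               # restrict to massive channels
    norms = np.linalg.norm(x, axis=1)
    # deterministic init: lowest- and highest-magnitude tokens
    centers = np.stack([x[norms.argmin()], x[norms.argmax()]])
    for _ in range(iters):
        # assign each token to its nearest center
        d = np.linalg.norm(x[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    # the higher-magnitude cluster is treated as foreground
    fg = np.linalg.norm(centers, axis=1).argmax()
    return labels == fg
```

On well-separated activations this recovers a foreground/background split from the massive channels alone, which is the point of the finding: a handful of channels already carry the spatial layout.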
At each layer, massive activations are selected as top-k channels by absolute channel mean across image tokens, then combined with a spatial mask for targeted activation replacement.
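A minimal NumPy sketch of this selection step, under the stated criterion (top-k channels by absolute channel mean across image tokens; function and argument names are hypothetical):

```python
import numpy as np

def select_massive_channels(hidden, k=8):
    """Top-k massive channels by absolute channel mean across image tokens.

    hidden: (num_tokens, dim) image-stream hidden states at one layer.
    Returns indices of the k channels whose per-channel mean over
    tokens has the largest absolute value.
    """
    channel_means = hidden.mean(axis=0)          # (dim,) mean over tokens
    return np.argsort(np.abs(channel_means))[-k:]
```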
Compute absolute channel means and select the top-k massive channels per layer.
Restrict image tokens to the selected channels and cluster them into foreground/background regions.
Inject source activations into the target stream under the joint channel and spatial mask.
Generate the final image with source semantics blended into the target composition.
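The injection step of the pipeline above can be sketched as a masked replacement: only entries lying in both a massive channel and a masked token are overwritten with source activations. This is a hedged sketch with hypothetical names, not the paper's released code.

```python
import numpy as np

def inject_activations(target, source, massive_idx, spatial_mask):
    """Replace target activations with source ones under the joint mask.

    target, source: (num_tokens, dim) hidden states of two
    prompt-conditioned trajectories at the same layer and timestep.
    massive_idx: indices of the selected massive channels.
    spatial_mask: boolean (num_tokens,) token mask (e.g. the
    foreground cluster). Everything outside the joint mask keeps
    the target value, preserving target structure.
    """
    out = target.copy()
    rows = np.where(spatial_mask)[0]
    # np.ix_ builds the (masked tokens) x (massive channels) grid
    out[np.ix_(rows, massive_idx)] = source[np.ix_(rows, massive_idx)]
    return out
```

Running denoising on the edited hidden states then yields the blend described in the final step: source semantics inside the target composition.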
Use the BibTeX entry below to cite the arXiv preprint.
@article{turri2026channelsdrawpicturerevealing,
  title={Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers},
  author={Evelyn Turri and Davide Bucciarelli and Sara Sarto and Lorenzo Baraldi and Marcella Cornia},
  journal={arXiv preprint arXiv:2605.13974},
  year={2026},
}