Preprint / 2026

Few Channels Draw The Whole Picture

Revealing Massive Activations in Diffusion Transformers

A sparse subspace of semantic carrier channels inside modern DiT models: functionally critical, spatially organized, and transferable across prompt-conditioned trajectories.

Evelyn Turri*1 Davide Bucciarelli*1,2 Sara Sarto1 Lorenzo Baraldi1 Marcella Cornia1
1 University of Modena and Reggio Emilia, Italy · 2 University of Pisa, Italy · * Equal contribution
top-k channels · K=2 spatial mask · 0-shot transport
Abstract

A sparse subspace with outsized semantic control.

Diffusion Transformers and related flow-based architectures are now among the strongest text-to-image generators, yet the internal mechanisms through which prompts shape image semantics remain poorly understood. We study massive activations: a small subset of hidden-state channels whose responses are consistently much larger than the rest.

Despite their sparsity, these few channels effectively draw the whole picture. They are functionally critical, spatially organized, and transferable across prompt-conditioned trajectories.

Together, the results recast massive activations not as activation anomalies, but as a sparse prompt-conditioned carrier subspace that organizes and controls semantic information in modern DiT models.

Findings

Three signals point to one mechanism.

The analysis combines disruption, clustering, and activation transport to expose where semantic information concentrates inside DiT hidden states.

01

Functional criticality

Zeroing the top-k massive channels sharply degrades generation quality, while zeroing low-statistic channels has a much smaller effect.

02

Spatial organization

Clustering image-stream tokens restricted to massive channels yields foreground/background masks aligned with salient image regions.

03

Semantic transferability

Transporting massive activations from one trajectory into another shifts the output toward source semantics while preserving target structure.
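The disruption test behind finding 01 can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the authors' code: the function name, the `(tokens, channels)` layout of the hidden state, and the `largest` flag are assumptions; channels are ranked by absolute mean over tokens, matching the selection rule described in the Method section.

```python
import numpy as np

def zero_extreme_channels(hidden, k, largest=True):
    """Zero k channels of a (tokens, channels) hidden state.

    largest=True ablates the top-k channels by absolute channel mean
    (the "massive" channels); largest=False ablates the bottom-k
    (low-statistic) channels for the control comparison.
    Hypothetical sketch, not the paper's implementation.
    """
    order = np.argsort(np.abs(hidden.mean(axis=0)))  # ascending by |mean|
    idx = order[-k:] if largest else order[:k]
    out = hidden.copy()
    out[:, idx] = 0.0                                # ablate selected channels
    return out, idx
```

Running the generator with the ablated hidden state in place of the original, and comparing image quality across the two modes, reproduces the contrast described above.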

Method

Identify, mask, transport, generate.

At each layer, massive activations are identified as the top-k channels ranked by absolute channel mean across image tokens, then combined with a spatial mask for targeted activation replacement.

1

Identify channels

Compute absolute channel means and select the top-k massive channels per layer.

2

Estimate spatial mask

Restrict image tokens to the selected channels and cluster them into foreground/background regions.

3

Transport activations

Inject source activations into the target stream under the joint channel and spatial mask.

4

Decode output

Generate the final image with source semantics blended into the target composition.
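Steps 1–3 above can be sketched end-to-end with NumPy. This is a minimal illustration under stated assumptions: the function names, the `(tokens, channels)` hidden-state layout, and the bare-bones K=2 Lloyd iteration are placeholders for whatever the paper actually uses; step 4 (decoding) is model-specific and omitted.

```python
import numpy as np

def select_massive_channels(hidden, k):
    """Step 1: rank channels by absolute mean over image tokens, keep top-k."""
    return np.argsort(np.abs(hidden.mean(axis=0)))[-k:]

def foreground_mask(hidden, channels, iters=10):
    """Step 2: cluster tokens, restricted to the massive channels, into
    K=2 groups. A crude Lloyd iteration with first/last-token init; the
    paper's exact clustering choice may differ."""
    x = hidden[:, channels]                       # (tokens, k)
    centers = x[[0, -1]].astype(float).copy()
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for c in (0, 1):
            if (labels == c).any():
                centers[c] = x[labels == c].mean(axis=0)
    fg = int(np.linalg.norm(centers, axis=1).argmax())  # higher-energy cluster
    return labels == fg                            # boolean (tokens,) mask

def transport(target, source, channels, mask):
    """Step 3: copy source activations into the target stream under the
    joint channel and spatial mask; everything else stays target."""
    out = target.copy()
    rows = np.where(mask)[0]
    out[np.ix_(rows, channels)] = source[np.ix_(rows, channels)]
    return out
```

In use, `hidden` would be a layer's image-token hidden state from the source trajectory; the transported state is then fed back into the target trajectory before decoding (step 4).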

FLUX.1-dev · FLUX.1-schnell · FLUX.2-klein · Qwen-Image · SANA 1.5
Citation

Cite this work.

Use the BibTeX entry below to cite the arXiv preprint.

@article{turri2026channelsdrawpicturerevealing,
  title={Few Channels Draw The Whole Picture: Revealing Massive Activations in Diffusion Transformers},
  author={Evelyn Turri and Davide Bucciarelli and Sara Sarto and Lorenzo Baraldi and Marcella Cornia},
  journal={arXiv preprint arXiv:2605.13974},
  year={2026},
}