ICCV 2025

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

¹University of Modena and Reggio Emilia, Italy; ²University of Milan, Italy

We propose ScanDiff, a unified architecture that integrates diffusion models with Vision Transformers to generate gaze scanpaths. Unlike most existing approaches, which produce averaged behaviors, ScanDiff explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, yielding diverse yet plausible gaze trajectories.

[Figure: ScanDiff teaser]

Abstract

Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics.

While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths.

Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives.
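
To make the idea of stochastic, condition-dependent sampling concrete, the following is a minimal toy sketch of DDPM-style ancestral sampling, not the ScanDiff implementation. Everything in it is an assumption made for illustration: the closed-form noise predictor for a Gaussian toy distribution stands in for a trained denoising network, the task strings and mean fixation coordinates in `TASK_MEANS` are hypothetical, and textual conditioning is reduced to a dictionary lookup rather than a text encoder. It only shows how different noise draws can yield different scanpaths for the same condition.

```python
import math
import random

T = 50  # number of diffusion steps (illustrative choice)
# Linear noise schedule and its cumulative products.
BETAS = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_prod = 1.0
for _a in ALPHAS:
    _prod *= _a
    ALPHA_BARS.append(_prod)

# Hypothetical condition -> mean scanpath, stored flat as x1, y1, x2, y2, ...
# in normalized image coordinates. These values are made up.
TASK_MEANS = {
    "free-viewing": [0.5, 0.5, 0.3, 0.4, 0.7, 0.6],
    "find the cup": [0.5, 0.5, 0.8, 0.7, 0.8, 0.75],
}
DATA_STD = 0.1  # assumed spread of scanpaths around the mean


def toy_noise_predictor(x_t, t, task):
    """Closed-form E[eps | x_t] when clean data is Gaussian around TASK_MEANS[task].

    A stand-in for a learned, conditioned denoiser.
    """
    ab = ALPHA_BARS[t]
    var_t = ab * DATA_STD ** 2 + (1.0 - ab)  # variance of x_t under the toy model
    scale = math.sqrt(1.0 - ab) / var_t
    return [scale * (x - math.sqrt(ab) * m) for x, m in zip(x_t, TASK_MEANS[task])]


def sample_scanpath(task, rng):
    """Ancestral sampling: start from Gaussian noise and denoise step by step."""
    dim = len(TASK_MEANS[task])
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        eps_hat = toy_noise_predictor(x, t, task)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * e) / math.sqrt(ALPHAS[t]) for xi, e in zip(x, eps_hat)]
        sigma = math.sqrt(BETAS[t]) if t > 0 else 0.0  # no noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    # Group the flat vector into (x, y) fixations.
    return [(x[i], x[i + 1]) for i in range(0, dim, 2)]


def sample_many(task, n, seed=0):
    """Draw n scanpaths for one task; each uses fresh noise from the same RNG."""
    rng = random.Random(seed)
    return [sample_scanpath(task, rng) for _ in range(n)]


if __name__ == "__main__":
    for path in sample_many("find the cup", 3, seed=1):
        print([(round(px, 2), round(py, 2)) for px, py in path])
```

Each call to `sample_scanpath` consumes new random noise, so repeated draws under the same task differ from one another while staying close to that task's assumed mean. Changing the task string shifts where the samples concentrate. In a real system the lookup table would be replaced by learned image and text features.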

Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research.

Method Overview

[Figure: ScanDiff method overview]

Qualitative Results

Scanpath Variability Analysis

BibTeX

@inproceedings{cartella2025modeling,
  title     = {Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction},
  author    = {Cartella, Giuseppe and Cuculo, Vittorio and D'Amelio, Alessandro and Cornia, Marcella and Boccignone, Giuseppe and Cucchiara, Rita},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}