HySAC: Hyperbolic Safety-Aware Vision-Language Models

1University of Modena and Reggio Emilia, 2University of Pisa,
3University of Amsterdam, 4IIT-CNR, Italy
* Equal contribution

Hyperbolic Safety-Aware CLIP (HySAC) is a fine-tuned CLIP model that leverages the hierarchical properties of hyperbolic space to enhance safety in vision-language tasks. HySAC models the relationship between safe and unsafe image-text pairs, enabling both effective retrieval of unsafe content and the dynamic redirection of unsafe queries to safer alternatives.

Warning: This project involves explicit sexual content, racially insensitive language, and other material that may be harmful or disturbing to certain users. Please use this content solely for research purposes and proceed with caution.

Abstract

Addressing the retrieval of unsafe content from vision-language models such as CLIP is an important step towards real-world integration. Current efforts have relied on unlearning techniques that try to erase the model's knowledge of unsafe concepts. While effective in reducing unwanted outputs, unlearning limits the model's capacity to discern between safe and unsafe content. In this work, we introduce a novel approach that shifts from unlearning to an awareness paradigm by leveraging the inherent hierarchical properties of hyperbolic space.

We propose to encode safe and unsafe content as an entailment hierarchy, where both are placed in different regions of hyperbolic space. Our HySAC, Hyperbolic Safety-Aware CLIP, employs entailment loss functions to model the hierarchical and asymmetrical relations between safe and unsafe image-text pairs. This modelling, which is ineffective in standard vision-language models due to their reliance on Euclidean embeddings, endows the model with awareness of unsafe content. As a result, HySAC can serve as both a multimodal unsafe classifier and a flexible content retriever, with the option to dynamically redirect unsafe queries toward safer alternatives or to retain the original output.

Extensive experiments show that our approach not only enhances safety recognition but also establishes a more adaptable and interpretable framework for content moderation in vision-language models.

Model Architecture

HySAC learns to encode text and images in a shared hyperbolic space using entailment cones and contrastive losses. Unsafe content is pushed further from the origin, forming a safety-aware hierarchy. A traversal mechanism allows redirecting unsafe queries to safe regions without discarding original semantics.
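To make the geometry concrete, below is a minimal PyTorch sketch of a Lorentz-model entailment cone loss in the spirit of MERU-style hyperbolic CLIP training. The function names, curvature value, and `min_radius` constant are illustrative assumptions rather than HySAC's exact formulation; the sketch only shows how a "parent" embedding (e.g. safe content, close to the origin) can be encouraged to entail a "child" embedding (e.g. its unsafe counterpart, farther from the origin).

```python
import torch

def lorentz_inner(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Lorentzian inner product <x, y>_L = -x_time * y_time + <x_space, y_space>."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(dim=-1)

def half_aperture(x: torch.Tensor, curv: float, min_radius: float = 0.1) -> torch.Tensor:
    """Half-aperture of the entailment cone rooted at x (cones narrow away from the origin)."""
    space_norm = x[..., 1:].norm(dim=-1).clamp(min=1e-8)
    ratio = 2.0 * min_radius / (curv ** 0.5 * space_norm)
    return torch.asin(ratio.clamp(max=1.0 - 1e-6))

def exterior_angle(x: torch.Tensor, y: torch.Tensor, curv: float) -> torch.Tensor:
    """Exterior angle at x of the geodesic from x to y (MERU-style formulation)."""
    xy = curv * lorentz_inner(x, y)  # <= -1 for points on the hyperboloid
    num = y[..., 0] + x[..., 0] * xy
    denom = x[..., 1:].norm(dim=-1).clamp(min=1e-8) * (xy.pow(2) - 1).clamp(min=1e-8).sqrt()
    return torch.acos((num / denom).clamp(-1.0 + 1e-6, 1.0 - 1e-6))

def entailment_loss(parent: torch.Tensor, child: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Hinge penalty whenever the child falls outside the parent's entailment cone."""
    return torch.relu(exterior_angle(parent, child, curv) - half_aperture(parent, curv)).mean()
```

Under this layout, safe embeddings near the origin act as parents whose wider cones contain their unsafe counterparts, which is what yields the safety-aware hierarchy described above.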

We validate the model on synthetic and real-world benchmarks across retrieval and generation tasks, demonstrating HySAC's ability to preserve performance while enhancing safety and interpretability.

Embedding Distance Distribution

The figure illustrates the distribution of embedding distances from the origin of the hyperbolic space. Unlike CLIP and Safe-CLIP, which do not separate safe from unsafe data, HySAC yields four distinct clusters corresponding to the safe/unsafe and text/image combinations. This confirms the emergence of a well-structured safety hierarchy.
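The quantity plotted here is simply the geodesic distance of each embedding from the hyperboloid origin. As a hedged illustration (assuming a Lorentz model with curvature magnitude `curv` and Euclidean encoder outputs lifted onto the hyperboloid via the exponential map at the origin; the helper names are hypothetical), such a distribution can be computed as follows.

```python
import torch

def lift_to_hyperboloid(v: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Exponential map at the origin: lift Euclidean encoder outputs onto the hyperboloid."""
    v_norm = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    space = torch.sinh(curv ** 0.5 * v_norm) * v / (curv ** 0.5 * v_norm)
    time = (1.0 / curv + space.pow(2).sum(dim=-1, keepdim=True)).sqrt()
    return torch.cat([time, space], dim=-1)  # time coordinate first

def distance_from_origin(x: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Geodesic distance of hyperboloid points from the origin (1/sqrt(curv), 0, ..., 0)."""
    time = x[..., 0].clamp(min=1.0 / curv ** 0.5)
    return torch.acosh(curv ** 0.5 * time) / curv ** 0.5

# Example: summarize the distances of a batch of (placeholder) embeddings.
feats = torch.randn(1024, 512)  # stand-in for encoder outputs
dists = distance_from_origin(lift_to_hyperboloid(feats))
print(dists.min().item(), dists.mean().item(), dists.max().item())
```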

Qualitative Results

Image-to-Text safety traversal results. HySAC traverses towards the root feature, retrieving the top-1 text at each interpolation point.
Image-to-Image traversals from unsafe image queries towards safe images.

The figures show examples of HySAC's ability to dynamically traverse the embedding space. Starting from unsafe queries, the model progressively shifts the retrieved outputs toward safer and semantically consistent results. This illustrates HySAC's capacity to balance safety and relevance through geometry-aware retrieval.
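A rough sketch of how such a traversal could work is given below: the query embedding is shrunk toward the root of the hyperbolic space and re-projected onto the hyperboloid, and the nearest gallery item is retrieved at each interpolation step. The interpolation scheme, step count, and helper names are assumptions for illustration, not HySAC's released traversal procedure.

```python
import torch

def shrink_toward_root(x: torch.Tensor, alpha: float, curv: float = 1.0) -> torch.Tensor:
    """Move a hyperboloid point toward the origin by scaling its tangent vector by alpha."""
    space = x[..., 1:]
    s_norm = space.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    dist = torch.asinh(curv ** 0.5 * s_norm) / curv ** 0.5        # distance to the origin
    new_norm = torch.sinh(curv ** 0.5 * alpha * dist) / curv ** 0.5
    new_space = space / s_norm * new_norm
    new_time = (1.0 / curv + new_space.pow(2).sum(dim=-1, keepdim=True)).sqrt()
    return torch.cat([new_time, new_space], dim=-1)

def lorentz_distance(x: torch.Tensor, y: torch.Tensor, curv: float = 1.0) -> torch.Tensor:
    """Geodesic distance between hyperboloid points."""
    inner = -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(dim=-1)
    return torch.acosh((-curv * inner).clamp(min=1.0 + 1e-7)) / curv ** 0.5

def safety_traversal(query: torch.Tensor, gallery: torch.Tensor, steps: int = 5) -> list:
    """Indices of the top-1 gallery item while interpolating the query toward the root."""
    retrieved = []
    for alpha in torch.linspace(1.0, 0.0, steps):                 # 1.0 = original query, 0.0 = root
        point = shrink_toward_root(query, alpha.item())
        retrieved.append(lorentz_distance(point.unsqueeze(0), gallery).argmin().item())
    return retrieved
```

At alpha = 1 the original (possibly unsafe) neighbours are returned, while alpha approaching 0 collapses the query onto the root, so intermediate values trade relevance against safety.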

Citation

@inproceedings{poppi2025hyperbolic,
  title={Hyperbolic Safety-Aware Vision-Language Models},
  author={Poppi, Tobia and Kasarla, Tejaswi and Mettes, Pascal and Baraldi, Lorenzo and Cucchiara, Rita},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}