MODEL
Classes
- class models.zscl_utils.clip.model.AttentionPool2d(spacial_dim, embed_dim, num_heads, output_dim=None)
  Bases: Module
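Attention pooling over a 2-D feature map: the mean-pooled feature attends as the query over all spatial positions, replacing a plain average pool. A minimal usage sketch, assuming the upstream OpenAI CLIP forward interface where the input is an (N, C, H, W) map with H = W = spacial_dim; the RN50-style sizes below are illustrative, not taken from this page:

```python
import torch
from models.zscl_utils.clip.model import AttentionPool2d

# Pool a 7x7 feature map (e.g. the last ModifiedResNet stage at 224px input)
# into a single embedding per image.
pool = AttentionPool2d(spacial_dim=7, embed_dim=2048, num_heads=32, output_dim=1024)

feats = torch.randn(4, 2048, 7, 7)  # (batch, channels, H, W)
pooled = pool(feats)                # expected shape: (4, 1024)
```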
- class models.zscl_utils.clip.model.Bottleneck(inplanes, planes, stride=1)
  Bases: Module
  - expansion = 4
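Each block widens its input by expansion = 4, so a block built with planes=128 emits 512 channels; with stride > 1, downsampling happens via an avgpool rather than a strided convolution (the anti-aliasing change noted under ModifiedResNet below). A sketch, with shapes assumed from the standard ResNet layout:

```python
import torch
from models.zscl_utils.clip.model import Bottleneck

# Stride-2 block: channels go from inplanes to planes * Bottleneck.expansion,
# and spatial resolution is halved.
block = Bottleneck(inplanes=256, planes=128, stride=2)

x = torch.randn(2, 256, 56, 56)
y = block(x)                      # expected shape: (2, 512, 28, 28)
```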
- class models.zscl_utils.clip.model.CLIP(embed_dim, image_resolution, vision_layers, vision_width, vision_patch_size, context_length, vocab_size, transformer_width, transformer_heads, transformer_layers, baseline=False)
  Bases: Module
  - property dtype
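A construction sketch for a small ViT-style model. In upstream OpenAI CLIP, an int for vision_layers selects the transformer vision tower while a 4-tuple selects ModifiedResNet, the dtype property mirrors the dtype of the visual tower's weights, and the model exposes encode_image / encode_text; those details are assumed to carry over here (the extra baseline flag is left at its default):

```python
import torch
from models.zscl_utils.clip.model import CLIP

model = CLIP(
    embed_dim=512,
    image_resolution=224,
    vision_layers=12,        # int -> ViT tower; a tuple like (3, 4, 6, 3) -> ModifiedResNet
    vision_width=768,
    vision_patch_size=32,
    context_length=77,
    vocab_size=49408,
    transformer_width=512,
    transformer_heads=8,
    transformer_layers=12,
)

images = torch.randn(2, 3, 224, 224)
tokens = torch.randint(0, 49408, (2, 77))     # pre-tokenized text, padded to context_length
image_features = model.encode_image(images)   # (2, 512)
text_features = model.encode_text(tokens)     # (2, 512)
```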
- class models.zscl_utils.clip.model.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, bias=True, device=None, dtype=None)
  Bases: LayerNorm
  Subclass torch’s LayerNorm to handle fp16.
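In upstream CLIP this wrapper computes the normalization in float32 and casts the result back, so half-precision activations pass through without the numerical issues of a native fp16 LayerNorm. A quick check of that contract, assuming the same behaviour here:

```python
import torch
from models.zscl_utils.clip.model import LayerNorm

ln = LayerNorm(512)
x = torch.randn(4, 512, dtype=torch.float16)
y = ln(x)
assert y.dtype == torch.float16   # output dtype matches the input
```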
- class models.zscl_utils.clip.model.ModifiedResNet(layers, output_dim, heads, input_resolution=224, width=64)
  Bases: Module
  A ResNet class that is similar to torchvision’s but contains the following changes:
  - There are now 3 “stem” convolutions as opposed to 1, with an average pool instead of a max pool.
  - Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1.
  - The final pooling layer is a QKV attention instead of an average pool.
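A construction sketch; the layer counts and widths below follow the upstream CLIP RN50 configuration and are assumptions here, not values from this page. With width=64, the final stage emits width * 32 = 2048 channels, which the attention pool reduces to output_dim:

```python
import torch
from models.zscl_utils.clip.model import ModifiedResNet

# RN50-style tower: four Bottleneck stages, a 1024-d output embedding,
# and 32 heads in the final QKV attention pool.
visual = ModifiedResNet(layers=(3, 4, 6, 3), output_dim=1024, heads=32,
                        input_resolution=224, width=64)

images = torch.randn(2, 3, 224, 224)
embeddings = visual(images)       # expected shape: (2, 1024)
```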
- class models.zscl_utils.clip.model.ResidualAttentionBlock(d_model, n_head, attn_mask=None)
  Bases: Module
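A pre-norm transformer block: multi-head attention and an MLP, each behind a residual connection. Upstream CLIP stacks these blocks sequence-first, i.e. on tensors of shape (seq_len, batch, d_model), and passes attn_mask as the causal mask in the text transformer; this sketch assumes the same conventions:

```python
import torch
from models.zscl_utils.clip.model import ResidualAttentionBlock

block = ResidualAttentionBlock(d_model=512, n_head=8)

x = torch.randn(77, 2, 512)   # (seq_len, batch, d_model), sequence-first
y = block(x)                  # same shape as the input
```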