VISION TRANSFORMER
Classes
- class models.dualprompt_utils.vision_transformer.VisionTransformer(prompt_length=None, embedding_key='cls', prompt_init='uniform', prompt_pool=False, prompt_key=False, pool_size=None, top_k=None, batchwise_prompt=False, prompt_key_init='uniform', head_type='token', use_prompt_mask=False, use_g_prompt=False, g_prompt_length=None, g_prompt_layer_idx=None, use_prefix_tune_for_g_prompt=False, use_e_prompt=False, e_prompt_layer_idx=None, use_prefix_tune_for_e_prompt=False, same_key_value=False, args=None, **kwargs)[source]
Bases: VisionTransformer
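A minimal construction sketch, assuming the repository root is on the Python path. The prompt-related arguments follow the signature above; the backbone arguments (img_size, patch_size, embed_dim, depth, num_heads) are an assumption and are expected to be forwarded through **kwargs to the timm VisionTransformer base class, with args left at its default.

```python
# Minimal sketch: direct construction of the prompt-augmented backbone.
# Prompt-related arguments follow the documented signature; the backbone
# arguments (img_size, patch_size, ...) are assumed to be accepted by the
# timm VisionTransformer base class.
from models.dualprompt_utils.vision_transformer import VisionTransformer

model = VisionTransformer(
    img_size=224, patch_size=16, embed_dim=768, depth=12, num_heads=12,
    # G-Prompt: shared prompts attached (as prefixes) at the first two blocks
    use_g_prompt=True, g_prompt_length=5, g_prompt_layer_idx=[0, 1],
    use_prefix_tune_for_g_prompt=True,
    # E-Prompt: prompts selected from a key-matched pool at later blocks
    use_e_prompt=True, e_prompt_layer_idx=[2, 3, 4],
    use_prefix_tune_for_e_prompt=True,
    prompt_pool=True, pool_size=10, top_k=1, prompt_length=5,
    head_type='token',
)
print(sum(p.numel() for p in model.parameters()))
```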
Functions
- models.dualprompt_utils.vision_transformer.checkpoint_filter_fn(state_dict, model, adapt_layer_scale=False)[source]
Convert the patch embedding weights from the manual patchify + linear projection format to the convolutional format.
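The conversion can be illustrated with a small tensor reshape: a flat linear patch-projection weight of shape (embed_dim, in_chans * P * P) becomes the equivalent Conv2d kernel of shape (embed_dim, in_chans, P, P). The snippet below illustrates that reshape only; variable names and shapes are illustrative and not taken from the function's actual body.

```python
# Illustration of the patch-embedding weight conversion: the flat linear
# projection weight is reshaped into an equivalent Conv2d kernel.
# Shapes and variable names are illustrative, not taken from the source.
import torch

embed_dim, in_chans, patch_size = 768, 3, 16
linear_weight = torch.randn(embed_dim, in_chans * patch_size * patch_size)
conv_weight = linear_weight.reshape(embed_dim, in_chans, patch_size, patch_size)
assert conv_weight.shape == (768, 3, 16, 16)
```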
- models.dualprompt_utils.vision_transformer.resize_pos_embed(posemb, posemb_new, num_prefix_tokens=1, gs_new=())[source]
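No docstring is provided, but the name matches the standard position-embedding resizing used when loading checkpoints at a different input resolution. The sketch below shows the usual approach (split off the prefix tokens, interpolate the grid portion, re-concatenate); it is an assumption that this module follows the same scheme, and resize_pos_embed_sketch is a hypothetical stand-in, not the module's function.

```python
# Hedged sketch of a typical resize_pos_embed: keep the prefix (class)
# tokens, reshape the grid tokens to 2D, interpolate to the new grid
# size, flatten, and re-concatenate.
import math
import torch
import torch.nn.functional as F

def resize_pos_embed_sketch(posemb, posemb_new, num_prefix_tokens=1, gs_new=()):
    prefix, grid = posemb[:, :num_prefix_tokens], posemb[:, num_prefix_tokens:]
    gs_old = int(math.sqrt(grid.shape[1]))
    if not gs_new:
        gs_new = (int(math.sqrt(posemb_new.shape[1] - num_prefix_tokens)),) * 2
    grid = grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=gs_new, mode='bicubic', align_corners=False)
    grid = grid.permute(0, 2, 3, 1).reshape(1, gs_new[0] * gs_new[1], -1)
    return torch.cat([prefix, grid], dim=1)

# Example: resize a 14x14 grid (197 tokens with CLS) to 24x24 (577 tokens).
old = torch.randn(1, 197, 768)
new_ref = torch.zeros(1, 577, 768)
print(resize_pos_embed_sketch(old, new_ref).shape)  # torch.Size([1, 577, 768])
```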
- models.dualprompt_utils.vision_transformer.vit_base_patch16_224_dualprompt(pretrained=False, **kwargs)[source]
ViT-Base (ViT-B/16) from the original paper (https://arxiv.org/abs/2010.11929). ImageNet-1k weights fine-tuned from ImageNet-21k at 224x224; source: https://github.com/google-research/vision_transformer.
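A usage sketch for the factory. pretrained=True downloads the checkpoint over the network; it is an assumption that extra keyword arguments such as num_classes and the prompt settings documented above are forwarded through **kwargs to the class constructor.

```python
# Usage sketch for the DualPrompt ViT-B/16 factory. pretrained=True
# fetches the ImageNet-1k (21k fine-tuned) checkpoint; the extra keyword
# arguments are assumed to be forwarded through **kwargs.
from models.dualprompt_utils.vision_transformer import vit_base_patch16_224_dualprompt

model = vit_base_patch16_224_dualprompt(
    pretrained=True,
    num_classes=100,          # assumption: classifier width configurable via kwargs
    prompt_length=5,          # prompt settings from the class signature above
    use_e_prompt=True,
    e_prompt_layer_idx=[2, 3, 4],
)
model.eval()
```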