Irena Gao committed on
Commit 218fabb · 1 Parent(s): 2f28ef4

update README

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -16,9 +16,30 @@ We follow the Flamingo modeling paradigm, outfitting the layers of a pretrained,
 
  This model has cross-attention modules inserted in *every fourth* decoder block. It was trained using DistributedDataParallel across 64 A100 80GB GPUs at automatic BF16 mixed precision.
 
+ To use these MPT weights, OpenFlamingo must be initialized using revision `68e1a8e0ebb9b30f3c45c1ef6195980f29063ae2` of the MPT-7B modeling code. We suggest using [this copy of the model](https://huggingface.co/anas-awadalla/mpt-7b) to ensure the code is loaded at that commit.
+
  ## Uses
  OpenFlamingo models process arbitrarily interleaved sequences of images and text to output text. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.
+ ### Initialization
+
+ ``` python
+ from open_flamingo import create_model_and_transforms
+
+ model, image_processor, tokenizer = create_model_and_transforms(
+     clip_vision_encoder_path="ViT-L-14",
+     clip_vision_encoder_pretrained="openai",
+     lang_encoder_path="anas-awadalla/mpt-7b",
+     tokenizer_path="anas-awadalla/mpt-7b",
+     cross_attn_every_n_layers=4
+ )
+
+ # grab model checkpoint from huggingface hub
+ from huggingface_hub import hf_hub_download
+ import torch
 
+ checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B-vitl-mpt7b", "checkpoint.pt")
+ model.load_state_dict(torch.load(checkpoint_path), strict=False)
+ ```
  ### Generation example
  Below is an example of generating text conditioned on interleaved images/text. In particular, let's try few-shot image captioning.
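
The `cross_attn_every_n_layers=4` argument in the initialization snippet matches the README's statement that cross-attention modules sit in every fourth decoder block. As a rough illustration only, the sketch below lists which blocks would receive a module. It assumes a decoder with 32 blocks and assumes the placement rule is "the block at 0-based index `i` qualifies when `(i + 1)` is divisible by `n`"; neither detail is stated in this diff. The helper name `cross_attn_layer_indices` is hypothetical and not part of `open_flamingo`.

```python
def cross_attn_layer_indices(num_layers: int, every_n: int) -> list[int]:
    """Return 0-based indices of decoder blocks that get a cross-attention module.

    Assumption (not confirmed by the README): a block at 0-based index i
    qualifies when (i + 1) % every_n == 0.
    """
    if every_n <= 0:
        raise ValueError("every_n must be a positive integer")
    return [i for i in range(num_layers) if (i + 1) % every_n == 0]


# Assumed depth of 32 decoder blocks; every_n mirrors cross_attn_every_n_layers=4.
indices = cross_attn_layer_indices(num_layers=32, every_n=4)
print(indices)
print(len(indices))
```

Under these assumptions, 8 of the 32 blocks carry a cross-attention module, and the remaining blocks are left as plain decoder blocks.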