ManishThota committed · Commit 1e8fcbf · verified · 1 Parent(s): c0c527d

Create README.md

Files changed (1): README.md (+58 -1)
@@ -1,3 +1,60 @@
  ---
- license: apache-2.0
+ license: creativeml-openrail-m
  ---
+ ---
+ <h1 align='center' style='font-size: 36px; font-weight: bold;'>Sparrow</h1>
+ <h3 align='center' style='font-size: 24px;'>Tiny Vision Language Model</h3>
+
+
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/650c7fbb8ffe1f53bdbe1aec/DTjDSq2yG-5Cqnk6giPFq.jpeg" width="50%" height="auto"/>
+ </p>
+
+ <p align='center' style='font-size: 16px;'>
+ 3B parameter model built by <a href="https://www.linkedin.com/in/manishkumarthota/">@Manish</a> using SigLIP, Phi-2, language modeling loss, LLaVA data, and a custom training dataset.
+ The model is released for research purposes only; commercial use is not allowed.
+ </p>
+
+ Pretraining is complete. If more question-answer pairs are added in the future, LoRA fine-tuning on top of this model should be sufficient.
+
+ ## How to use
+
+
+ **Install dependencies**
+ ```bash
+ pip install transformers # latest version is ok, but we recommend v4.31.0
+ pip install -q pillow accelerate einops
+ ```
+
+ You can use the following code for model inference. The format of text instruction is similar to [LLaVA](https://github.com/haotian-liu/LLaVA).
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from PIL import Image
+
+ # Create new tensors on the GPU by default
+ torch.set_default_device("cuda")
+
+ # Create model
+ model = AutoModelForCausalLM.from_pretrained(
+     "ManishThota/Sparrow",
+     torch_dtype=torch.float16,
+     device_map="auto",
+     trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("ManishThota/Sparrow", trust_remote_code=True)
+
+ # Set inputs
+ text = "A chat between a curious user and an artificial intelligence assistant. USER: <image>\nCan you explain the slide? ASSISTANT:"
+ image = Image.open("images/week_02_page_02")
+
+ input_ids = tokenizer(text, return_tensors='pt').input_ids
+ image_tensor = model.image_preprocess(image)
+
+ # Generate the answer
+ output_ids = model.generate(
+     input_ids,
+     max_new_tokens=1500,
+     images=image_tensor,
+     use_cache=True)[0]
+ # Decode only the newly generated tokens, skipping the prompt
+ print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
+ ```
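
Note on the prompt format: the inference example hard-codes a single-turn, LLaVA-style prompt. The sketch below shows one way to assemble that string for an arbitrary question. It is not part of this commit; `build_prompt`, `SYSTEM_PROMPT`, and `IMAGE_TOKEN` are hypothetical names introduced here, and the layout is inferred only from the one example string in the README, so treat it as an assumption rather than the model's documented template.

```python
# Hypothetical helper: reproduces the prompt layout seen in the README example.
SYSTEM_PROMPT = "A chat between a curious user and an artificial intelligence assistant."
IMAGE_TOKEN = "<image>"  # placeholder token taken from the README example string


def build_prompt(question: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a single-turn prompt: system text, image token, question, assistant cue."""
    question = question.strip()
    if not question:
        raise ValueError("question must be non-empty")
    return f"{system_prompt} USER: {IMAGE_TOKEN}\n{question} ASSISTANT:"


if __name__ == "__main__":
    # Prints the same string that the README assigns to `text`.
    print(build_prompt("Can you explain the slide?"))
```

The result could be passed to the tokenizer in place of the literal `text` string, which keeps the system prompt and image token in one place if several questions are asked about different images.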