---
license: other
license_name: nvclv1
license_link: LICENSE
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-classification
---


[**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).

### Model Overview

MambaVision introduces a redesigned mixer block that adds a symmetric path without an SSM (state space model) to improve the modeling of global context. The architecture is hierarchical and employs both self-attention and mixer blocks.


### Model Performance

MambaVision achieves a new state-of-the-art (SOTA) Pareto front in terms of Top-1 accuracy and throughput.

<p align="center">
<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=42% height=42% 
class="center">
</p>
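
To make the Pareto-front claim concrete, here is a minimal sketch of how such a front can be computed from (throughput, Top-1 accuracy) pairs. The function name `pareto_front`, the model names, and all numbers are hypothetical placeholders chosen for illustration; they are not measurements from the paper.

```python
def pareto_front(models):
    """Return the names of models that no other model dominates.

    A model is dominated when another model is at least as good in both
    throughput and top-1 accuracy, and strictly better in at least one.
    """
    front = []
    for name, throughput, top1 in models:
        dominated = any(
            other_tp >= throughput
            and other_acc >= top1
            and (other_tp > throughput or other_acc > top1)
            for _, other_tp, other_acc in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (name, images/sec, top-1 %) values, for illustration only.
candidates = [
    ("model_a", 6000, 82.0),
    ("model_b", 4500, 83.0),
    ("model_c", 4000, 82.5),  # slower and less accurate than model_b
    ("model_d", 2500, 84.0),
]

print(pareto_front(candidates))  # ['model_a', 'model_b', 'model_d']
```

A model on the front cannot be beaten on accuracy without giving up throughput, or vice versa, which is the sense in which the figure above compares backbones.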


### Model Usage

You must first log in to Hugging Face to pull the model:

```Bash
huggingface-cli login
```

The model can then be loaded as follows:

```Python
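from transformers import AutoModel

# Replace the placeholder with your own Hugging Face access token.
access_token = "<YOUR ACCESS TOKEN>"
model = AutoModel.from_pretrained("nvidia/MambaVision-S-1K", trust_remote_code=True, token=access_token)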
```
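
If you would rather not hard-code a token, one option is to read it from the environment. The sketch below assumes the token is stored in the `HF_TOKEN` environment variable and that your installed `transformers` version accepts the `token` keyword; the helper names `resolve_hf_token` and `load_mambavision` are hypothetical and not part of this repository.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or blank."""
    value = os.environ.get(env_var, "").strip()
    return value or None


def load_mambavision(repo_id: str = "nvidia/MambaVision-S-1K"):
    """Load the checkpoint; a None token leaves authentication to the cached login."""
    # Imported lazily so that resolve_hf_token() has no third-party dependencies.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        trust_remote_code=True,
        token=resolve_hf_token(),
    )
```

Calling `load_mambavision()` after `huggingface-cli login` should behave like the snippet above, since a missing token falls back to the credentials saved by the CLI.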


### License

[NVIDIA Source Code License-NC](https://huggingface.co/nvidia/MambaVision-S-1K/blob/main/LICENSE)