---
license: apache-2.0
language:
- en
tags:
- audio-text-to-text
- chat
- audio
- GGUF
---
# Qwen2-Audio
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/ThcKJj7LcWCZPwN1So05f.png" alt="Example" style="width:700px;"/>
## We're bringing Qwen2-Audio to run locally on edge devices with Nexa-SDK, offering various GGUF quantization options.
Qwen2-Audio is a SOTA small-scale multimodal model (AudioLM) that takes audio and text inputs, so you can have voice interactions without a separate ASR module. It supports English, Chinese, and major European languages, and provides voice chat and audio analysis capabilities for local use cases like:
- Speaker identification and response
- Speech translation and transcription
- Mixed audio and noise detection
- Music and sound analysis
### Demo
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/02XDwJe3bhZHYptor-b2_.mp4"></video>
See more demos in our [blog post](https://nexa.ai/blogs/qwen2-audio).
## How to Run Locally On Device
The steps below show how to run Qwen2-Audio locally on your device.
**Step 1: Install Nexa-SDK (local on-device inference framework)**
[Install Nexa-SDK](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-1-executable-installer)
> Nexa-SDK is an open-source, local on-device inference framework that supports text generation, image generation, vision-language models (VLM), audio-language models, speech-to-text (ASR), and text-to-speech (TTS). It can be installed as a Python package or with an executable installer.
**Step 2: Run the following command in your terminal**
```bash
nexa run qwen2audio
```
This runs the default q4_K_M quantization.
In the terminal:
1. Drag and drop your audio file into the terminal (or enter the file path on Linux).
2. Add a text prompt to guide the analysis, or leave it empty for direct voice input.
**Or, to use the local UI (Streamlit):**
```bash
nexa run qwen2audio -st
```
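
For the terminal workflow on Linux, where the file path is typed rather than dragged in, a small helper can confirm the file exists and print its absolute path to paste at the prompt. This is only a sketch: `audio_abs_path` is a hypothetical name, it is not part of Nexa-SDK, and it assumes `realpath` (GNU coreutils) is available.

```bash
# Hypothetical helper, not part of Nexa-SDK:
# print the absolute path of an audio file, or fail if the file is missing.
audio_abs_path() {
  if [ ! -f "$1" ]; then
    echo "No such file: $1" >&2
    return 1
  fi
  # realpath resolves relative paths and symlinks to an absolute path.
  realpath "$1"
}
```

For example, `audio_abs_path ./recording.wav` (a placeholder file name) prints the full path that can then be entered when the terminal asks for the audio file.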
## Choose Quantizations for your device
Run [different quantization versions here](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) and check RAM requirements in our list.
> The default q4_K_M version requires 4.2GB of RAM.
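
On Linux, a quick pre-flight check against the 4.2GB figure quoted above could look like the sketch below. The helper names `has_enough_ram` and `available_mb` are hypothetical, and the 4300 MiB threshold assumes the 4.2GB figure is meant in GiB. Reading `MemAvailable` from `/proc/meminfo` works only on Linux, and none of this is part of Nexa-SDK.

```bash
#!/usr/bin/env bash
# Assumption: 4.2 GB treated as roughly 4300 MiB for the default q4_K_M model.
REQUIRED_MB=4300

# Succeeds when available memory covers the requirement.
# $1 = available MiB, $2 = required MiB
has_enough_ram() {
  [ "$1" -ge "$2" ]
}

# Linux only: MemAvailable is reported in kB in /proc/meminfo.
available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

if [ -r /proc/meminfo ]; then
  avail="$(available_mb)"
  avail="${avail:-0}"  # fall back to 0 if the field is missing
  if has_enough_ram "$avail" "$REQUIRED_MB"; then
    echo "OK: ${avail} MiB available"
  else
    echo "Warning: ${avail} MiB available; q4_K_M needs about ${REQUIRED_MB} MiB"
  fi
fi
```

If the check warns, a smaller quantization from the list linked above may be a better fit for the device.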
## Use Cases
### Voice Chat
- Answer daily questions
- Offer suggestions
- Speaker identification and response
- Speech translation
- Detecting background noise and responding accordingly
### Audio Analysis
- Information extraction
- Audio summary
- Speech transcription and expansion
- Mixed audio and noise detection
- Music and sound analysis
## Performance Benchmark
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/lax8bLpR5uK2_Za0G6G3j.png" alt="Example" style="width:700px;"/>
The results show that Qwen2-Audio significantly outperforms both previous SOTA models and Qwen-Audio across all tasks.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/2vACK_gD_MAuZ7Hn4Yfiv.png" alt="Example" style="width:700px;"/>
## Blog
Learn more in our [blog post](https://nexa.ai/blogs/qwen2-audio).
## Join Community
[Discord](https://discord.gg/nexa-ai) | [X(Twitter)](https://x.com/nexa_ai)