|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# OLMo 7B-Instruct-GGUF |
|
|
|
> For more details on OLMo-7B-Instruct, refer to [Allen AI's OLMo-7B-Instruct model card](https://huggingface.co/allenai/OLMo-7B-Instruct).
|
|
|
OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models. |
|
The OLMo base models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset. |
|
The Instruct version is trained on the [cleaned version of the UltraFeedback dataset](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned). |
|
|
|
OLMo 7B Instruct is fine-tuned for better question answering and demonstrates the performance gains that OLMo base models can achieve with existing fine-tuning techniques.
|
|
|
This version of the model is derived from [ssec-uw/OLMo-7B-Instruct-hf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-hf) and converted to the [GGUF format](https://huggingface.co/docs/hub/en/gguf),

a binary format optimized for fast loading and saving of models, making it highly efficient for inference.
|
|
|
In addition to being converted to GGUF format, the model has been [quantized](https://huggingface.co/docs/optimum/en/concept_guides/quantization)

to reduce the computational and memory costs of running inference. *We are currently working on adding all of the [Quantization Types](https://huggingface.co/docs/hub/en/gguf#quantization-types).*
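As a rough illustration of the idea, block-wise quantization stores one scale per block of weights plus small integer codes. The sketch below is a simplified toy example, not the actual GGUF `Q4_K_M` scheme:

```python
def quantize_4bit(weights, block_size=32):
    # Toy block-wise 4-bit quantization: each block keeps a float scale
    # and integer codes in [-8, 7] (NOT the real GGUF Q4_K_M layout).
    quantized, scales = [], []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        scale = max(abs(w) for w in block) / 7.0 or 1.0  # avoid zero scale
        codes = [max(-8, min(7, round(w / scale))) for w in block]
        quantized.append(codes)
        scales.append(scale)
    return quantized, scales

def dequantize_4bit(quantized, scales):
    # Reconstruct approximate weights from codes and per-block scales
    return [c * s for codes, s in zip(quantized, scales) for c in codes]

weights = [0.12, -0.98, 0.33, 0.5, -0.25, 0.7, -0.01, 0.44]
q, s = quantize_4bit(weights, block_size=4)
print(dequantize_4bit(q, s))  # close to the original weights
```

Real GGUF quantization types such as `Q4_K_M` use more elaborate block layouts, but the trade-off is the same: a small loss in precision in exchange for roughly a 4x reduction in memory compared to 16-bit weights.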
|
|
|
These files are designed for use with [GGML](https://ggml.ai/) and executors based on GGML such as [llama.cpp](https://github.com/ggerganov/llama.cpp). |
|
|
|
## Get Started |
|
|
|
To get started with one of the GGUF files, you can use [llama-cpp-python](https://github.com/abetlen/llama-cpp-python),

a Python binding for `llama.cpp`.
|
|
|
1. Install `llama-cpp-python` `v0.2.70` or later with pip.
|
The following command will install a pre-built wheel with basic CPU support. |
|
For other installation methods, see [llama-cpp-python installation docs](https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation). |
|
|
|
```bash |
|
pip install 'llama-cpp-python>=0.2.70' --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
|
``` |
|
|
|
2. Download one of the GGUF files. In this example,

clicking the following link will download [OLMo-7B-Instruct-Q4_K_M.gguf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-GGUF/resolve/main/OLMo-7B-Instruct-Q4_K_M.gguf?download=true).
|
|
|
3. Open a Python interpreter and run the following commands.

For example, we can ask it: `What is a solar system?`
|
|
|
*You will need to change the `model_path` argument to point to where

the GGUF model is saved on your system.*
|
|
|
```python |
|
from llama_cpp import Llama |
|
llm = Llama( |
|
model_path="path/to/OLMo-7B-Instruct-Q4_K_M.gguf" |
|
) |
|
result_dict = llm.create_chat_completion( |
|
messages = [ |
|
{ |
|
"role": "user", |
|
"content": "What is a solar system?" |
|
} |
|
] |
|
) |
|
print(result_dict['choices'][0]['message']['content']) |
|
``` |
|
|
|
4. That's it, you should see the result fairly quickly! Have fun! 🤖
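For reference, `create_chat_completion` returns an OpenAI-style chat-completion dictionary. A minimal sketch of its shape (the field values below are illustrative, not real model output):

```python
# Illustrative response shape (values are made up, not actual model output)
result_dict = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A solar system is a star and the objects that orbit it.",
            },
            "finish_reason": "stop",
        }
    ],
}

# The answer text lives under choices -> message -> content
print(result_dict["choices"][0]["message"]["content"])
```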
|
|
|
## Contact |
|
|
|
For errors in this model card, contact Don or Anant, {landungs, anmittal} at uw dot edu. |
|
|
|
## Acknowledgement |
|
|
|
We would like to thank the hardworking folks at [Allen AI](https://huggingface.co/allenai) for providing the original model. |
|
|
|
Additionally, the work to convert and quantize the model was done by the |
|
[University of Washington Scientific Software Engineering Center (SSEC)](https://escience.washington.edu/software-engineering/ssec/), |
|
as part of the [Schmidt Sciences Virtual Institute for Scientific Software (VISS)](https://www.schmidtsciences.org/viss/). |
|
|