File size: 3,250 Bytes
b3bfff1
 
9038f31
 
 
b3bfff1
 
f1dc21d
b3bfff1
f1dc21d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
154e868
 
f1dc21d
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
200e967
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# OLMo 7B-Instruct-GGUF

> For more details on OLMO-7B-Instruct, refer to [Allen AI's OLMo-7B-Instruct model card](https://huggingface.co/allenai/OLMo-7B-Instruct).

OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
The OLMo base models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
The Instruct version is trained on the [cleaned version of the UltraFeedback dataset](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned).

OLMo 7B Instruct is trained for better question answering. They show the performance gain that OLMo base models can achieve with existing fine-tuning techniques.

This version of the model is derived from [ssec-uw/OLMo-7B-Instruct-hf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-hf) as [GGUF format](https://huggingface.co/docs/hub/en/gguf),
a binary format that is optimized for quick loading and saving of models, making it highly efficient for inference purposes.

In addition to the model being in GGUF format, the model has been [quantized](https://huggingface.co/docs/optimum/en/concept_guides/quantization),
to reduce the computational and memory costs of running inference. *We are currently working on adding all of the [Quantization Types](https://huggingface.co/docs/hub/en/gguf#quantization-types)*.

These files are designed for use with [GGML](https://ggml.ai/) and executors based on GGML such as [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Get Started

To get started using one of the GGUF file, you can simply use [llama-cpp-python](https://github.com/abetlen/llama-cpp-python),
a Python binding for `llama.cpp`.

1. Install `llama-cpp-python` with pip.

    ```bash
    pip install llama-cpp-python
    ```

2. Download one of the GGUF file. In this example,
   we will download the [OLMo-7B-Instruct-Q4_K_M.gguf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-GGUF/resolve/main/OLMo-7B-Instruct-Q4_K_M.gguf?download=true),
   when the link is clicked.

3. Open up a python interpreter and run the following commands.
   For example, we can ask it: `What is a solar system?`
   
   *You will need to modify the `model_path` argument to where
   the GGUF model has been saved in your system*

    ```python
    from llama_cpp import Llama
    llm = Llama(
          model_path="path/to/OLMo-7B-Instruct-Q4_K_M.gguf"
    )
    result_dict = llm(prompt="What is solar system?", echo=True, max_tokens=500)
    print(result_dict['choices'][0]['text'])
    ```

5. That's it, you should see the result fairly quickly! Have fun! 🤖

## Contact

For errors in this model card, contact Don or Anant, {landungs, anmittal} at uw dot edu.

## Acknowledgement

We would like to thank the hardworking folks at [Allen AI](https://huggingface.co/allenai) for providing the original model.

Additionally, the work to convert and quantize the model was done by the
[University of Washington Scientific Software Engineering Center (SSEC)](https://escience.washington.edu/software-engineering/ssec/),
as part of the [Schmidt Futures Virtual Institute for Scientific Software (VISS)](https://www.schmidtsciences.org/viss/).