---
language:
- en
model_creator: SpectraSuite
quantized_by: jartine
pipeline_tag: text-generation
license: apache-2.0
license_link: LICENSE
tags:
- llamafile
---

# TriLM - llamafile

This is a 1.58 bit ternary LLM whose weights consist of {-1, 0, +1}.
It's highly optimized for CPU performance, thanks to the [`Q2_K_S`
quantization format](https://github.com/Mozilla-Ocho/llamafile/pull/552).

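The "1.58 bit" figure follows from information theory: a weight with three possible states carries at most log2(3) ≈ 1.585 bits. The sketch below checks that number and shows one illustrative way to store five ternary weights in a single byte, which works because 3^5 = 243 fits in 256 values (1.6 bits per weight). The packing scheme and the helper names `pack_trits` and `unpack_trits` are assumptions made for illustration; they do not describe how the `Q2_K_S` format actually lays out weights.

```python
import math

# Information content of one ternary weight: log2(3) bits.
BITS_PER_TRIT = math.log2(3)

def pack_trits(trits):
    """Pack exactly five weights from {-1, 0, +1} into one byte (base-3 digits)."""
    if len(trits) != 5 or any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("expected five values from {-1, 0, +1}")
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # map -1/0/+1 to base-3 digits 0/1/2
    return value  # always in the range 0..242

def unpack_trits(byte):
    """Inverse of pack_trits: recover the five ternary weights."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits

weights = [-1, 0, 1, 1, -1]
packed = pack_trits(weights)
print(f"{BITS_PER_TRIT:.3f} bits per weight")
print(packed, unpack_trits(packed) == weights)
```
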
- Model creator: [SpectraSuite](https://huggingface.co/SpectraSuite)
- Original model: [TriLMs-Unpacked](https://huggingface.co/collections/SpectraSuite/trilms-unpacked-668d5f62afe0f4036925b1d2)

This repository packages and distributes TriLM as executable weights,
which we call [llamafiles](https://github.com/Mozilla-Ocho/llamafile).
The files you download here will run on Linux, macOS, Windows, FreeBSD,
OpenBSD, and NetBSD for AMD64 and ARM64.

## Quickstart

Running the following on a desktop OS will launch a tab in your web
browser with a completions interface.

```
wget https://huggingface.co/Mozilla/TriLM-llamafile/resolve/main/TriLM_3.9B.llamafile
chmod +x TriLM_3.9B.llamafile
./TriLM_3.9B.llamafile
```

You can also use the command line interface:

```
./TriLM_3.9B.llamafile -p "this is my prompt"
```

For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

Having **trouble?** See the ["Gotchas"
section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
of the README.

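The server behind the browser tab can also be queried from a script. The following is a minimal sketch, assuming the llamafile was started without extra arguments and exposes the llama.cpp-style `/completion` endpoint on `127.0.0.1:8080`. The address, the helper names `build_request` and `complete`, and the default parameter values are assumptions; check the llamafile README for the options your version supports.

```python
import json
import urllib.request

# Assumption: the llamafile server is listening on its usual local address.
SERVER_URL = "http://127.0.0.1:8080/completion"

def build_request(prompt, n_predict=64, repeat_penalty=1.1, url=SERVER_URL):
    """Build a POST request for a raw text completion (no chat template)."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,          # maximum number of tokens to generate
        "repeat_penalty": repeat_penalty,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        return json.loads(response.read())["content"]
```

With the llamafile running in another terminal, `print(complete("Once upon a time"))` should print the model's continuation of the prompt.
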
## Prompting

This is a base model that hasn't been fine-tuned for chat, so we
recommend using the completions interface.

For the smaller TriLM models (e.g. 99M), we recommend setting a high
repeat penalty, e.g. `--repeat-penalty 10`. In CLI mode, this flag is
already set by default in the `.args` file embedded within the
llamafiles from this repo.

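To see why a high repeat penalty helps small models that tend to loop, here is a minimal sketch of the repetition-penalty rule commonly used by llama.cpp-style samplers: the score of every recently generated token is divided by the penalty when positive and multiplied by it when negative. The function name `apply_repeat_penalty` and the toy logits are hypothetical, and this is not the exact llamafile implementation.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Return a copy of logits with recently generated token ids made less likely."""
    adjusted = list(logits)
    for token_id in set(recent_tokens):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty  # shrink positive scores
        else:
            adjusted[token_id] *= penalty  # push negative scores further down
    return adjusted

# Toy vocabulary of three tokens; tokens 0 and 1 were generated recently.
logits = [2.0, -1.0, 0.5]
print(apply_repeat_penalty(logits, recent_tokens=[0, 1], penalty=10.0))
```

A penalty of 1.0 leaves the scores unchanged, while larger values make it increasingly unlikely that a recent token is sampled again.
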
## Benchmarks

In the `test` column, `pp512` is prompt processing over 512 tokens and
`tg16` is text generation of 16 tokens; `t/s` is throughput in tokens
per second.

| cpu\_info | model\_filename | size | test | t/s |
| :----------------------------------------- | :--------------------------------------- | ---------: | ------------: | --------------: |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 1069.54 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 88.47 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 1441.04 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 110.80 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 2185.94 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 154.59 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 2692.87 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 173.08 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 3353.51 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 191.98 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 4297.08 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 209.57 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 5130.90 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 221.88 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 5127.00 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 218.93 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 10874.11 |
| AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 334.45 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 227.95 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 65.17 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 347.93 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 48.26 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 588.86 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 140.22 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 767.47 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 167.80 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 1031.20 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 204.46 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 1487.29 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 245.53 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 2049.02 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 332.24 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 2103.34 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 301.31 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 4762.49 |
| Apple M2 Ultra (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 553.83 |
| Intel Core i9-14900K (alderlake) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 167.15 |
| Intel Core i9-14900K (alderlake) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 53.22 |
| Intel Core i9-14900K (alderlake) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 261.73 |
| Intel Core i9-14900K (alderlake) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 78.39 |
| Intel Core i9-14900K (alderlake) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 426.17 |
| Intel Core i9-14900K (alderlake) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 123.91 |
| Intel Core i9-14900K (alderlake) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 563.58 |
| Intel Core i9-14900K (alderlake) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 159.13 |
| Intel Core i9-14900K (alderlake) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 763.27 |
| Intel Core i9-14900K (alderlake) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 209.42 |
| Intel Core i9-14900K (alderlake) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 1116.30 |
| Intel Core i9-14900K (alderlake) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 295.71 |
| Intel Core i9-14900K (alderlake) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 1586.69 |
| Intel Core i9-14900K (alderlake) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 377.50 |
| Intel Core i9-14900K (alderlake) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 1587.38 |
| Intel Core i9-14900K (alderlake) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 401.37 |
| Intel Core i9-14900K (alderlake) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 3713.16 |
| Intel Core i9-14900K (alderlake) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 845.54 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 17.02 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 6.67 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 26.35 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 10.52 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 42.52 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 16.91 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 56.57 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 20.54 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 146.67 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 56.77 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 147.65 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 58.24 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 338.42 |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 107.33 |

## About llamafile

llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
binaries that run on the stock installs of six OSes for both ARM64 and
AMD64.

---

# TriLM 3.9B Unpacked

TriLM (ternary model), unpacked to FP16 format so that it is compatible with FP16 GEMMs. After unpacking, TriLM has the same architecture as LLaMA.

```python
import torch
import transformers as tf

model_id = "SpectraSuite/TriLM_3.9B_Unpacked"

# Adjust the temperature, repetition penalty, top_k, top_p, and other sampling parameters to suit your needs.
pipeline = tf.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.float16}, device_map="auto")

# These are base (pretrained) LLMs that are not instruction- or chat-tuned, so phrase your prompt as text to be continued.
print(pipeline("Once upon a time"))
```

* License: Apache 2.0
* We will use our GitHub repo for communication (including HF repo related queries). Feel free to open an issue at https://github.com/NolanoOrg/SpectraSuite