Valerie v0.1 Model Card

Overview

Valerie v0.1 is a custom language model created using llama.cpp (commit: 532c173) with a context length of 256 tokens, embedding length of 256, 8 heads, and 16 layers. This model was pretrained on a dataset consisting of female V's dialog from Cyberpunk 2077, extracted using the Voice Over Subtitle Map mod.

Model Information

Full sampling

Model name | Adam iteration | Model filename | Vocabulary size
Valerie v0.1 Checkpoint | 1750 | chk-valerie-v0.1-256x32-1750.gguf | 32,000
Valerie v0.1 Model | 1750 | ggml-valerie-v0.1-256x32-f32-1750.gguf | 32,000

The ggml-valerie-v0.1-256x32-f32-1750.gguf release represents a single epoch over all 51443 samples, completed in just over 1700 iterations, and took approximately 3 hours to train.

Repeat sampling

Model name | Adam iteration | Model filename | Vocabulary size
Valerie v0.1 Checkpoint | 3600 | chk-valerie-v0.1-256x32-LATEST.gguf | 32,000
Valerie v0.1 Model | 3600 | ggml-valerie-v0.1-256x32-f32-LATEST.gguf | 32,000

The ggml-valerie-v0.1-256x32-f32-LATEST.gguf release represents two epochs over all 51443 samples, completed in just over 3600 iterations, and took approximately 6 hours to train.
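
The relationship between iterations and epochs can be checked with a little arithmetic. The sketch below assumes that each Adam iteration consumes exactly one batch of 32 samples (the batch size listed under Settings); the trainer may count samples differently, so treat the results as estimates rather than the author's own bookkeeping. The function names are hypothetical.

```python
import math

SAMPLES = 51443     # dataset size quoted in this card
BATCH_SIZE = 32     # batch size from the Settings section


def iterations_per_epoch(samples: int, batch_size: int) -> int:
    """Iterations needed to visit every sample once (the last batch may be partial)."""
    return math.ceil(samples / batch_size)


def epochs_completed(iteration: int, samples: int, batch_size: int) -> float:
    """Approximate number of passes over the data after `iteration` steps.

    Assumption: one iteration == one batch of `batch_size` samples.
    """
    return iteration * batch_size / samples


if __name__ == "__main__":
    print("iterations per epoch:", iterations_per_epoch(SAMPLES, BATCH_SIZE))
    for it in (1750, 3600):
        print(f"iteration {it}: ~{epochs_completed(it, SAMPLES, BATCH_SIZE):.2f} epochs")
```

Under that assumption, iteration 1750 lands slightly past one epoch and iteration 3600 slightly past two, which is consistent with the descriptions above.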

Files and versions

  • ggml-vocab-mistral.gguf: Extracted Mistral 7B model vocabulary.
  • ggml-valerie-v0.1-256x32-f32-1750.gguf: The pretrained model at iteration 1750.
  • ggml-valerie-v0.1-256x32-f32-LATEST.gguf: The latest pretrained model, currently at iteration 3600.

Settings

  • Vocabulary size: 32,000
  • Context length: 256 tokens
  • Embedding length: 256
  • Heads: 8
  • Layers: 16
  • Batch size: 32
  • Seed: 1
  • Saved checkpoint every 50 iterations
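
As a rough sanity check, the settings above can be turned into a parameter count for a llama-style architecture. This is a sketch, not the author's calculation: the feed-forward width (N_FF = 768) is not stated anywhere in this card and is assumed here only because it reproduces the 30.02 M parameters reported in the Benchmarks section. The sketch also assumes untied input and output embeddings, no bias terms, and standard multi-head attention (so the head count does not change the total).

```python
N_VOCAB = 32_000   # from Settings
N_EMBD = 256       # from Settings
N_LAYER = 16       # from Settings
N_FF = 768         # ASSUMPTION: not stated in this card


def llama_param_count(n_vocab: int, n_embd: int, n_layer: int, n_ff: int) -> int:
    """Count the weights of a llama-style decoder (no biases, untied embeddings)."""
    token_embedding = n_vocab * n_embd
    output_projection = n_vocab * n_embd
    attention = 4 * n_embd * n_embd      # query, key, value, and output matrices
    feed_forward = 3 * n_embd * n_ff     # gate, up, and down matrices
    layer_norms = 2 * n_embd             # attention norm + feed-forward norm
    per_layer = attention + feed_forward + layer_norms
    final_norm = n_embd
    return token_embedding + output_projection + n_layer * per_layer + final_norm


def f32_size_mib(n_params: int) -> float:
    """Size of the weights in MiB when every parameter is a 4-byte float."""
    return n_params * 4 / 2**20


if __name__ == "__main__":
    total = llama_param_count(N_VOCAB, N_EMBD, N_LAYER, N_FF)
    print(f"{total:,} parameters (~{total / 1e6:.2f} M)")
    print(f"{f32_size_mib(total):.2f} MiB at F32")
```

With these assumptions the total comes to about 30.02 M parameters and 114.53 MiB at F32, matching the size and params columns that llama-bench reports.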

Usage

To use Valerie v0.1, follow these steps:

  1. Clone the llama.cpp repository and build it.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

See the llama.cpp README.md for build details. You can build for plain CPU or with OpenBLAS; CUDA, ROCm, Vulkan, and other backends are also available.

Arch Linux example:

# CPU build using the OpenBLAS backend on Arch Linux
sudo pacman -S openblas openblas64
make LLAMA_OPENBLAS=1

  2. Download the latest model.
wget "https://huggingface.co/teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-LATEST.gguf?download=true" -O ggml-valerie-v0.1-256x32-f32-LATEST.gguf

This downloads the latest available base model into the current directory.

  3. Run inference with the latest model. Point -m at wherever you saved the file:
./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf --color -e -s 1 -c 4096

Benchmarks

Performance metrics for evaluating v0.1 iteration 3600 on CPU, BLAS, and Vulkan backends.

llama-bench

model | size | params | backend | threads | test | t/s
llama ?B all F32 | 114.53 MiB | 30.02 M | CPU | 8 | pp 512 | 12781.37 ± 2258.61
llama ?B all F32 | 114.53 MiB | 30.02 M | CPU | 8 | tg 128 | 410.74 ± 6.13
llama ?B all F32 | 114.53 MiB | 30.02 M | BLAS | 8 | pp 512 | 233.53 ± 1.56
llama ?B all F32 | 114.53 MiB | 30.02 M | BLAS | 8 | tg 128 | 391.63 ± 14.02
llama ?B all F32 | 114.53 MiB | 30.02 M | Vulkan | 99 | pp 512 | 18779.40 ± 111.01
llama ?B all F32 | 114.53 MiB | 30.02 M | Vulkan | 99 | tg 128 | 96.25 ± 0.46

build: ab0dee5 (2686)
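
To make the llama-bench numbers easier to compare, the sketch below computes two derived quantities from the reported values: the relative spread of each measurement (standard deviation divided by mean) and the throughput ratio between two backends. The figures are copied from the table above; the helper names are hypothetical and not part of llama.cpp.

```python
# (backend, test) -> (mean tokens/s, standard deviation), copied from the table above
RESULTS = {
    ("CPU", "pp 512"): (12781.37, 2258.61),
    ("CPU", "tg 128"): (410.74, 6.13),
    ("BLAS", "pp 512"): (233.53, 1.56),
    ("BLAS", "tg 128"): (391.63, 14.02),
    ("Vulkan", "pp 512"): (18779.40, 111.01),
    ("Vulkan", "tg 128"): (96.25, 0.46),
}


def relative_spread(key: tuple[str, str]) -> float:
    """Standard deviation as a fraction of the mean (larger means a noisier run)."""
    mean, std = RESULTS[key]
    return std / mean


def throughput_ratio(numerator: tuple[str, str], denominator: tuple[str, str]) -> float:
    """How many times faster `numerator` is than `denominator`, by mean tokens/s."""
    return RESULTS[numerator][0] / RESULTS[denominator][0]


if __name__ == "__main__":
    for key in RESULTS:
        print(f"{key[0]:<6} {key[1]}: spread {relative_spread(key):.1%}")
    print("Vulkan vs CPU, pp 512:",
          round(throughput_ratio(("Vulkan", "pp 512"), ("CPU", "pp 512")), 2))
    print("CPU vs Vulkan, tg 128:",
          round(throughput_ratio(("CPU", "tg 128"), ("Vulkan", "tg 128")), 2))
```

Read this way, Vulkan leads on prompt processing while the CPU backend leads on token generation for a model this small, and the CPU prompt-processing figure carries the widest spread of the six measurements.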

batched-bench - CPU

PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s
128 | 128 | 1 | 256 | 0.009 | 14365.88 | 0.345 | 370.86 | 0.354 | 723.06
128 | 128 | 2 | 512 | 0.022 | 11514.42 | 0.377 | 679.29 | 0.399 | 1282.90
128 | 128 | 4 | 1024 | 0.052 | 9811.44 | 0.438 | 1168.69 | 0.490 | 2088.60
128 | 128 | 8 | 2048 | 0.093 | 11067.40 | 0.745 | 1373.82 | 0.838 | 2444.24
128 | 256 | 1 | 384 | 0.011 | 11861.74 | 0.705 | 363.37 | 0.715 | 536.83
128 | 256 | 2 | 768 | 0.022 | 11649.60 | 0.768 | 666.97 | 0.790 | 972.62
128 | 256 | 4 | 1536 | 0.050 | 10252.10 | 0.912 | 1122.94 | 0.962 | 1596.95
256 | 128 | 1 | 384 | 0.021 | 12028.94 | 0.345 | 370.85 | 0.366 | 1047.94
256 | 128 | 2 | 768 | 0.049 | 10351.80 | 0.404 | 633.82 | 0.453 | 1694.02
256 | 128 | 4 | 1536 | 0.118 | 8688.72 | 0.484 | 1058.15 | 0.602 | 2552.70
256 | 256 | 1 | 512 | 0.022 | 11477.76 | 0.715 | 357.83 | 0.738 | 694.02
256 | 256 | 2 | 1024 | 0.050 | 10263.61 | 0.822 | 622.72 | 0.872 | 1174.20
256 | 256 | 4 | 2048 | 0.092 | 11089.45 | 0.990 | 1033.97 | 1.083 | 1891.58
512 | 128 | 1 | 640 | 0.050 | 10235.70 | 0.372 | 344.35 | 0.422 | 1517.52
512 | 128 | 2 | 1280 | 0.093 | 10987.83 | 0.445 | 575.12 | 0.538 | 2377.77
512 | 256 | 1 | 768 | 0.050 | 10208.56 | 0.783 | 326.97 | 0.833 | 921.85
512 | 256 | 2 | 1536 | 0.091 | 11216.51 | 0.925 | 553.26 | 1.017 | 1510.73

main: n_kv_max = 2048, n_batch = 2048, n_ubatch = 512, is_pp_shared = 0, n_gpu_layers = 999, n_threads = 8, n_threads_batch = 8
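
The batched-bench columns are related to one another, which helps when reading the table. The sketch below recomputes the derived columns from PP, TG, B, and the two timings, assuming the usual column meanings (N_KV is the required KV cache size, S_PP and S_TG are tokens per second for prompt processing and generation, S is all tokens over total time) and that prompts are not shared between sequences, as the is_pp_shared = 0 setting indicates. The class name is hypothetical, and because the timings in the table are rounded to three decimals, the recomputed speeds only match approximately.

```python
from dataclasses import dataclass


@dataclass
class BatchedRow:
    pp: int        # prompt tokens per sequence
    tg: int        # generated tokens per sequence
    b: int         # number of parallel sequences
    t_pp: float    # prompt processing time in seconds
    t_tg: float    # text generation time in seconds

    @property
    def n_kv(self) -> int:
        """KV cache slots needed when every sequence has its own prompt."""
        return self.b * (self.pp + self.tg)

    @property
    def s_pp(self) -> float:
        """Prompt processing speed in tokens per second."""
        return self.b * self.pp / self.t_pp

    @property
    def s_tg(self) -> float:
        """Text generation speed in tokens per second."""
        return self.b * self.tg / self.t_tg

    @property
    def s_total(self) -> float:
        """Overall speed: all tokens divided by total time."""
        return self.n_kv / (self.t_pp + self.t_tg)


if __name__ == "__main__":
    # Fourth row of the table: PP=128, TG=128, B=8
    row = BatchedRow(pp=128, tg=128, b=8, t_pp=0.093, t_tg=0.745)
    print(row.n_kv, round(row.s_pp, 1), round(row.s_tg, 1), round(row.s_total, 1))
```

For example, the B = 8 row needs 8 × (128 + 128) = 2048 KV slots, which is exactly the n_kv_max reported in the log line above.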

Citations

When using Valerie v0.1 in your research, please remember to cite the following:

Contributors

Austin (teleprint-me) - Created and trained Valerie v0.1 using llama.cpp and the referenced dataset.

Community

Join the community of fellow language model enthusiasts and researchers by sharing your knowledge, asking questions, and collaborating on projects related to creating custom models using llama.cpp.

License

Valerie v0.1 is released under the CC-BY-NC-SA-3.0 license. You are free to use, modify, and redistribute this model for non-commercial purposes, but you must provide attribution to the original authors and release any derived works under the same license.
