mofosyne committed on
Commit ecb359f · 1 Parent(s): a9a7525

add example output for easier documentation for those trying to replicate

README.md CHANGED
@@ -14,8 +14,6 @@ tags:
14
  - Model creator: [Maykeye](https://huggingface.co/Maykeye)
15
  - Original model: [TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0)
16
 
17
- If interested in the internal content of this model you can check [Tinyllama-4.6M-v0.0-F16.dump.md](./Tinyllama-4.6M-v0.0-F16.dump.md) included in this repo.
18
-
19
  ## Description
20
 
21
  * This repo is targeted towards:
@@ -54,7 +52,7 @@ chmod +x Tinyllama-5M-v0.2-F16.llamafile
54
 
55
  ## About llamafile
56
 
57
- llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp binaries that run on the stock installs of six OSes for both ARM64 and AMD64.
58
 
59
  ## Replication Steps Assumption
60
 
@@ -66,240 +64,6 @@ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses C
66
 
67
  For the most current replication steps, refer to the bash script `llamafile-creation.sh` in this repo.
68
 
69
- ```
70
- $ ./llamafile-creation.sh
71
- == Prep Enviroment ==
72
- == Build and prep the llamafile engine execuable ==
73
- ~/huggingface/TinyLLama-v0-5M-F16-llamafile/llamafile ~/huggingface/TinyLLama-v0-5M-F16-llamafile
74
- make: Nothing to be done for 'all'.
75
- make: Nothing to be done for 'all'.
76
- ~/huggingface/TinyLLama-v0-5M-F16-llamafile
77
- == What is our llamafile name going to be? ==
78
- maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
79
- We will be aiming to generate Tinyllama-4.6M-v0.0-F16.llamafile
80
- == Convert from safetensor to gguf ==
81
- INFO:hf-to-gguf:Loading model: maykeye_tinyllama
82
- INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
83
- INFO:hf-to-gguf:Exporting model...
84
- INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
85
- INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
86
- INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
87
- INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
88
- INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
89
- INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
90
- INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
91
- INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
92
- INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
93
- INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
94
- INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
95
- INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
96
- INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
97
- INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
98
- INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
99
- INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
100
- INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
101
- INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
102
- INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
103
- INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
104
- INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
105
- INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
106
- INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
107
- INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
108
- INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
109
- INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
110
- INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
111
- INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
112
- INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
113
- INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
114
- INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
115
- INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
116
- INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
117
- INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
118
- INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
119
- INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
120
- INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
121
- INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
122
- INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
123
- INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
124
- INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
125
- INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
126
- INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
127
- INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
128
- INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
129
- INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
130
- INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
131
- INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
132
- INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
133
- INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
134
- INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
135
- INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
136
- INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
137
- INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
138
- INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
139
- INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
140
- INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
141
- INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
142
- INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
143
- INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
144
- INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
145
- INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
146
- INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
147
- INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
148
- INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
149
- INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
150
- INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
151
- INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
152
- INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
153
- INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
154
- INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
155
- INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
156
- INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
157
- INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
158
- INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
159
- INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
160
- INFO:hf-to-gguf:Set meta model
161
- INFO:hf-to-gguf:Set model parameters
162
- INFO:hf-to-gguf:gguf: context length = 2048
163
- INFO:hf-to-gguf:gguf: embedding length = 64
164
- INFO:hf-to-gguf:gguf: feed forward length = 256
165
- INFO:hf-to-gguf:gguf: head count = 16
166
- INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
167
- INFO:hf-to-gguf:gguf: file type = 1
168
- INFO:hf-to-gguf:Set model tokenizer
169
- INFO:gguf.vocab:Setting special token type bos to 1
170
- INFO:gguf.vocab:Setting special token type eos to 2
171
- INFO:gguf.vocab:Setting special token type unk to 0
172
- INFO:gguf.vocab:Setting special token type pad to 0
173
- INFO:hf-to-gguf:Set model quantization version
174
- INFO:gguf.gguf_writer:Writing the following files:
175
- INFO:gguf.gguf_writer:maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
176
- Writing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.24M/9.24M [00:00<00:00, 83.7Mbyte/s]
177
- INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/Tinyllama-4.6M-v0.0-F16.gguf
178
- == Generating Llamafile ==
179
- == Test Output ./Tinyllama-4.6M-v0.0-F16.llamafile ==
180
- note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
181
- main: llamafile version 0.8.9
182
- main: seed = 1721461448
183
- llama_model_loader: loaded meta data with 33 key-value pairs and 75 tensors from Tinyllama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
184
- llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
185
- llama_model_loader: - kv 0: general.architecture str = llama
186
- llama_model_loader: - kv 1: general.type str = model
187
- llama_model_loader: - kv 2: general.name str = TinyLLama
188
- llama_model_loader: - kv 3: general.author str = Maykeye
189
- llama_model_loader: - kv 4: general.version str = v0.0
190
- llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
191
- llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
192
- llama_model_loader: - kv 7: general.size_label str = 4.6M
193
- llama_model_loader: - kv 8: general.license str = apache-2.0
194
- llama_model_loader: - kv 9: general.url str = https://huggingface.co/mofosyne/TinyL...
195
- llama_model_loader: - kv 10: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
196
- llama_model_loader: - kv 11: general.tags arr[str,5] = ["text generation", "transformer", "l...
197
- llama_model_loader: - kv 12: general.languages arr[str,1] = ["en"]
198
- llama_model_loader: - kv 13: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
199
- llama_model_loader: - kv 14: llama.block_count u32 = 8
200
- llama_model_loader: - kv 15: llama.context_length u32 = 2048
201
- llama_model_loader: - kv 16: llama.embedding_length u32 = 64
202
- llama_model_loader: - kv 17: llama.feed_forward_length u32 = 256
203
- llama_model_loader: - kv 18: llama.attention.head_count u32 = 16
204
- llama_model_loader: - kv 19: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
205
- llama_model_loader: - kv 20: general.file_type u32 = 1
206
- llama_model_loader: - kv 21: llama.vocab_size u32 = 32000
207
- llama_model_loader: - kv 22: llama.rope.dimension_count u32 = 4
208
- llama_model_loader: - kv 23: tokenizer.ggml.model str = llama
209
- llama_model_loader: - kv 24: tokenizer.ggml.pre str = default
210
- llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
211
- llama_model_loader: - kv 26: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
212
- llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
213
- llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1
214
- llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 2
215
- llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0
216
- llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 0
217
- llama_model_loader: - kv 32: general.quantization_version u32 = 2
218
- llama_model_loader: - type f32: 17 tensors
219
- llama_model_loader: - type f16: 58 tensors
220
- llm_load_vocab: special tokens definition check successful ( 259/32000 ).
221
- llm_load_print_meta: format = GGUF V3 (latest)
222
- llm_load_print_meta: arch = llama
223
- llm_load_print_meta: vocab type = SPM
224
- llm_load_print_meta: n_vocab = 32000
225
- llm_load_print_meta: n_merges = 0
226
- llm_load_print_meta: n_ctx_train = 2048
227
- llm_load_print_meta: n_embd = 64
228
- llm_load_print_meta: n_head = 16
229
- llm_load_print_meta: n_head_kv = 16
230
- llm_load_print_meta: n_layer = 8
231
- llm_load_print_meta: n_rot = 4
232
- llm_load_print_meta: n_swa = 0
233
- llm_load_print_meta: n_embd_head_k = 4
234
- llm_load_print_meta: n_embd_head_v = 4
235
- llm_load_print_meta: n_gqa = 1
236
- llm_load_print_meta: n_embd_k_gqa = 64
237
- llm_load_print_meta: n_embd_v_gqa = 64
238
- llm_load_print_meta: f_norm_eps = 0.0e+00
239
- llm_load_print_meta: f_norm_rms_eps = 1.0e-06
240
- llm_load_print_meta: f_clamp_kqv = 0.0e+00
241
- llm_load_print_meta: f_max_alibi_bias = 0.0e+00
242
- llm_load_print_meta: f_logit_scale = 0.0e+00
243
- llm_load_print_meta: n_ff = 256
244
- llm_load_print_meta: n_expert = 0
245
- llm_load_print_meta: n_expert_used = 0
246
- llm_load_print_meta: causal attn = 1
247
- llm_load_print_meta: pooling type = 0
248
- llm_load_print_meta: rope type = 0
249
- llm_load_print_meta: rope scaling = linear
250
- llm_load_print_meta: freq_base_train = 10000.0
251
- llm_load_print_meta: freq_scale_train = 1
252
- llm_load_print_meta: n_yarn_orig_ctx = 2048
253
- llm_load_print_meta: rope_finetuned = unknown
254
- llm_load_print_meta: ssm_d_conv = 0
255
- llm_load_print_meta: ssm_d_inner = 0
256
- llm_load_print_meta: ssm_d_state = 0
257
- llm_load_print_meta: ssm_dt_rank = 0
258
- llm_load_print_meta: model type = ?B
259
- llm_load_print_meta: model ftype = F16
260
- llm_load_print_meta: model params = 4.62 M
261
- llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
262
- llm_load_print_meta: general.name = TinyLLama
263
- llm_load_print_meta: BOS token = 1 '<s>'
264
- llm_load_print_meta: EOS token = 2 '</s>'
265
- llm_load_print_meta: UNK token = 0 '<unk>'
266
- llm_load_print_meta: PAD token = 0 '<unk>'
267
- llm_load_print_meta: LF token = 13 '<0x0A>'
268
- llm_load_tensors: ggml ctx size = 0.04 MiB
269
- llm_load_tensors: CPU buffer size = 8.82 MiB
270
- ..............
271
- llama_new_context_with_model: n_ctx = 512
272
- llama_new_context_with_model: n_batch = 512
273
- llama_new_context_with_model: n_ubatch = 512
274
- llama_new_context_with_model: flash_attn = 0
275
- llama_new_context_with_model: freq_base = 10000.0
276
- llama_new_context_with_model: freq_scale = 1
277
- llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
278
- llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
279
- llama_new_context_with_model: CPU output buffer size = 0.12 MiB
280
- llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
281
- llama_new_context_with_model: graph nodes = 262
282
- llama_new_context_with_model: graph splits = 1
283
-
284
- system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
285
- sampling:
286
- repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
287
- top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
288
- mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
289
- sampling order:
290
- CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
291
- generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
292
-
293
 
294
- hello world the gruff man said no. The man was very sad and wanted to see what was wrong. He asked the man if they could do it. But they did not like his way to the park.
295
- One day, the man decided to go in and he took off his own new home. He gave the bird a little bit of his friend. He said he had to find a way to hide it in his woods. The man was very happy, but he knew he needed to make it in the yard.
296
- The man was very sad and he could not find the bird. He didn't want to get to the park and his friend was very sad. They could not find the bird and his friend. But the man was too sad. He had no friends and no friends. [end of text]
297
-
298
-
299
- llama_print_timings: load time = 10.26 ms
300
- llama_print_timings: sample time = 6.03 ms / 156 runs ( 0.04 ms per token, 25879.23 tokens per second)
301
- llama_print_timings: prompt eval time = 2.16 ms / 8 tokens ( 0.27 ms per token, 3696.86 tokens per second)
302
- llama_print_timings: eval time = 748.08 ms / 155 runs ( 4.83 ms per token, 207.20 tokens per second)
303
- llama_print_timings: total time = 800.80 ms / 163 tokens
304
- Log end
305
- ```
 
14
  - Model creator: [Maykeye](https://huggingface.co/Maykeye)
15
  - Original model: [TinyLLama-v0](https://huggingface.co/Maykeye/TinyLLama-v0)
16
 
 
 
17
  ## Description
18
 
19
  * This repo is targeted towards:
 
52
 
53
  ## About llamafile
54
 
55
+ [llamafile](https://github.com/Mozilla-Ocho/llamafile) is a new format introduced by Mozilla Ocho on Nov 20th 2023. It uses [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) to turn LLM weights into runnable [llama.cpp](https://github.com/ggerganov/llama.cpp) binaries that run on the stock installs of six OSes for both ARM64 and AMD64.
56
 
57
  ## Replication Steps Assumption
58
 
 
64
 
65
  For the most current replication steps, refer to the bash script `llamafile-creation.sh` in this repo.
66
 
67
+ You may want to also check these [output of convert_hf_to_gguf](convert_hf_to_gguf.output.txt) and [output of the generated llamafile](./llamafile_output_example.output.txt) to sanity check that you got the right process if you feel like your output doesn't quite make sense.
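One way to do that sanity check without reading the log by eye is to total up the tensor lines. The sketch below is only an illustration: `summarize_log` and the regular expression are hypothetical helpers written against the `INFO:hf-to-gguf:<name>, <dtype> --> <type>, shape = {...}` line format visible in this commit, and they may need adjusting if the converter changes its logging. Tensors are keyed by name, so a log that contains the same run twice is not double counted.

```python
import re
from collections import Counter
from math import prod

# Matches tensor lines in the shape seen in this commit's log, for example:
# INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
# (assumed format; not an official interface of the converter)
TENSOR_LINE = re.compile(
    r"INFO:hf-to-gguf:(?P<name>[\w.]+),\s+\S+\s+-->\s+(?P<dtype>\w+),"
    r"\s+shape = \{(?P<shape>[\d, ]+)\}"
)


def summarize_log(text):
    """Summarize the tensor lines of a conversion log.

    Returns a dict with the number of distinct tensors, the total
    parameter count, and the number of tensors per output type.
    """
    tensors = {}
    for line in text.splitlines():
        match = TENSOR_LINE.search(line)
        if match is None:
            continue  # not a tensor line (metadata, tokenizer, etc.)
        shape = [int(dim) for dim in match["shape"].split(",")]
        # Keyed by name so repeated runs in one log collapse to one entry.
        tensors[match["name"]] = (match["dtype"], prod(shape))
    return {
        "tensor_count": len(tensors),
        "parameter_count": sum(size for _, size in tensors.values()),
        "by_dtype": dict(Counter(dtype for dtype, _ in tensors.values())),
    }
```

Run over a log that lists the same tensors as the one recorded in this commit, the totals should come to 75 tensors (17 F32, 58 F16) and 4,621,376 parameters, which agrees with the `n_tensors = 75` and `model params = 4.62 M` lines in the logged output.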
68
 
69
+ If interested in the internal content of this model you can check [Tinyllama-4.6M-v0.0-F16.dump.md](./Tinyllama-4.6M-v0.0-F16.dump.md) included in this repo as well which would be helpful if you are curious about the structure of a gguf file.
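For a first look at that structure without a full dump, the fixed-size header at the start of a GGUF file can be read directly. This is a minimal sketch, assuming a little-endian GGUF version 2 or later file (the logged output reports `GGUF V3` and "Little Endian only"), where the 4-byte magic is followed by a 32-bit version and two 64-bit counts. `read_gguf_header` is a hypothetical helper, not part of llama.cpp or llamafile.

```python
import struct


def read_gguf_header(path):
    """Read the fixed header of a little-endian GGUF (v2 or later) file.

    Layout assumed here: 4-byte magic b"GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key-value count.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24:
        raise ValueError("file is too short to contain a GGUF header")
    if header[:4] != b"GGUF":
        raise ValueError("missing GGUF magic bytes")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

If the file matches the logged output in this commit, the result for `Tinyllama-4.6M-v0.0-F16.gguf` would be version 3 with 75 tensors and 33 metadata key-value pairs; anything else suggests the conversion differed from the recorded run.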
convert_hf_to_gguf.output.txt ADDED
@@ -0,0 +1,582 @@
1
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
2
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
3
+ INFO:hf-to-gguf:Exporting model...
4
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
5
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
6
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
7
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
8
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
9
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
10
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
11
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
12
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
13
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
14
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
15
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
16
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
17
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
18
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
19
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
20
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
21
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
22
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
23
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
24
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
25
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
26
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
27
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
28
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
29
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
30
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
31
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
32
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
33
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
34
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
35
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
36
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
37
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
38
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
39
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
40
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
41
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
42
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
43
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
44
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
45
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
46
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
47
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
48
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
49
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
50
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
51
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
52
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
53
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
54
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
55
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
56
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
57
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
58
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
59
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
60
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
61
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
62
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
63
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
64
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
65
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
66
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
67
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
68
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
69
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
70
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
71
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
72
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
73
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
74
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
75
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
76
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
77
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
78
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
79
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
80
+ INFO:hf-to-gguf:Set meta model
81
+ INFO:hf-to-gguf:Set model parameters
82
+ INFO:hf-to-gguf:gguf: context length = 2048
83
+ INFO:hf-to-gguf:gguf: embedding length = 64
84
+ INFO:hf-to-gguf:gguf: feed forward length = 256
85
+ INFO:hf-to-gguf:gguf: head count = 16
86
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
87
+ INFO:hf-to-gguf:gguf: file type = 1
88
+ INFO:hf-to-gguf:Set model tokenizer
89
+ INFO:gguf.vocab:Setting special token type bos to 1
90
+ INFO:gguf.vocab:Setting special token type eos to 2
91
+ INFO:gguf.vocab:Setting special token type unk to 0
92
+ INFO:gguf.vocab:Setting special token type pad to 0
93
+ INFO:hf-to-gguf:Set model quantization version
94
+ INFO:gguf.gguf_writer:Writing the following files:
95
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
96
+
97
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
98
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
99
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
100
+ INFO:hf-to-gguf:Exporting model...
101
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
102
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
103
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
104
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
105
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
106
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
107
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
108
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
109
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
110
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
111
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
112
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
113
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
114
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
115
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
116
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
117
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
118
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
119
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
120
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
121
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
122
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
123
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
124
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
125
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
126
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
127
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
128
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
129
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
130
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
131
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
132
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
133
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
134
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
135
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
136
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
137
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
138
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
139
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
140
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
141
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
142
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
143
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
144
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
145
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:Set meta model
+ INFO:hf-to-gguf:Set model parameters
+ INFO:hf-to-gguf:gguf: context length = 2048
+ INFO:hf-to-gguf:gguf: embedding length = 64
+ INFO:hf-to-gguf:gguf: feed forward length = 256
+ INFO:hf-to-gguf:gguf: head count = 16
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+ INFO:hf-to-gguf:gguf: file type = 1
+ INFO:hf-to-gguf:Set model tokenizer
+ INFO:gguf.vocab:Setting special token type bos to 1
+ INFO:gguf.vocab:Setting special token type eos to 2
+ INFO:gguf.vocab:Setting special token type unk to 0
+ INFO:gguf.vocab:Setting special token type pad to 0
+ INFO:hf-to-gguf:Set model quantization version
+ INFO:gguf.gguf_writer:Writing the following files:
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
+ INFO:hf-to-gguf:Exporting model...
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:Set meta model
+ INFO:hf-to-gguf:Set model parameters
+ INFO:hf-to-gguf:gguf: context length = 2048
+ INFO:hf-to-gguf:gguf: embedding length = 64
+ INFO:hf-to-gguf:gguf: feed forward length = 256
+ INFO:hf-to-gguf:gguf: head count = 16
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+ INFO:hf-to-gguf:gguf: file type = 1
+ INFO:hf-to-gguf:Set model tokenizer
+ INFO:gguf.vocab:Setting special token type bos to 1
+ INFO:gguf.vocab:Setting special token type eos to 2
+ INFO:gguf.vocab:Setting special token type unk to 0
+ INFO:gguf.vocab:Setting special token type pad to 0
+ INFO:hf-to-gguf:Set model quantization version
+ INFO:gguf.gguf_writer:Writing the following files:
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
+ INFO:hf-to-gguf:Exporting model...
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:Set meta model
+ INFO:hf-to-gguf:Set model parameters
+ INFO:hf-to-gguf:gguf: context length = 2048
+ INFO:hf-to-gguf:gguf: embedding length = 64
+ INFO:hf-to-gguf:gguf: feed forward length = 256
+ INFO:hf-to-gguf:gguf: head count = 16
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+ INFO:hf-to-gguf:gguf: file type = 1
+ INFO:hf-to-gguf:Set model tokenizer
+ INFO:gguf.vocab:Setting special token type bos to 1
+ INFO:gguf.vocab:Setting special token type eos to 2
+ INFO:gguf.vocab:Setting special token type unk to 0
+ INFO:gguf.vocab:Setting special token type pad to 0
+ INFO:hf-to-gguf:Set model quantization version
+ INFO:gguf.gguf_writer:Writing the following files:
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
+ INFO:hf-to-gguf:Exporting model...
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:Set meta model
+ INFO:hf-to-gguf:Set model parameters
+ INFO:hf-to-gguf:gguf: context length = 2048
+ INFO:hf-to-gguf:gguf: embedding length = 64
+ INFO:hf-to-gguf:gguf: feed forward length = 256
+ INFO:hf-to-gguf:gguf: head count = 16
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
+ INFO:hf-to-gguf:gguf: file type = 1
+ INFO:hf-to-gguf:Set model tokenizer
+ INFO:gguf.vocab:Setting special token type bos to 1
+ INFO:gguf.vocab:Setting special token type eos to 2
+ INFO:gguf.vocab:Setting special token type unk to 0
+ INFO:gguf.vocab:Setting special token type pad to 0
+ INFO:hf-to-gguf:Set model quantization version
+ INFO:gguf.gguf_writer:Writing the following files:
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
+
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
+ INFO:hf-to-gguf:Loading model: maykeye_tinyllama
+ INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
+ INFO:hf-to-gguf:Exporting model...
+ INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
+ INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {64, 32000}
+ INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
+ INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
+ INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
+ INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
+ INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
554
+ INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
555
+ INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {64}
556
+ INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {256, 64}
557
+ INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {64, 256}
558
+ INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {64, 256}
559
+ INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {64}
560
+ INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {64, 64}
561
+ INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {64, 64}
562
+ INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {64, 64}
563
+ INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {64, 64}
564
+ INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {64}
565
+ INFO:hf-to-gguf:Set meta model
566
+ INFO:hf-to-gguf:Set model parameters
567
+ INFO:hf-to-gguf:gguf: context length = 2048
568
+ INFO:hf-to-gguf:gguf: embedding length = 64
569
+ INFO:hf-to-gguf:gguf: feed forward length = 256
570
+ INFO:hf-to-gguf:gguf: head count = 16
571
+ INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
572
+ INFO:hf-to-gguf:gguf: file type = 1
573
+ INFO:hf-to-gguf:Set model tokenizer
574
+ INFO:gguf.vocab:Setting special token type bos to 1
575
+ INFO:gguf.vocab:Setting special token type eos to 2
576
+ INFO:gguf.vocab:Setting special token type unk to 0
577
+ INFO:gguf.vocab:Setting special token type pad to 0
578
+ INFO:hf-to-gguf:Set model quantization version
579
+ INFO:gguf.gguf_writer:Writing the following files:
580
+ INFO:gguf.gguf_writer:maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf: n_tensors = 75, total_size = 9.2M
581
+
582
+ INFO:hf-to-gguf:Model successfully exported to maykeye_tinyllama/TinyLLama-4.6M-v0.0-F16.gguf
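As a sanity check for anyone replicating the conversion, the tensor shapes logged above can be summed to see whether they agree with the reported `n_tensors = 75, total_size = 9.2M`. The sketch below is not taken from the converter. It assumes that blocks 0-3 have the same shapes as blocks 4-7, that the model has 8 blocks, and that the one global tensor not visible in this part of the log is an F16 `token_embd.weight` of shape {64, 32000}. It also counts tensor data only, not GGUF metadata.

```python
# Rough cross-check of the converter log, under the assumptions stated above.
N_BLOCKS = 8
N_EMBD, N_FF, N_VOCAB = 64, 256, 32000
F16_BYTES, F32_BYTES = 2, 4

# (name, element count, bytes per element) for a single transformer block
block_tensors = [
    ("attn_norm", N_EMBD, F32_BYTES),
    ("ffn_norm", N_EMBD, F32_BYTES),
    ("attn_q", N_EMBD * N_EMBD, F16_BYTES),
    ("attn_k", N_EMBD * N_EMBD, F16_BYTES),
    ("attn_v", N_EMBD * N_EMBD, F16_BYTES),
    ("attn_output", N_EMBD * N_EMBD, F16_BYTES),
    ("ffn_gate", N_EMBD * N_FF, F16_BYTES),
    ("ffn_up", N_EMBD * N_FF, F16_BYTES),
    ("ffn_down", N_FF * N_EMBD, F16_BYTES),
]

# Tensors that appear once per model; token_embd is an assumption (see lead-in).
global_tensors = [
    ("token_embd", N_EMBD * N_VOCAB, F16_BYTES),
    ("output", N_EMBD * N_VOCAB, F16_BYTES),
    ("output_norm", N_EMBD, F32_BYTES),
]

all_tensors = block_tensors * N_BLOCKS + global_tensors
n_tensors = len(all_tensors)
n_params = sum(count for _, count, _ in all_tensors)
n_bytes = sum(count * size for _, count, size in all_tensors)

print(f"tensors: {n_tensors}")
print(f"parameters: {n_params / 1e6:.2f} M")
print(f"tensor data: {n_bytes / 1e6:.1f} MB ({n_bytes / 2**20:.2f} MiB)")
```

Under these assumptions the totals come out at 75 tensors, about 4.62 million parameters, and about 9.2 million bytes of tensor data, which lines up with the figures in the log.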
llamafile-creation.sh CHANGED
@@ -34,7 +34,7 @@ echo We will be aiming to generate $OUTFILE.llamafile
  
  ###############################################################################
  echo == Convert from safetensor to gguf ==
- ./llama.cpp/convert_hf_to_gguf.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --verbose
+ ./llama.cpp/convert_hf_to_gguf.py ${MODEL_DIR} --metadata ${METADATA_FILE} --outtype f16 --verbose &>> convert_hf_to_gguf.output.txt
  mv ${MODEL_DIR}/${OUTFILE}.gguf ${OUTFILE}.gguf
  
  # Generate Diagnostics Dumpfile
@@ -55,4 +55,4 @@ EOF
  
  ###############################################################################
  echo == Test Output ./${OUTFILE}.llamafile ==
- ./${OUTFILE}.llamafile --cli -p "hello world the gruff man said"
+ ./${OUTFILE}.llamafile --cli -p "hello world the gruff man said" &>> llamafile_output_example.output.txt
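A note on the `&>>` redirection introduced here: in Bash it appends both stdout and stderr to the named file, so every run of the script adds to the existing output file instead of replacing it. That would explain why `llamafile_output_example.output.txt` contains several runs with different seeds. The sketch below shows the same behaviour with the portable spelling `>> file 2>&1`; the `emit` function and the temporary file are made up for illustration and are not part of `llamafile-creation.sh`.

```shell
#!/bin/sh
# Illustrative only: emit() and the temp file are hypothetical stand-ins.
log="$(mktemp)"

emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Portable form of Bash's `emit &>> "$log"`: append stdout and stderr.
emit >> "$log" 2>&1
emit >> "$log" 2>&1   # second run is appended after the first

line_count=$(( $(wc -l < "$log") ))
echo "lines after two appending runs: $line_count"

# Truncating form (Bash shorthand: `emit &> "$log"`): each run starts fresh.
emit > "$log" 2>&1
truncated_count=$(( $(wc -l < "$log") ))
echo "lines after one truncating run: $truncated_count"

rm -f "$log"
```

If a single clean run per file is preferred, one option is to truncate or delete the output files at the start of the script and keep the appending redirection for the individual steps.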
llamafile_output_example.output.txt ADDED
@@ -0,0 +1,806 @@
1
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
2
+ main: llamafile version 0.8.9
3
+ main: seed = 1721530767
4
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
5
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
6
+ llama_model_loader: - kv 0: general.architecture str = llama
7
+ llama_model_loader: - kv 1: general.type str = model
8
+ llama_model_loader: - kv 2: general.name str = TinyLLama
9
+ llama_model_loader: - kv 3: general.author str = Maykeye
10
+ llama_model_loader: - kv 4: general.version str = v0.0
11
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
12
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
13
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
14
+ llama_model_loader: - kv 8: general.license str = apache-2.0
15
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
16
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
17
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
18
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
19
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
20
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
21
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
22
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
23
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
24
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
25
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
26
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
27
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
28
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
29
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
30
+ llama_model_loader: - kv 24: general.file_type u32 = 1
31
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
32
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
33
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
34
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
35
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
36
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
37
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
38
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
39
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
40
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
41
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
42
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
43
+ llama_model_loader: - type f32: 17 tensors
44
+ llama_model_loader: - type f16: 58 tensors
45
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
46
+ llm_load_print_meta: format = GGUF V3 (latest)
47
+ llm_load_print_meta: arch = llama
48
+ llm_load_print_meta: vocab type = SPM
49
+ llm_load_print_meta: n_vocab = 32000
50
+ llm_load_print_meta: n_merges = 0
51
+ llm_load_print_meta: n_ctx_train = 2048
52
+ llm_load_print_meta: n_embd = 64
53
+ llm_load_print_meta: n_head = 16
54
+ llm_load_print_meta: n_head_kv = 16
55
+ llm_load_print_meta: n_layer = 8
56
+ llm_load_print_meta: n_rot = 4
57
+ llm_load_print_meta: n_swa = 0
58
+ llm_load_print_meta: n_embd_head_k = 4
59
+ llm_load_print_meta: n_embd_head_v = 4
60
+ llm_load_print_meta: n_gqa = 1
61
+ llm_load_print_meta: n_embd_k_gqa = 64
62
+ llm_load_print_meta: n_embd_v_gqa = 64
63
+ llm_load_print_meta: f_norm_eps = 0.0e+00
64
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
65
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
66
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
67
+ llm_load_print_meta: f_logit_scale = 0.0e+00
68
+ llm_load_print_meta: n_ff = 256
69
+ llm_load_print_meta: n_expert = 0
70
+ llm_load_print_meta: n_expert_used = 0
71
+ llm_load_print_meta: causal attn = 1
72
+ llm_load_print_meta: pooling type = 0
73
+ llm_load_print_meta: rope type = 0
74
+ llm_load_print_meta: rope scaling = linear
75
+ llm_load_print_meta: freq_base_train = 10000.0
76
+ llm_load_print_meta: freq_scale_train = 1
77
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
78
+ llm_load_print_meta: rope_finetuned = unknown
79
+ llm_load_print_meta: ssm_d_conv = 0
80
+ llm_load_print_meta: ssm_d_inner = 0
81
+ llm_load_print_meta: ssm_d_state = 0
82
+ llm_load_print_meta: ssm_dt_rank = 0
83
+ llm_load_print_meta: model type = ?B
84
+ llm_load_print_meta: model ftype = F16
85
+ llm_load_print_meta: model params = 4.62 M
86
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
87
+ llm_load_print_meta: general.name = TinyLLama
88
+ llm_load_print_meta: BOS token = 1 '<s>'
89
+ llm_load_print_meta: EOS token = 2 '</s>'
90
+ llm_load_print_meta: UNK token = 0 '<unk>'
91
+ llm_load_print_meta: PAD token = 0 '<unk>'
92
+ llm_load_print_meta: LF token = 13 '<0x0A>'
93
+ llm_load_tensors: ggml ctx size = 0.04 MiB
94
+ llm_load_tensors: CPU buffer size = 8.82 MiB
95
+ ..............
96
+ llama_new_context_with_model: n_ctx = 512
97
+ llama_new_context_with_model: n_batch = 512
98
+ llama_new_context_with_model: n_ubatch = 512
99
+ llama_new_context_with_model: flash_attn = 0
100
+ llama_new_context_with_model: freq_base = 10000.0
101
+ llama_new_context_with_model: freq_scale = 1
102
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
103
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
104
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
105
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
106
+ llama_new_context_with_model: graph nodes = 262
107
+ llama_new_context_with_model: graph splits = 1
108
+
109
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
110
+ sampling:
111
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
112
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
113
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
114
+ sampling order:
115
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
116
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
117
+
118
+
119
+ hello world the gruff man said they had a big heart. The man smiled and said he would be the best friends.
120
+ The man said no. He said he didn't want to take a nap and he was too sad.
121
+ The man was so sad he started to cry. He didn't know what the man was one of his friends.
122
+ The man saw that the man was not the children. He felt sad and said he wanted to come back. The man was mad and he was so sad.
123
+ The man felt bad for the people, and they both felt bad for the story.
124
+ The man told the kids about the end. He said he had to stay too slow and not take them. He was sad, but it was too late.
125
+ The people were very sad, but they knew it was not nice. They were never seen again. [end of text]
126
+
127
+
128
+ llama_print_timings: load time = 10.61 ms
129
+ llama_print_timings: sample time = 5.92 ms / 172 runs ( 0.03 ms per token, 29073.70 tokens per second)
130
+ llama_print_timings: prompt eval time = 2.03 ms / 8 tokens ( 0.25 ms per token, 3942.83 tokens per second)
131
+ llama_print_timings: eval time = 245.61 ms / 171 runs ( 1.44 ms per token, 696.21 tokens per second)
132
+ llama_print_timings: total time = 292.61 ms / 179 tokens
133
+ Log end
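Several numbers in the run above can be re-derived from the logged hyperparameters, which is a quick way to check a replicated build. The sketch below uses only values printed in the log. The KV cache formula (context length × layers × K/V width × 2 bytes for f16) is the usual back-of-the-envelope estimate and is an assumption here, not a quote of llama.cpp's internals. The throughput figure is recomputed from rounded log values, so it only matches approximately.

```python
# Values copied from the log of the first run.
n_embd, n_head, n_head_kv, n_layer = 64, 16, 16, 8
n_ctx = 512
F16_BYTES = 2

# Per-head width; the log reports n_embd_head_k = 4 and n_rot = 4.
head_dim = n_embd // n_head

# Width of the K (or V) projection across all KV heads; the log reports 64.
n_embd_k_gqa = head_dim * n_head_kv

# Estimated f16 cache size for K alone; V is the same size.
k_cache_bytes = n_ctx * n_layer * n_embd_k_gqa * F16_BYTES
k_cache_mib = k_cache_bytes / 2**20
kv_cache_mib = 2 * k_cache_mib

# Generation throughput from "eval time = 245.61 ms / 171 runs".
eval_tokens_per_second = 171 / (245.61 / 1000)

print(f"head_dim = {head_dim}, n_embd_k_gqa = {n_embd_k_gqa}")
print(f"K cache = {k_cache_mib:.2f} MiB, KV cache = {kv_cache_mib:.2f} MiB")
print(f"eval throughput is roughly {eval_tokens_per_second:.0f} tokens per second")
```

These estimates agree with the logged `K (f16): 0.50 MiB`, `KV self size = 1.00 MiB`, and an eval rate of roughly 696 tokens per second.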
134
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
135
+ main: llamafile version 0.8.9
136
+ main: seed = 1721531043
137
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
138
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
139
+ llama_model_loader: - kv 0: general.architecture str = llama
140
+ llama_model_loader: - kv 1: general.type str = model
141
+ llama_model_loader: - kv 2: general.name str = TinyLLama
142
+ llama_model_loader: - kv 3: general.author str = Maykeye
143
+ llama_model_loader: - kv 4: general.version str = v0.0
144
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
145
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
146
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
147
+ llama_model_loader: - kv 8: general.license str = apache-2.0
148
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
149
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
150
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
151
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
152
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
153
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
154
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
155
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
156
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
157
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
158
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
159
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
160
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
161
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
162
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
163
+ llama_model_loader: - kv 24: general.file_type u32 = 1
164
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
165
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
166
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
167
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
168
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
169
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
170
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
171
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
172
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
173
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
174
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
175
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
176
+ llama_model_loader: - type f32: 17 tensors
177
+ llama_model_loader: - type f16: 58 tensors
178
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
179
+ llm_load_print_meta: format = GGUF V3 (latest)
180
+ llm_load_print_meta: arch = llama
181
+ llm_load_print_meta: vocab type = SPM
182
+ llm_load_print_meta: n_vocab = 32000
183
+ llm_load_print_meta: n_merges = 0
184
+ llm_load_print_meta: n_ctx_train = 2048
185
+ llm_load_print_meta: n_embd = 64
186
+ llm_load_print_meta: n_head = 16
187
+ llm_load_print_meta: n_head_kv = 16
188
+ llm_load_print_meta: n_layer = 8
189
+ llm_load_print_meta: n_rot = 4
190
+ llm_load_print_meta: n_swa = 0
191
+ llm_load_print_meta: n_embd_head_k = 4
192
+ llm_load_print_meta: n_embd_head_v = 4
193
+ llm_load_print_meta: n_gqa = 1
194
+ llm_load_print_meta: n_embd_k_gqa = 64
195
+ llm_load_print_meta: n_embd_v_gqa = 64
196
+ llm_load_print_meta: f_norm_eps = 0.0e+00
197
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
198
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
199
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
200
+ llm_load_print_meta: f_logit_scale = 0.0e+00
201
+ llm_load_print_meta: n_ff = 256
202
+ llm_load_print_meta: n_expert = 0
203
+ llm_load_print_meta: n_expert_used = 0
204
+ llm_load_print_meta: causal attn = 1
205
+ llm_load_print_meta: pooling type = 0
206
+ llm_load_print_meta: rope type = 0
207
+ llm_load_print_meta: rope scaling = linear
208
+ llm_load_print_meta: freq_base_train = 10000.0
209
+ llm_load_print_meta: freq_scale_train = 1
210
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
211
+ llm_load_print_meta: rope_finetuned = unknown
212
+ llm_load_print_meta: ssm_d_conv = 0
213
+ llm_load_print_meta: ssm_d_inner = 0
214
+ llm_load_print_meta: ssm_d_state = 0
215
+ llm_load_print_meta: ssm_dt_rank = 0
216
+ llm_load_print_meta: model type = ?B
217
+ llm_load_print_meta: model ftype = F16
218
+ llm_load_print_meta: model params = 4.62 M
219
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
220
+ llm_load_print_meta: general.name = TinyLLama
221
+ llm_load_print_meta: BOS token = 1 '<s>'
222
+ llm_load_print_meta: EOS token = 2 '</s>'
223
+ llm_load_print_meta: UNK token = 0 '<unk>'
224
+ llm_load_print_meta: PAD token = 0 '<unk>'
225
+ llm_load_print_meta: LF token = 13 '<0x0A>'
226
+ llm_load_tensors: ggml ctx size = 0.04 MiB
227
+ llm_load_tensors: CPU buffer size = 8.82 MiB
228
+ ..............
229
+ llama_new_context_with_model: n_ctx = 512
230
+ llama_new_context_with_model: n_batch = 512
231
+ llama_new_context_with_model: n_ubatch = 512
232
+ llama_new_context_with_model: flash_attn = 0
233
+ llama_new_context_with_model: freq_base = 10000.0
234
+ llama_new_context_with_model: freq_scale = 1
235
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
236
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
237
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
238
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
239
+ llama_new_context_with_model: graph nodes = 262
240
+ llama_new_context_with_model: graph splits = 1
241
+
242
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
243
+ sampling:
244
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
245
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
246
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
247
+ sampling order:
248
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
249
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
250
+
251
+
252
+ hello world the gruff man said he had a dream. He thought about what he had to do. He took a deep breath and ran around the woods. The man was so excited! He couldn't wait to do the whole day.
253
+ The man looked around and found a small box. He wanted to see what was inside. He picked up the box and started to climb. But the box was too hard. He looked around for the cake.
254
+ The man was sad and he didn't know what to do. He asked his friend to help him. The man said no. He said he couldn't get the oven. He was sad and began to cry.
255
+ The man felt bad because he knew it was okay. He wished he had been a good friend. He was very sad and he couldn't find the cake. [end of text]
256
+
257
+
258
+ llama_print_timings: load time = 6.74 ms
259
+ llama_print_timings: sample time = 6.76 ms / 169 runs ( 0.04 ms per token, 25011.10 tokens per second)
260
+ llama_print_timings: prompt eval time = 3.64 ms / 8 tokens ( 0.46 ms per token, 2196.60 tokens per second)
261
+ llama_print_timings: eval time = 340.85 ms / 168 runs ( 2.03 ms per token, 492.88 tokens per second)
262
+ llama_print_timings: total time = 392.63 ms / 176 tokens
263
+ Log end
264
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
265
+ main: llamafile version 0.8.9
266
+ main: seed = 1721531082
267
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
268
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
269
+ llama_model_loader: - kv 0: general.architecture str = llama
270
+ llama_model_loader: - kv 1: general.type str = model
271
+ llama_model_loader: - kv 2: general.name str = TinyLLama
272
+ llama_model_loader: - kv 3: general.author str = Maykeye
273
+ llama_model_loader: - kv 4: general.version str = v0.0
274
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
275
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
276
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
277
+ llama_model_loader: - kv 8: general.license str = apache-2.0
278
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
279
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
280
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
281
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
282
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
283
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
284
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
285
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
286
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
287
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
288
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
289
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
290
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
291
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
292
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
293
+ llama_model_loader: - kv 24: general.file_type u32 = 1
294
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
295
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
296
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
297
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
298
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
299
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
300
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
301
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
302
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
303
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
304
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
305
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
306
+ llama_model_loader: - type f32: 17 tensors
307
+ llama_model_loader: - type f16: 58 tensors
308
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
309
+ llm_load_print_meta: format = GGUF V3 (latest)
310
+ llm_load_print_meta: arch = llama
311
+ llm_load_print_meta: vocab type = SPM
312
+ llm_load_print_meta: n_vocab = 32000
313
+ llm_load_print_meta: n_merges = 0
314
+ llm_load_print_meta: n_ctx_train = 2048
315
+ llm_load_print_meta: n_embd = 64
316
+ llm_load_print_meta: n_head = 16
317
+ llm_load_print_meta: n_head_kv = 16
318
+ llm_load_print_meta: n_layer = 8
319
+ llm_load_print_meta: n_rot = 4
320
+ llm_load_print_meta: n_swa = 0
321
+ llm_load_print_meta: n_embd_head_k = 4
322
+ llm_load_print_meta: n_embd_head_v = 4
323
+ llm_load_print_meta: n_gqa = 1
324
+ llm_load_print_meta: n_embd_k_gqa = 64
325
+ llm_load_print_meta: n_embd_v_gqa = 64
326
+ llm_load_print_meta: f_norm_eps = 0.0e+00
327
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
328
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
329
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
330
+ llm_load_print_meta: f_logit_scale = 0.0e+00
331
+ llm_load_print_meta: n_ff = 256
332
+ llm_load_print_meta: n_expert = 0
333
+ llm_load_print_meta: n_expert_used = 0
334
+ llm_load_print_meta: causal attn = 1
335
+ llm_load_print_meta: pooling type = 0
336
+ llm_load_print_meta: rope type = 0
337
+ llm_load_print_meta: rope scaling = linear
338
+ llm_load_print_meta: freq_base_train = 10000.0
339
+ llm_load_print_meta: freq_scale_train = 1
340
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
341
+ llm_load_print_meta: rope_finetuned = unknown
342
+ llm_load_print_meta: ssm_d_conv = 0
343
+ llm_load_print_meta: ssm_d_inner = 0
344
+ llm_load_print_meta: ssm_d_state = 0
345
+ llm_load_print_meta: ssm_dt_rank = 0
346
+ llm_load_print_meta: model type = ?B
347
+ llm_load_print_meta: model ftype = F16
348
+ llm_load_print_meta: model params = 4.62 M
349
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
350
+ llm_load_print_meta: general.name = TinyLLama
351
+ llm_load_print_meta: BOS token = 1 '<s>'
352
+ llm_load_print_meta: EOS token = 2 '</s>'
353
+ llm_load_print_meta: UNK token = 0 '<unk>'
354
+ llm_load_print_meta: PAD token = 0 '<unk>'
355
+ llm_load_print_meta: LF token = 13 '<0x0A>'
356
+ llm_load_tensors: ggml ctx size = 0.04 MiB
357
+ llm_load_tensors: CPU buffer size = 8.82 MiB
358
+ ..............
359
+ llama_new_context_with_model: n_ctx = 512
360
+ llama_new_context_with_model: n_batch = 512
361
+ llama_new_context_with_model: n_ubatch = 512
362
+ llama_new_context_with_model: flash_attn = 0
363
+ llama_new_context_with_model: freq_base = 10000.0
364
+ llama_new_context_with_model: freq_scale = 1
365
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
366
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
367
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
368
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
369
+ llama_new_context_with_model: graph nodes = 262
370
+ llama_new_context_with_model: graph splits = 1
371
+
372
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
373
+ sampling:
374
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
375
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
376
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
377
+ sampling order:
378
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
379
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
380
+
381
+
382
+ hello world the gruff man said yes and said hello to her.
383
+ The lady said to her, “I want to keep you a new friend! You are very nice and smart. I will be very proud."
384
+ The man smiled and said, “Yes, I will be more careful!”
385
+ The man looked up, but he knew she had a secret. He said, “I'm the right thing, but I have a special plan!”
386
+ The man was very surprised, but he smiled.
387
+ "That's so smart," he said. "I have to be careful to be kind to others. It's special to help me find things, but it's too late."
388
+ The man smiled and said, "No, you can't do it to be nice to you."
389
+ The man smiled and said, "You're welcome, I'm sorry for helping you. You are very nosy!"
390
+ The man was so happy. He knew that it had helped his friends, and he could be a friend. The people in the town loved to learn together. [end of text]
391
+
392
+
393
+ llama_print_timings: load time = 8.66 ms
394
+ llama_print_timings: sample time = 8.79 ms / 220 runs ( 0.04 ms per token, 25017.06 tokens per second)
395
+ llama_print_timings: prompt eval time = 1.59 ms / 8 tokens ( 0.20 ms per token, 5015.67 tokens per second)
396
+ llama_print_timings: eval time = 352.66 ms / 219 runs ( 1.61 ms per token, 620.99 tokens per second)
397
+ llama_print_timings: total time = 415.12 ms / 227 tokens
398
+ Log end
399
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
400
+ main: llamafile version 0.8.9
401
+ main: seed = 1721531150
402
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
403
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
404
+ llama_model_loader: - kv 0: general.architecture str = llama
405
+ llama_model_loader: - kv 1: general.type str = model
406
+ llama_model_loader: - kv 2: general.name str = TinyLLama
407
+ llama_model_loader: - kv 3: general.author str = Maykeye
408
+ llama_model_loader: - kv 4: general.version str = v0.0
409
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
410
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
411
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
412
+ llama_model_loader: - kv 8: general.license str = apache-2.0
413
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
414
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
415
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
416
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
417
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
418
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
419
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+ llama_model_loader: - kv 24: general.file_type u32 = 1
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
+ llama_model_loader: - type f32: 17 tensors
+ llama_model_loader: - type f16: 58 tensors
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
+ llm_load_print_meta: format = GGUF V3 (latest)
+ llm_load_print_meta: arch = llama
+ llm_load_print_meta: vocab type = SPM
+ llm_load_print_meta: n_vocab = 32000
+ llm_load_print_meta: n_merges = 0
+ llm_load_print_meta: n_ctx_train = 2048
+ llm_load_print_meta: n_embd = 64
+ llm_load_print_meta: n_head = 16
+ llm_load_print_meta: n_head_kv = 16
+ llm_load_print_meta: n_layer = 8
+ llm_load_print_meta: n_rot = 4
+ llm_load_print_meta: n_swa = 0
+ llm_load_print_meta: n_embd_head_k = 4
+ llm_load_print_meta: n_embd_head_v = 4
+ llm_load_print_meta: n_gqa = 1
+ llm_load_print_meta: n_embd_k_gqa = 64
+ llm_load_print_meta: n_embd_v_gqa = 64
+ llm_load_print_meta: f_norm_eps = 0.0e+00
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ llm_load_print_meta: f_logit_scale = 0.0e+00
+ llm_load_print_meta: n_ff = 256
+ llm_load_print_meta: n_expert = 0
+ llm_load_print_meta: n_expert_used = 0
+ llm_load_print_meta: causal attn = 1
+ llm_load_print_meta: pooling type = 0
+ llm_load_print_meta: rope type = 0
+ llm_load_print_meta: rope scaling = linear
+ llm_load_print_meta: freq_base_train = 10000.0
+ llm_load_print_meta: freq_scale_train = 1
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
+ llm_load_print_meta: rope_finetuned = unknown
+ llm_load_print_meta: ssm_d_conv = 0
+ llm_load_print_meta: ssm_d_inner = 0
+ llm_load_print_meta: ssm_d_state = 0
+ llm_load_print_meta: ssm_dt_rank = 0
+ llm_load_print_meta: model type = ?B
+ llm_load_print_meta: model ftype = F16
+ llm_load_print_meta: model params = 4.62 M
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
+ llm_load_print_meta: general.name = TinyLLama
+ llm_load_print_meta: BOS token = 1 '<s>'
+ llm_load_print_meta: EOS token = 2 '</s>'
+ llm_load_print_meta: UNK token = 0 '<unk>'
+ llm_load_print_meta: PAD token = 0 '<unk>'
+ llm_load_print_meta: LF token = 13 '<0x0A>'
+ llm_load_tensors: ggml ctx size = 0.04 MiB
+ llm_load_tensors: CPU buffer size = 8.82 MiB
+ ..............
+ llama_new_context_with_model: n_ctx = 512
+ llama_new_context_with_model: n_batch = 512
+ llama_new_context_with_model: n_ubatch = 512
+ llama_new_context_with_model: flash_attn = 0
+ llama_new_context_with_model: freq_base = 10000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
+ llama_new_context_with_model: graph nodes = 262
+ llama_new_context_with_model: graph splits = 1
+
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+ sampling:
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
+
+
+ hello world the gruff man said, “Hi, do you want to do something?". He looked up and said, “No!"
+ The man looked around and saw the big, tall hill. He wanted to see where he could see it. He jumped in and was so excited!
+ The lady said, “It's so messy! This is important to be careful if you can't make a big smile. Thank you for being very nice here."
+ The man thought for a moment and then said, “I'm sorry, I would want to eat it now!"
+ The man nodded. He knew he could help the girl. He said, “No, we should not try it."
+ The boy was sad and said, “No, I don’t need some food. I'll be my friend."
+ The man thought for a moment and then said, “You can't have to leave the garden when you're going to the park! We must always share it with your friends".
+ The boy smiled and said, “Of course".
+ The man smiled and said, “I'm glad you found this party! I'm very excited to play with you.”
+ The boy smiled. He was very proud of himself. He said, “Do you want to have some fun? I like to watch this game. I don’t want to ask me if you can be good."
+ The boy smiled and said, "Yes, I will, let's go, let's go!"
+ So, the boy and the boy hopped and started to walk and see the stars. They laughed and laughed, but he was very proud. [end of text]
+
+
+ llama_print_timings: load time = 17.26 ms
+ llama_print_timings: sample time = 13.20 ms / 347 runs ( 0.04 ms per token, 26287.88 tokens per second)
+ llama_print_timings: prompt eval time = 2.46 ms / 8 tokens ( 0.31 ms per token, 3257.33 tokens per second)
+ llama_print_timings: eval time = 550.67 ms / 346 runs ( 1.59 ms per token, 628.33 tokens per second)
+ llama_print_timings: total time = 649.30 ms / 354 tokens
+ Log end
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
+ main: llamafile version 0.8.9
+ main: seed = 1721531259
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ llama_model_loader: - kv 0: general.architecture str = llama
+ llama_model_loader: - kv 1: general.type str = model
+ llama_model_loader: - kv 2: general.name str = TinyLLama
+ llama_model_loader: - kv 3: general.author str = Maykeye
+ llama_model_loader: - kv 4: general.version str = v0.0
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
+ llama_model_loader: - kv 8: general.license str = apache-2.0
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+ llama_model_loader: - kv 24: general.file_type u32 = 1
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
+ llama_model_loader: - type f32: 17 tensors
+ llama_model_loader: - type f16: 58 tensors
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
+ llm_load_print_meta: format = GGUF V3 (latest)
+ llm_load_print_meta: arch = llama
+ llm_load_print_meta: vocab type = SPM
+ llm_load_print_meta: n_vocab = 32000
+ llm_load_print_meta: n_merges = 0
+ llm_load_print_meta: n_ctx_train = 2048
+ llm_load_print_meta: n_embd = 64
+ llm_load_print_meta: n_head = 16
+ llm_load_print_meta: n_head_kv = 16
+ llm_load_print_meta: n_layer = 8
+ llm_load_print_meta: n_rot = 4
+ llm_load_print_meta: n_swa = 0
+ llm_load_print_meta: n_embd_head_k = 4
+ llm_load_print_meta: n_embd_head_v = 4
+ llm_load_print_meta: n_gqa = 1
+ llm_load_print_meta: n_embd_k_gqa = 64
+ llm_load_print_meta: n_embd_v_gqa = 64
+ llm_load_print_meta: f_norm_eps = 0.0e+00
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ llm_load_print_meta: f_logit_scale = 0.0e+00
+ llm_load_print_meta: n_ff = 256
+ llm_load_print_meta: n_expert = 0
+ llm_load_print_meta: n_expert_used = 0
+ llm_load_print_meta: causal attn = 1
+ llm_load_print_meta: pooling type = 0
+ llm_load_print_meta: rope type = 0
+ llm_load_print_meta: rope scaling = linear
+ llm_load_print_meta: freq_base_train = 10000.0
+ llm_load_print_meta: freq_scale_train = 1
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
+ llm_load_print_meta: rope_finetuned = unknown
+ llm_load_print_meta: ssm_d_conv = 0
+ llm_load_print_meta: ssm_d_inner = 0
+ llm_load_print_meta: ssm_d_state = 0
+ llm_load_print_meta: ssm_dt_rank = 0
+ llm_load_print_meta: model type = ?B
+ llm_load_print_meta: model ftype = F16
+ llm_load_print_meta: model params = 4.62 M
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
+ llm_load_print_meta: general.name = TinyLLama
+ llm_load_print_meta: BOS token = 1 '<s>'
+ llm_load_print_meta: EOS token = 2 '</s>'
+ llm_load_print_meta: UNK token = 0 '<unk>'
+ llm_load_print_meta: PAD token = 0 '<unk>'
+ llm_load_print_meta: LF token = 13 '<0x0A>'
+ llm_load_tensors: ggml ctx size = 0.04 MiB
+ llm_load_tensors: CPU buffer size = 8.82 MiB
+ ..............
+ llama_new_context_with_model: n_ctx = 512
+ llama_new_context_with_model: n_batch = 512
+ llama_new_context_with_model: n_ubatch = 512
+ llama_new_context_with_model: flash_attn = 0
+ llama_new_context_with_model: freq_base = 10000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
+ llama_new_context_with_model: graph nodes = 262
+ llama_new_context_with_model: graph splits = 1
+
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+ sampling:
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
+
+
+ hello world the gruff man said he would help her. As he ran, he noticed something shiny in the sky. He looked around and saw a small, old lady, who was so excited! She said, “Can I try this?"
+ The old lady smiled and said, “Yes, but I have to keep the egg. It is so nice!”
+ The old man smiled. He said, “Yes, that is a good idea! I will stay in your house and give you a hug!"
+ The old man smiled, but then he said, “We can be very careful when you take it away". He said: “I want to be brave," The old man was so proud of his work. He said, “I need to be happy with this. Let's play together!"
+ The old man said, “No, I don’t want to go. We need to be careful."
+ The old man said, “Don't worry, I will be happy."
+ The old man smiled and said, "It's okay. We can try to take some more times. But be careful. Maybe you can't stop your friends."
+ The old man smiled and said, “Yes, you can. I'm here to help you. It's time for this problem." The old man nodded and said, “I will find it! We can are very careful with it".
+ The old man agreed. He gave the ugly man a big hug and said, “I know you would like it, but you don't need it."
+ The old man smiled and said, “You do so. I like the old man. I can be nice. He is my friend and I will help you get your way back to the party."
+ The old man smiled and said, “You don’s okay. You're so brave to find it, and I'm glad you have a new friend. That's a very nice idea and I'll take your things. [end of text]
+
+
+ llama_print_timings: load time = 7.35 ms
+ llama_print_timings: sample time = 13.73 ms / 409 runs ( 0.03 ms per token, 29780.11 tokens per second)
+ llama_print_timings: prompt eval time = 1.57 ms / 8 tokens ( 0.20 ms per token, 5102.04 tokens per second)
+ llama_print_timings: eval time = 972.37 ms / 408 runs ( 2.38 ms per token, 419.59 tokens per second)
+ llama_print_timings: total time = 1089.29 ms / 416 tokens
+ Log end
+ note: if you have an AMD or NVIDIA GPU then you need to pass -ngl 9999 to enable GPU offloading
+ main: llamafile version 0.8.9
+ main: seed = 1721532670
+ llama_model_loader: loaded meta data with 37 key-value pairs and 75 tensors from TinyLLama-4.6M-v0.0-F16.gguf (version GGUF V3 (latest))
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ llama_model_loader: - kv 0: general.architecture str = llama
+ llama_model_loader: - kv 1: general.type str = model
+ llama_model_loader: - kv 2: general.name str = TinyLLama
+ llama_model_loader: - kv 3: general.author str = Maykeye
+ llama_model_loader: - kv 4: general.version str = v0.0
+ llama_model_loader: - kv 5: general.description str = This gguf is ported from a first vers...
+ llama_model_loader: - kv 6: general.quantized_by str = Mofosyne
+ llama_model_loader: - kv 7: general.size_label str = 4.6M
+ llama_model_loader: - kv 8: general.license str = apache-2.0
+ llama_model_loader: - kv 9: general.license.name str = Apache License Version 2.0, January 2004
+ llama_model_loader: - kv 10: general.license.link str = https://huggingface.co/datasets/choos...
+ llama_model_loader: - kv 11: general.url str = https://huggingface.co/mofosyne/TinyL...
+ llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/mofosyne/TinyL...
+ llama_model_loader: - kv 13: general.source.url str = https://huggingface.co/Maykeye/TinyLL...
+ llama_model_loader: - kv 14: general.source.repo_url str = https://huggingface.co/Maykeye/TinyLL...
+ llama_model_loader: - kv 15: general.tags arr[str,5] = ["text generation", "transformer", "l...
+ llama_model_loader: - kv 16: general.languages arr[str,1] = ["en"]
+ llama_model_loader: - kv 17: general.datasets arr[str,2] = ["https://huggingface.co/datasets/ron...
+ llama_model_loader: - kv 18: llama.block_count u32 = 8
+ llama_model_loader: - kv 19: llama.context_length u32 = 2048
+ llama_model_loader: - kv 20: llama.embedding_length u32 = 64
+ llama_model_loader: - kv 21: llama.feed_forward_length u32 = 256
+ llama_model_loader: - kv 22: llama.attention.head_count u32 = 16
+ llama_model_loader: - kv 23: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
+ llama_model_loader: - kv 24: general.file_type u32 = 1
+ llama_model_loader: - kv 25: llama.vocab_size u32 = 32000
+ llama_model_loader: - kv 26: llama.rope.dimension_count u32 = 4
+ llama_model_loader: - kv 27: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 28: tokenizer.ggml.pre str = default
+ llama_model_loader: - kv 29: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 30: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 32: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 2
+ llama_model_loader: - kv 34: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 35: tokenizer.ggml.padding_token_id u32 = 0
+ llama_model_loader: - kv 36: general.quantization_version u32 = 2
+ llama_model_loader: - type f32: 17 tensors
+ llama_model_loader: - type f16: 58 tensors
+ llm_load_vocab: special tokens definition check successful ( 259/32000 ).
+ llm_load_print_meta: format = GGUF V3 (latest)
+ llm_load_print_meta: arch = llama
+ llm_load_print_meta: vocab type = SPM
+ llm_load_print_meta: n_vocab = 32000
+ llm_load_print_meta: n_merges = 0
+ llm_load_print_meta: n_ctx_train = 2048
+ llm_load_print_meta: n_embd = 64
+ llm_load_print_meta: n_head = 16
+ llm_load_print_meta: n_head_kv = 16
+ llm_load_print_meta: n_layer = 8
+ llm_load_print_meta: n_rot = 4
+ llm_load_print_meta: n_swa = 0
+ llm_load_print_meta: n_embd_head_k = 4
+ llm_load_print_meta: n_embd_head_v = 4
+ llm_load_print_meta: n_gqa = 1
+ llm_load_print_meta: n_embd_k_gqa = 64
+ llm_load_print_meta: n_embd_v_gqa = 64
+ llm_load_print_meta: f_norm_eps = 0.0e+00
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ llm_load_print_meta: f_logit_scale = 0.0e+00
+ llm_load_print_meta: n_ff = 256
+ llm_load_print_meta: n_expert = 0
+ llm_load_print_meta: n_expert_used = 0
+ llm_load_print_meta: causal attn = 1
+ llm_load_print_meta: pooling type = 0
+ llm_load_print_meta: rope type = 0
+ llm_load_print_meta: rope scaling = linear
+ llm_load_print_meta: freq_base_train = 10000.0
+ llm_load_print_meta: freq_scale_train = 1
+ llm_load_print_meta: n_yarn_orig_ctx = 2048
+ llm_load_print_meta: rope_finetuned = unknown
+ llm_load_print_meta: ssm_d_conv = 0
+ llm_load_print_meta: ssm_d_inner = 0
+ llm_load_print_meta: ssm_d_state = 0
+ llm_load_print_meta: ssm_dt_rank = 0
+ llm_load_print_meta: model type = ?B
+ llm_load_print_meta: model ftype = F16
+ llm_load_print_meta: model params = 4.62 M
+ llm_load_print_meta: model size = 8.82 MiB (16.00 BPW)
+ llm_load_print_meta: general.name = TinyLLama
+ llm_load_print_meta: BOS token = 1 '<s>'
+ llm_load_print_meta: EOS token = 2 '</s>'
+ llm_load_print_meta: UNK token = 0 '<unk>'
+ llm_load_print_meta: PAD token = 0 '<unk>'
+ llm_load_print_meta: LF token = 13 '<0x0A>'
+ llm_load_tensors: ggml ctx size = 0.04 MiB
+ llm_load_tensors: CPU buffer size = 8.82 MiB
+ ..............
+ llama_new_context_with_model: n_ctx = 512
+ llama_new_context_with_model: n_batch = 512
+ llama_new_context_with_model: n_ubatch = 512
+ llama_new_context_with_model: flash_attn = 0
+ llama_new_context_with_model: freq_base = 10000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CPU KV buffer size = 1.00 MiB
+ llama_new_context_with_model: KV self size = 1.00 MiB, K (f16): 0.50 MiB, V (f16): 0.50 MiB
+ llama_new_context_with_model: CPU output buffer size = 0.12 MiB
+ llama_new_context_with_model: CPU compute buffer size = 62.75 MiB
+ llama_new_context_with_model: graph nodes = 262
+ llama_new_context_with_model: graph splits = 1
+
+ system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+ sampling:
+ repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
+
+
+ hello world the gruff man said yes. The man said he could go. The man said the man was very brave and he wanted to go. The man was very curious and he decided to go on a trip.
+ After the story, the man was ready to go. He waved goodbye and said, "Don't worry, I will go on a walk." The man stepped down and said, "Don't be scared, I'xy. The man was a brave boy!"
+ The man smiled and said, "I'm sorry, I am so hungry." The man smiled and said, "That's nice. I'm proud of you."
+ The man said, "You can go with me. The man can fly fast for you, and I'll come with you." The man said, "Yes, let's do it!"
+ The man and the man ran around the village, and the man was happy. He was very kind. He was happy to have a friend. The man said, "You are very nice!"
+ The man hugged the man and said, "We are welcome, little boy. We were so proud of you. We are best friends."
+ The man smiled and said, "You are right, little boy, it's not a good thing to do. I have the way to the party to have fun." [end of text]
+
+
+ llama_print_timings: load time = 6.42 ms
+ llama_print_timings: sample time = 8.73 ms / 279 runs ( 0.03 ms per token, 31969.75 tokens per second)
+ llama_print_timings: prompt eval time = 1.53 ms / 8 tokens ( 0.19 ms per token, 5239.03 tokens per second)
+ llama_print_timings: eval time = 366.91 ms / 278 runs ( 1.32 ms per token, 757.69 tokens per second)
+ llama_print_timings: total time = 440.41 ms / 286 tokens
+ Log end