Fishfishfishfishfish/Gemma-2-2B_wllama_gguf


Fishfishfishfishfish committed on Sep 15, 2024

Commit f3e69d1 (verified) · 1 Parent(s): 4b076dc

Update README.md

Files changed (1)

README.md +3 -1


@@ -4,4 +4,6 @@ language:
 - en
 base_model: google/gemma-2-2b-it
 ---
-Gemma 2 2B quantized for wllama (under 2gb).
+Gemma 2 2B quantized for wllama (under 2gb).
+q4_0_4_8 is WAY faster when using llama.cpp, with wllama, it's about the same as q4_k.