Upload extended quantizations
In theory these are the same as the f16 model weights and shouldn't add any value to generations.
That said, I've seen that some responses aren't identical, especially since these were quantized with the importance matrix.
bf16 is pretty slow if you don't have a professional GPU; GA102 cards are compatible, but slow :(
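As a rough illustration of why f16 and bf16 copies of the same f32 weights can drift apart, here is a minimal sketch using PyTorch (the framework choice is an assumption, not part of this repo): the two formats keep different mantissa widths, so they round the same f32 value differently, logits shift slightly, and sampling can pick different tokens.

```python
import torch

# The same f32 weight rounds differently under f16 (10 stored mantissa
# bits) and bf16 (7 stored mantissa bits, f32-range exponent), so the
# two checkpoints are not bit-identical even before any quantization.
w = torch.tensor([0.1234567], dtype=torch.float32)
print(w.half().float().item())      # f16 round-trip of the weight
print(w.bfloat16().float().item())  # bf16 round-trip of the weight
print((w.half().float() - w.bfloat16().float()).item())  # nonzero gap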
- .gitattributes +2 -0
- phi-3-mini-4k-instruct-bf16.gguf +3 -0
- phi-3-mini-4k-instruct-f32.gguf +3 -0
.gitattributes
CHANGED
@@ -49,3 +49,5 @@ phi-3-mini-4k-instruct-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k-instruct-iq4_nl.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k-instruct-iq4_xs.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k.imatrix filter=lfs diff=lfs merge=lfs -text
+phi-3-mini-4k-instruct-bf16.gguf filter=lfs diff=lfs merge=lfs -text
+phi-3-mini-4k-instruct-f32.gguf filter=lfs diff=lfs merge=lfs -text
phi-3-mini-4k-instruct-bf16.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:95b114be3ba31ee40f6e034771b2f6e9818cb5b7465fc6592ee4e6e6dec8e066
+size 7643296992
phi-3-mini-4k-instruct-f32.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f1874584b449c0adf023b2b7bb02bd825baa5f2b7e4fbdc08c0f56a199fa761
+size 15088055520
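The two files added above are committed as Git LFS pointers, which record only an `oid sha256:` and a `size`. If you want to check that a fully downloaded .gguf matches its pointer, a small sketch like the following (a hypothetical helper, not part of this repo) recomputes both fields with Python's hashlib; run it against the resolved file, not the pointer itself.

```python
import hashlib
import os

def lfs_fields(path, chunk=1 << 20):
    """Recompute the oid (sha256) and size that a Git LFS pointer records."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest(), os.path.getsize(path)

# Compare against the pointer contents committed above.
oid, size = lfs_fields("phi-3-mini-4k-instruct-bf16.gguf")
assert oid == "95b114be3ba31ee40f6e034771b2f6e9818cb5b7465fc6592ee4e6e6dec8e066"
assert size == 7643296992
```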