Upload extended quantizations
In theory these are the same as the f16 model weights and shouldn't add any value to generations.
That said, I've seen that some responses aren't identical, especially since these were quantized with the importance matrix.
bf16 is pretty slow if you don't have a professional GPU; GA102 cards are compatible, but slow :(
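As a rough illustration of why f16 and bf16 copies of the same f32 weights can drift apart, here is a minimal sketch using PyTorch (the framework choice is an assumption, not part of this repo): the two formats keep different mantissa widths, so they round the same f32 value differently, logits shift slightly, and sampling can pick different tokens.

```python
import torch

# The same f32 weight rounds differently under f16 (10 stored mantissa
# bits) and bf16 (7 stored mantissa bits, f32-range exponent), so the
# two checkpoints are not bit-identical even before any quantization.
w = torch.tensor([0.1234567], dtype=torch.float32)
print(w.half().float().item())      # f16 round-trip of the weight
print(w.bfloat16().float().item())  # bf16 round-trip of the weight
print((w.half().float() - w.bfloat16().float()).item())  # nonzero gap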
- .gitattributes +2 -0
- phi-3-mini-4k-instruct-bf16.gguf +3 -0
- phi-3-mini-4k-instruct-f32.gguf +3 -0
.gitattributes
CHANGED
@@ -49,3 +49,5 @@ phi-3-mini-4k-instruct-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k-instruct-iq4_nl.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k-instruct-iq4_xs.gguf filter=lfs diff=lfs merge=lfs -text
 phi-3-mini-4k.imatrix filter=lfs diff=lfs merge=lfs -text
+phi-3-mini-4k-instruct-bf16.gguf filter=lfs diff=lfs merge=lfs -text
+phi-3-mini-4k-instruct-f32.gguf filter=lfs diff=lfs merge=lfs -text
phi-3-mini-4k-instruct-bf16.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:95b114be3ba31ee40f6e034771b2f6e9818cb5b7465fc6592ee4e6e6dec8e066
+size 7643296992
phi-3-mini-4k-instruct-f32.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f1874584b449c0adf023b2b7bb02bd825baa5f2b7e4fbdc08c0f56a199fa761
+size 15088055520
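The two files added above are committed as Git LFS pointers, which record only an `oid sha256:` and a `size`. If you want to check that a fully downloaded .gguf matches its pointer, a small sketch like the following (a hypothetical helper, not part of this repo) recomputes both fields with Python's hashlib; run it against the resolved file, not the pointer itself.

```python
import hashlib
import os

def lfs_fields(path, chunk=1 << 20):
    """Recompute the oid (sha256) and size that a Git LFS pointer records."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest(), os.path.getsize(path)

# Compare against the pointer contents committed above.
oid, size = lfs_fields("phi-3-mini-4k-instruct-bf16.gguf")
assert oid == "95b114be3ba31ee40f6e034771b2f6e9818cb5b7465fc6592ee4e6e6dec8e066"
assert size == 7643296992
```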