slow
#4 by ehartford - opened
I am trying to use this on an M3 Max.
It's so slow that it's unusable (0.51 tokens/second).
Is it possible to make it any faster?
I can't compare it with llama.cpp because it doesn't work there yet.
Yes, it's possible to make it faster. You can read more here: