Quantizations.

#2
by kuliev-vitaly - opened

The model is too large to run even on two A100s. Quantization should reduce the hardware requirements. Could you please provide AWQ (4-bit) or FP8 quantizations?
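For a rough sense of how much quantization could save, here is a minimal sketch that estimates the memory needed for the weights alone at different precisions. The parameter count is a hypothetical placeholder, not this model's actual size, and the estimate ignores the KV cache, activations, and the per-group scale overhead that AWQ adds.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 1024**3


# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 100e9

for name, bits in [("fp16/bf16", 16), ("fp8", 8), ("awq 4-bit", 4)]:
    gib = weight_memory_gib(NUM_PARAMS, bits)
    print(f"{name:>10}: {gib:7.1f} GiB of weights")
```

Under these assumptions, FP8 halves the weight footprint relative to 16-bit weights and 4-bit quarters it, which is why either format would make the model much easier to fit on a pair of GPUs.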
