How to use it?

#1
by austinit - opened

It doesn't seem to be compatible with the official examples given.
How do I use it? Should I use the transformers library?

It's for use with exllamav2, but I haven't gotten around to making it work yet. You might be able to load the model with transformers if you edit the inference script, but I'm not sure whether that works with exl2 or only with gptq/awq.
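For anyone trying the exllamav2 route, loading an exl2 quant usually looks roughly like the sketch below. The model directory, context length, and prompt are placeholders (not taken from this repo), and the raw output would still be the model's intermediate tokens rather than audio.

```python
# Rough sketch of loading an exl2 quant with exllamav2's dynamic generator.
# The model directory, max_seq_len, and prompt are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/path/to/this-exl2-quant"      # placeholder path
config = ExLlamaV2Config(model_dir)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)
model.load_autosplit(cache, progress=True)  # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(
    prompt="...",          # whatever the official inference script would feed in
    max_new_tokens=512,
    add_bos=True,
)
print(output)              # still token-level output, not audio
```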

I used the GGUF quantized version, but it only generates some intermediate outputs, and I don't know how to convert them.

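For reference, a GGUF file like this is normally run with llama-cpp-python along the lines below. The file name, context size, and prompt are placeholders, and the result is still the intermediate token text, which is exactly the conversion problem described above.

```python
# Minimal sketch of running a GGUF quant with llama-cpp-python.
# The file name, context size, and prompt are placeholders; the output
# still needs the model's own post-processing to become audio.
from llama_cpp import Llama

llm = Llama(
    model_path="model.q8_0.gguf",  # placeholder file name
    n_gpu_layers=-1,               # offload every layer to the GPU
    n_ctx=8192,                    # placeholder context length
)

out = llm(
    "...",                         # whatever prompt the inference script builds
    max_tokens=512,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```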

Then I edited the inference script and managed to generate the first stage of the music using the 8-bit quantized version of the model (GGUF).
But it seems very slow. I'm using an A800 GPU, and less than 20 GB of GPU memory was used.

Less than 50% of the GPU's compute was being used, and generating 2 minutes of music took 10 minutes for the first stage alone.
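Numbers like these can be sanity-checked while generation runs with a small polling loop around nvidia-smi; nothing here is model-specific.

```python
# Poll GPU utilization and memory every few seconds while generation runs
# in another process; relies only on nvidia-smi from the NVIDIA driver.
import subprocess
import time

while True:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(out.stdout.strip())
    time.sleep(5)
```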
