Happy New Year, Hugging Face community! In 2025, I'll continue my quantization (and some fine-tuning) efforts to support open-source AI and make knowledge free for everyone.
The deepseek-ai/DeepSeek-V3-Base model was featured today on CNBC tech news. The whale made a splash by using FP8 and shrinking the cost of training significantly!
I've built a small utility to split safetensors files. The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16. The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.
BTW: I'll upload the BF16 version here: DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16 (it will take a while - days with my upload speed). If anyone has access to the resources to test it, I'd appreciate feedback on whether it works.
The tool is available here: https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py It splits every file into n pieces along layer boundaries where possible and creates a new "model.safetensors.index.json" file. I've tested it with Llama 3.1 8B at multiple split sizes and validated the output with an inference pipeline. Use --help for usage. Please note the current version expects a model that is already sharded into multiple files and has a "model.safetensors.index.json" layer-to-safetensor mapping file.
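For anyone curious about the core idea, here is a minimal sketch (not the actual tool's code) of splitting one shard into n smaller safetensors files and collecting the entries for a rebuilt index's "weight_map"; the function name and output naming scheme are hypothetical:

```python
import json
from safetensors.torch import load_file, save_file

def split_shard(shard_path: str, n: int, out_prefix: str) -> dict:
    """Split one safetensors shard into up to n pieces along tensor boundaries."""
    tensors = load_file(shard_path)           # {tensor_name: torch.Tensor}
    names = list(tensors.keys())
    chunk = -(-len(names) // n)                # ceil division
    weight_map = {}
    for i in range(n):
        part = {k: tensors[k] for k in names[i * chunk:(i + 1) * chunk]}
        if not part:
            continue
        out_file = f"{out_prefix}-{i:05d}.safetensors"
        save_file(part, out_file)
        weight_map.update({k: out_file for k in part})
    # Merge this into the "weight_map" of the new model.safetensors.index.json
    return weight_map
```

Splitting at tensor (layer) boundaries is what keeps peak memory low: each smaller shard can then be loaded, converted, and saved independently.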
I have this small utility: no_more_typo. It runs in the background and can call an LLM to update the text on the clipboard. I think it's ideal for fixing typos and syntax. I've just added the option to use custom prompt templates to perform different tasks.
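The gist, as a rough Python sketch (the real no_more_typo may be implemented differently; pyperclip, the OpenAI client, the model name, and the prompt template below are all illustrative assumptions):

```python
import time
import pyperclip
from openai import OpenAI

# Custom prompt templates make the same loop usable for other tasks too.
PROMPT = "Fix typos and grammar in the following text. Reply with the corrected text only:\n\n{}"

client = OpenAI()
last = ""

while True:
    text = pyperclip.paste()
    if text and text != last:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": PROMPT.format(text)}],
        )
        fixed = resp.choices[0].message.content
        pyperclip.copy(fixed)
        last = fixed              # remember our own output to avoid re-processing it
    time.sleep(1)                 # poll the clipboard once a second
```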
Repurposed my older AI workstation into a homelab server; it has received 2x V100 + 1x P40. I can reach a huge 210k-token context size with MegaBeam-Mistral-7B-512k-GGUF at ~70+ tok/s, or run Llama-3.1-Nemotron-70B-Instruct-HF-GGUF with 50k context at ~10 tok/s (V100s only: 40k ctx and 15 tok/s). I'm also able to LoRA fine-tune with performance similar to an RTX 3090. It moved to the garage, so no complaints about the noise from the family. Will move to a rack soon :D
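If you want to try something similar, a long-context GGUF can be loaded like this with llama-cpp-python (illustrative sketch; the model filename is hypothetical, and the context size you can actually fit depends on quantization and VRAM):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="MegaBeam-Mistral-7B-512k.Q8_0.gguf",  # placeholder filename
    n_ctx=210000,       # the ~210k-token context mentioned above
    n_gpu_layers=-1,    # offload all layers to the GPUs
)
print(llm("Summarize: ...", max_tokens=64)["choices"][0]["text"])
```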