DevQuasar


AI & ML interests

Open-Source LLMs, Local AI Projects: https://pypi.org/project/llm-predictive-router/

Recent Activity


csabakecskemeti posted an update 8 days ago
I've built a small utility to split safetensors files.
The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16.
The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.

BTW: I'll upload the bf16 version here:
DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16
(it will take a while - days with my upload speed)
If anyone has access to the resources to test it, I'd appreciate feedback on whether it works.

The tool is available here:
https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py
It splits every file into n pieces along layer boundaries where possible, and creates a new "model.safetensors.index.json" file.
I've tested it with Llama 3.1 8B at multiple split sizes, and validated the output with an inference pipeline.
Use --help for usage.
Please note: the current version expects the model to already be sharded across multiple files, with a "model.safetensors.index.json" layer-to-safetensors mapping file.
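
For intuition, here's a minimal Python sketch of the splitting idea (my own illustration, not the actual safetensor_splitter.py; the function name, chunking policy, and file-naming scheme are all assumptions): read each shard, regroup its tensors into n smaller pieces on tensor (i.e., layer) boundaries, and rebuild the weight_map for a new "model.safetensors.index.json".

```python
import json
from pathlib import Path

from safetensors import safe_open
from safetensors.torch import save_file

def split_shard(shard_path: str, n: int, out_dir: str) -> dict:
    """Split one .safetensors shard into n pieces along tensor boundaries.
    Returns a partial weight_map {tensor_name: new_file_name}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    weight_map = {}
    with safe_open(shard_path, framework="pt") as f:
        names = list(f.keys())
        per_piece = max(1, -(-len(names) // n))  # ceil division: tensors per piece
        for i in range(0, len(names), per_piece):
            piece = {name: f.get_tensor(name) for name in names[i:i + per_piece]}
            fname = f"{Path(shard_path).stem}-part{i // per_piece:05d}.safetensors"
            save_file(piece, str(out / fname))
            weight_map.update({name: fname for name in piece})
    return weight_map

# Merge the per-shard maps and write the new index so loaders can
# locate every tensor in its new, smaller file:
merged = {}
for shard in sorted(Path(".").glob("model-*.safetensors")):  # hypothetical layout
    merged.update(split_shard(str(shard), n=2, out_dir="split"))
index = {"metadata": {}, "weight_map": merged}
(Path("split") / "model.safetensors.index.json").write_text(json.dumps(index, indent=2))
```

Rewriting the index is the key step: loaders only consult the weight_map to find each tensor, so the shard layout underneath can change freely.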
csabakecskemeti posted an update about 1 month ago
I have this small utility: no_more_typo
It runs in the background and can call an LLM to update the text on the clipboard. I think it's ideal for fixing typos and syntax.
I've just added the option to use custom prompt templates to perform different tasks.

Details, code and executable:
https://github.com/csabakecskemeti/no_more_typo

https://devquasar.com/no-more-typo/
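
As an illustration of the flow, a hypothetical Python sketch (not the actual no_more_typo implementation; pyperclip, the OpenAI client, the template string, and the model name are all assumptions on my part):

```python
import pyperclip            # clipboard read/write
from openai import OpenAI   # any chat-completion-style client would do

# A customizable prompt template; {text} is replaced with the clipboard contents.
DEFAULT_TEMPLATE = (
    "Fix the typos and grammar in the following text. "
    "Return only the corrected text:\n\n{text}"
)

def fix_clipboard(template: str = DEFAULT_TEMPLATE, model: str = "gpt-4o-mini") -> str:
    """Read the clipboard, ask the LLM to rewrite it, and paste the result back."""
    client = OpenAI()
    text = pyperclip.paste()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": template.format(text=text)}],
    )
    fixed = resp.choices[0].message.content
    pyperclip.copy(fixed)  # replace clipboard contents with the corrected text
    return fixed
```

A background hotkey or polling loop would call fix_clipboard() whenever you want the clipboard rewritten; swapping in a different template changes the task.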
csabakecskemeti posted an update about 1 month ago
Repurposed my older AI workstation into a homelab server; it has received 2x V100 + 1x P40.
I can reach a huge 210k-token context size with MegaBeam-Mistral-7B-512k-GGUF at ~70+ tok/s, or run Llama-3.1-Nemotron-70B-Instruct-HF-GGUF with 50k context at ~10 tok/s (on the V100s alone: 40k ctx and 15 tok/s).
It can also LoRA-finetune with performance similar to an RTX 3090.
It moved to the garage, so no complaints from the family about the noise. Will move to a rack soon :D
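
For reference, a minimal sketch of loading a long-context GGUF model, assuming the llama-cpp-python bindings (the post doesn't say which runtime is used; the file path and prompt are placeholders):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="MegaBeam-Mistral-7B-512k.Q8_0.gguf",  # hypothetical filename
    n_ctx=210_000,    # the ~210k-token context from the post
    n_gpu_layers=-1,  # offload all layers; llama.cpp splits them across the GPUs
)
out = llm("Summarize the following document: ...", max_tokens=256)
print(out["choices"][0]["text"])
```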