T-lite-it-1.0_Q4_0
T-lite-it-1.0_Q4_0 is a quantized version of the T-lite-it-1.0 model, originally based on the Qwen 2.5 7B architecture and fine-tuned for Russian-language tasks. This version is optimized for memory-constrained environments, making it suitable for fine-tuning and inference on GPUs with as little as 8GB VRAM. The quantization was performed using BitsAndBytes, reducing the model to 4-bit precision.
Model Description
- Language: Russian
- Base Model: T-Lite-IT-1.0 (derived from Qwen 2.5 7B)
- Quantization: 4-bit precision using
BitsAndBytes
- Tasks: Text generation, conversation, question answering, and chain-of-thought reasoning
- Fine-Tuning Ready: Ideal for further fine-tuning in low-resource environments.
- VRAM Requirements: Fine-tuning and inference possible with 8GB VRAM or more.
Usage
To load the model, ensure you have the required dependencies installed:
pip install transformers bitsandbytes
Then, load the model with the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "MilyaShams/T-lite-it-1.0_Q4_0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
load_in_4bit=True,
device_map="auto"
)
Fine-Tuning
The model is designed for fine-tuning with resource constraints. Use tools like Hugging Face's Trainer
or peft
(Parameter-Efficient Fine-Tuning) to adapt the model to specific tasks.
Example configuration for fine-tuning:
- Batch Size: Adjust to fit within 8GB VRAM (e.g., batch_size=2).
- Gradient Accumulation: Use to simulate larger batch sizes.
Model Card Authors
- Downloads last month
- 323
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for MilyaShams/T-lite-it-1.0_Q4_0
Base model
t-tech/T-lite-it-1.0