dewabrata/cerita_seru_70B
Model Description
This is the full-precision release of a LLaMA 70B model fine-tuned for generating creative stories. The weights are kept in FP16 (no quantization), prioritizing output quality over memory footprint.
Key Features
- Base Model: LLaMA 70B
- Precision: FP16 (unquantized)
- Task: Creative text generation
- Performance: Prioritizes output quality; inference requires substantial GPU memory.
Usage
You can use this model for text generation tasks with the Hugging Face Transformers library or with vLLM for efficient inference.
Example Code with Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "dewabrata/cerita_seru_70B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Indonesian prompt: "Tell a story about Rina, a woman in hijab who lives her
# life with enthusiasm and has an extraordinary talent for painting."
prompt = "Ceritakan tentang Rina, seorang wanita berhijab yang bersemangat menjalani hidupnya dan memiliki bakat luar biasa dalam seni lukis."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Example Code with vLLM
from vllm import LLM, SamplingParams

# Load model with vLLM
model_name = "dewabrata/cerita_seru_70B"
llm = LLM(model=model_name)

# Same Indonesian prompt as in the Transformers example above
prompt = "Ceritakan tentang Rina, seorang wanita berhijab yang bersemangat menjalani hidupnya dan memiliki bakat luar biasa dalam seni lukis."
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.9,
    max_tokens=500,
)
outputs = llm.generate([prompt], sampling_params)
# Each RequestOutput holds one or more completions; take the first.
print(outputs[0].outputs[0].text)
Performance
Relative to quantized variants, the full-precision model delivers the best text quality, but it requires significantly more computational resources for inference.
Resource Requirements
- Memory: ~140GB of VRAM for the FP16 weights alone (70B parameters × 2 bytes; see the estimate below), plus headroom for activations and the KV cache.
- Inference Speed: Slower than quantized variants, which move fewer bytes per token during memory-bound decoding.
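For a rough sense of where the weight-memory figure comes from, here is a back-of-the-envelope estimate. This is illustrative arithmetic only; real deployments need extra headroom for activations and the KV cache.

# Back-of-the-envelope VRAM estimate for the FP16 weights (illustrative only)
params = 70e9          # approximate parameter count of LLaMA 70B
bytes_per_param = 2    # FP16 stores each parameter in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")  # ~140 GB before runtime overhead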
Limitations
- Hardware Requirements: The FP16 weights exceed a single 80GB GPU, so plan on a multi-GPU setup (e.g., 2× NVIDIA A100 80GB) or CPU offloading.
- Latency: Higher latency than quantized models, since full-precision weights mean more data moved per generated token. If memory is the constraint, loading with on-the-fly quantization is one workaround; a hedged sketch follows this list.
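The repository ships FP16 weights, but Transformers can quantize them at load time. The following is a minimal sketch, assuming the optional bitsandbytes package is installed; 4-bit loading trades some output quality for a much smaller memory footprint.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize the FP16 checkpoint to 4-bit on the fly (assumes bitsandbytes is installed)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in FP16
)
model = AutoModelForCausalLM.from_pretrained(
    "dewabrata/cerita_seru_70B",
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("dewabrata/cerita_seru_70B")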
Training Details
- Base Model: LLaMA 70B
- Fine-tuning Dataset: Custom dataset for storytelling tasks.
- Precision: FP16, preserving the full quality of the fine-tuned weights.
How to Deploy
You can deploy this model on Hugging Face Spaces or run it locally for inference. For reasonable throughput, use data-center GPUs such as the NVIDIA A100; because a single 80GB card cannot hold the FP16 weights, plan for a multi-GPU launch, as sketched below.
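A minimal multi-GPU sketch with vLLM, assuming two A100 80GB GPUs are visible; tensor_parallel_size must match the number of GPUs actually available.

from vllm import LLM, SamplingParams

# Shard the FP16 weights across two GPUs with tensor parallelism
llm = LLM(model="dewabrata/cerita_seru_70B", tensor_parallel_size=2)

sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=500)
# Short Indonesian prompt: "Tell a story about Rina."
outputs = llm.generate(["Ceritakan tentang Rina."], sampling_params)
print(outputs[0].outputs[0].text)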
Citation
If you use this model, please cite:
@misc{dewabrata2024,
  author       = {Dewabrata},
  title        = {Cerita Panas Generator - LLaMA 70B},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/dewabrata/cerita_seru_70B}},
}
License
The model inherits the license of the base LLaMA 70B model. Please ensure compliance with its terms before using this model.