Language: Greek (el)
el-llama-smol
Model:
el-llama-smol aims to be the first in a series of LLMs trained mostly on Greek corpora. The model is a small (roughly 1B-parameter) version of LLaMA with the following configuration:
```json
{
  "architectures": ["LLaMAForCausalLM"],
  "bos_token_id": 0,
  "eos_token_id": 1,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "intermediate_size": 5461,
  "initializer_range": 0.02,
  "max_sequence_length": 1024,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "pad_token_id": -1,
  "rms_norm_eps": 1e-06,
  "transformers_version": "4.28.1",
  "use_cache": true,
  "vocab_size": 22000
}
```
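To see where the "roughly 1B" figure comes from, the parameter count can be estimated directly from the configuration. The sketch below assumes a standard LLaMA decoder layer (bias-free q/k/v/o attention projections, a SwiGLU MLP with gate/up/down projections, two RMSNorm weight vectors) and untied input/output embeddings, which is the LLaMA default; the card does not state whether the embeddings are tied. Note that intermediate_size = 5461 is about (8/3) x 2048, the usual LLaMA MLP sizing.

```python
# Rough parameter count for a LLaMA-style decoder built from the config above.
# Assumptions: no bias terms, standard multi-head attention (no grouped-query
# attention), and untied input/output embeddings unless tie_embeddings=True.

config = {
    "hidden_size": 2048,
    "intermediate_size": 5461,
    "num_hidden_layers": 24,
    "vocab_size": 22000,
}


def llama_param_count(cfg: dict, tie_embeddings: bool = False) -> int:
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    n_layers = cfg["num_hidden_layers"]
    v = cfg["vocab_size"]

    attention = 4 * h * h  # q, k, v and o projections
    mlp = 3 * h * i        # gate, up and down projections (SwiGLU)
    norms = 2 * h          # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = v * h                         # token embedding table
    lm_head = 0 if tie_embeddings else v * h   # output projection to the vocabulary
    final_norm = h                             # RMSNorm after the last layer

    return n_layers * per_layer + embeddings + lm_head + final_norm


total = llama_param_count(config)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the model has about 1.30B parameters (about 1.25B if the embeddings were tied), of which the 24 decoder layers account for roughly 1.21B.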
Training details:
The current snapshot was trained for 40 hours on an RTX A6000 GPU (48 GB), using the galore_adamw8bit_per_layer optimizer by Zhao et al. [1] and a context size of 1024 tokens.
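As we understand GaLore [1], each weight gradient is projected onto a rank-r subspace (refreshed periodically from an SVD of the gradient), so Adam's two moment estimates are stored in that small subspace instead of at full size. The back-of-the-envelope sketch below compares optimizer-state sizes, counted in elements, for the attention and MLP matrices of this model. The rank r = 512 and the choice of which matrices are projected are assumptions, not settings taken from this card. The sketch also ignores the 8-bit quantization of the states and the per-layer update scheme (applying the optimizer step layer by layer during the backward pass so full gradients need not be kept), both of which further reduce memory.

```python
# Optimizer-state elements for el-llama-smol's linear layers: plain Adam vs. a
# GaLore-style low-rank projection. Assumptions (not from the model card):
# rank r = 512, GaLore applied to the q/k/v/o and gate/up/down projections only,
# and one projection matrix stored per weight.

HIDDEN, INTERMEDIATE, LAYERS = 2048, 5461, 24

# Shapes of the projected weight matrices in one decoder layer.
LAYER_MATRICES = [(HIDDEN, HIDDEN)] * 4 + [(INTERMEDIATE, HIDDEN)] * 3


def adam_state_elements(m: int, n: int) -> int:
    """Adam keeps two moments (first and second) per weight entry."""
    return 2 * m * n


def galore_state_elements(m: int, n: int, rank: int) -> int:
    """Two low-rank moments of shape r x max(m, n), plus a min(m, n) x r projection."""
    small, large = min(m, n), max(m, n)
    r = min(rank, small)
    return 2 * r * large + small * r


full = LAYERS * sum(adam_state_elements(m, n) for m, n in LAYER_MATRICES)
low_rank = LAYERS * sum(galore_state_elements(m, n, rank=512) for m, n in LAYER_MATRICES)
print(f"Adam: {full:,}  GaLore (r=512): {low_rank:,}  ratio: {low_rank / full:.1%}")
```

With these assumptions the projected layers need roughly a third of the optimizer-state memory of plain Adam. This is one of the reasons a model of this size can be trained on a single 48 GB GPU.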
Dataset:
The model is trained on the Greek subset of the allenai/c4 dataset. Text is tokenized with a (heavily unoptimized) SentencePiece tokenizer with a vocabulary of 22,000 tokens.
Examples
Use a 🤗 pipeline
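```python
from transformers import pipeline, set_seed

pipe = pipeline("text-generation", model="Konstantinos/el_llama_smol")
set_seed(1)  # make the sampled output reproducible

# English: "Japan has a history that begins thousands of years ago. Scientists believe
# that the Japanese, as a unified whole, descend from many groups, which migrated to
# the islands from other parts of Asia, which include "
prompt = """Η Ιαπωνία έχει μια ιστορία που ξεκινά πριν από χιλιάδες χρόνια.
Οι επιστήμονες πιστεύουν πως οι Ιάπωνες ως ενιαίο σύνολο προέρχονται από πολλές ομάδες,
οι οποίες μετανάστευσαν στα νησιά από άλλα σημεία της Ασίας, στα οποία περιλαμβάνονται """
ret = pipe(prompt, do_sample=True, top_k=20, temperature=0.85, max_new_tokens=110)
print(ret[0]["generated_text"])
```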
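The pipeline call samples with top_k=20 and temperature=0.85. The following sketch shows, in plain Python, what these two settings do at a single decoding step. It is an illustration of temperature-scaled top-k sampling, not the transformers implementation (which applies these steps as logits processors on tensors), and the function name sample_top_k and the toy logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k=20, temperature=0.85, rng=random):
    """Hypothetical helper: temperature + top-k sampling over one step's logits."""
    # Divide by the temperature; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the top_k largest logits.
    k = min(top_k, len(scaled))
    top = sorted(range(len(scaled)), key=lambda idx: scaled[idx], reverse=True)[:k]
    # Softmax over the kept logits (subtracting the max for numerical stability).
    max_logit = max(scaled[idx] for idx in top)
    weights = [math.exp(scaled[idx] - max_logit) for idx in top]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one token id according to the renormalized probabilities.
    return rng.choices(top, weights=probs, k=1)[0]


toy_logits = [0.1, 2.0, -1.0, 1.5, 0.3]  # hypothetical scores for a 5-token vocabulary
print(sample_top_k(toy_logits, top_k=2, rng=random.Random(1)))
```

With top_k=2 only token ids 1 and 3 can ever be drawn. Lowering the temperature shifts more probability toward the highest-scoring token, and top_k=1 reduces to greedy decoding.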
Load model directly
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the SentencePiece tokenizer and the model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Konstantinos/el_llama_smol")
model = AutoModelForCausalLM.from_pretrained("Konstantinos/el_llama_smol")
```
References
[1] Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, Anima Anandkumar, and Yuandong Tian (2024). GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Citation
TBD
License: odc-by