Model Card for suruti94/llama-3.21B-tamil-base-0.2
The Llama-3.2-1B model, further pretrained on Tamil text from uonlp/CulturaX.
Model Details
Model Description
This Tamil LLaMA model builds on the original Llama-3.2-1B, extending it with a Tamil vocabulary of 20,000 tokens before continued pretraining on Tamil text. The approach is very similar to abhinand/tamil-llama-7b-base-v0.1.
- Developed by: Mohan Parthasarathy
- Funded by [optional]: Self
- Shared by [optional]: Self
- Model type: Pretrained model
- Language(s) (NLP): Tamil
- License: Apache 2.0
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
tokenizer = AutoTokenizer.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
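Once loaded, the model can be used for standard causal-LM generation. The snippet below is a minimal sketch; the Tamil prompt and generation settings are illustrative examples, not prescribed by this card.

prompt = "தமிழ்நாடு"  # hypothetical sample prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))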
Training Details
Training follows the steps described in the Tamil-LLaMA paper: https://arxiv.org/pdf/2311.05845.
- A new tokenizer was built with SentencePiece by sampling 1 million of the roughly 4.7 million Tamil documents in uonlp/CulturaX, and 20,000 Tamil tokens were added to the base Llama-3.2 vocabulary (see the sketch after this list).
- The model was then pretrained on Tamil data from uonlp/CulturaX in bfloat16; the base model was loaded in 8-bit and trained with LoRA adapters.
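The tokenizer-extension step referenced above can be sketched as follows. This is a minimal, illustrative example of the SentencePiece-then-merge recipe from the Tamil-LLaMA paper; the corpus file name, trainer arguments, and output paths are assumptions rather than the exact commands used for this model.

import sentencepiece as spm
from transformers import AutoTokenizer

# Train a Tamil SentencePiece model on the sampled corpus (file name is hypothetical).
spm.SentencePieceTrainer.train(input="tamil_sample_1M.txt", model_prefix="tamil_sp", vocab_size=20000, model_type="bpe")

# Merge the new Tamil pieces into the base Llama-3.2 tokenizer vocabulary.
sp = spm.SentencePieceProcessor(model_file="tamil_sp.model")
base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
new_pieces = [sp.id_to_piece(i) for i in range(sp.get_piece_size())]
base_tok.add_tokens([p for p in new_pieces if p not in base_tok.get_vocab()])
base_tok.save_pretrained("llama-3.2-1B-tamil-tokenizer")

After extending the tokenizer, the base model's embedding matrix has to be resized to the new vocabulary size (model.resize_token_embeddings(len(base_tok))) before continued pretraining.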
Training Data
https://huggingface.co/datasets/uonlp/CulturaX/tree/main/ta
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: bfloat16 mixed precision; base model loaded in 8-bit with LoRA adapters
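A rough sketch of the 8-bit + LoRA setup with transformers and peft is shown below. The LoRA hyperparameters (r, alpha, target modules) are placeholder assumptions, since the exact values are not documented in this card.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 8-bit, keeping compute in bfloat16.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", quantization_config=BitsAndBytesConfig(load_in_8bit=True), torch_dtype=torch.bfloat16)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; r, alpha, and target modules are illustrative values.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)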
Speeds, Sizes, Times [optional]
- epoch = 0.9995
- total_flos = 516494476 GF
- train_loss = 2.7275
- train_runtime = 3:37:51.19
- train_samples = 70222
- train_samples_per_second = 5.372
- train_steps_per_second = 0.084
Evaluation
- epoch = 0.9995
- eval_accuracy = 0.5318
- eval_loss = 2.2674
- eval_runtime = 0:05:49.70
- eval_samples = 7803
- eval_samples_per_second = 22.313
- eval_steps_per_second = 2.791
- perplexity = 9.6547
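As a sanity check, the reported perplexity is simply the exponential of the evaluation loss (mean cross-entropy per token):

import math
print(math.exp(2.2674))  # ≈ 9.65, matching the reported perplexity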
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 1x A100 SXM4
- Hours used: 3:38
- Cloud Provider: vast.ai
- Compute Region: US
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
Model tree for suruti94/llama-3.21B-tamil-base-0.2
- Base model: meta-llama/Llama-3.2-1B