
Model Card for suruti94/llama-3.21B-tamil-base-0.2

Continued pretraining of the Llama 3.2 1B model on Tamil text from uonlp/CulturaX.

Model Details

Model Description

The Tamil LLaMA models are tailored to Tamil by extending the original Llama 3.2 vocabulary with 20,000 Tamil tokens and continuing pretraining on Tamil text. The approach is very similar to abhinand/tamil-llama-7b-base-v0.1.

  • Developed by: Mohan Parthasarathy
  • Funded by [optional]: Self
  • Shared by [optional]: Self
  • Model type: Pretrained model
  • Language(s) (NLP): Tamil
  • License: Apache 2.0

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the LoRA-adapted model and its tokenizer from the Hugging Face Hub
model = AutoPeftModelForCausalLM.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
tokenizer = AutoTokenizer.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
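
A quick generation check (a minimal sketch; the Tamil prompt and generation settings below are illustrative and not part of the original card):

import torch

prompt = "தமிழ்நாடு"  # any Tamil prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))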

Training Details

Training follows the steps described in the Tamil-LLaMA paper: https://arxiv.org/pdf/2311.05845.

  1. A new tokenizer was built with SentencePiece by sampling 1 million of the 4.7 million Tamil documents in uonlp/CulturaX, and the resulting Tamil tokens were used to extend the base vocabulary (see the sketch after this list).
  2. The model was then trained on the Tamil data from uonlp/CulturaX in bfloat16, with the original model loaded in 8-bit and adapted using LoRA.
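
A minimal sketch of the tokenizer-building step, assuming the sampled Tamil documents have been written one per line to a plain-text file (the file name and training settings here are illustrative):

import sentencepiece as spm

# Train a Tamil SentencePiece model on the sampled CulturaX documents
# ("tamil_sample.txt" is an assumed file name, one document per line).
spm.SentencePieceTrainer.train(
    input="tamil_sample.txt",
    model_prefix="tamil_sp",
    vocab_size=20000,        # the 20,000 Tamil tokens described above
    character_coverage=1.0,  # keep full coverage of Tamil characters
)

# The tokens in the resulting tamil_sp.model would then be merged into the
# base Llama 3.2 tokenizer before continued pretraining.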

Training Data

https://huggingface.co/datasets/uonlp/CulturaX/tree/main/ta
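
The Tamil subset can be pulled with the datasets library; a minimal sketch, assuming the subset is addressable by the "ta" config name (streaming is used here only to avoid a full download):

from datasets import load_dataset

# Stream the Tamil ("ta") subset of CulturaX from the Hugging Face Hub.
# Access may require accepting the dataset's terms and an auth token.
ds = load_dataset("uonlp/CulturaX", "ta", split="train", streaming=True)
print(next(iter(ds))["text"][:200])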

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: bf16 mixed precision; base model loaded in 8-bit and trained with LoRA adapters (see the sketch below)
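
A minimal sketch of how this regime could be set up with transformers, bitsandbytes, and peft; the base checkpoint name and the LoRA rank, alpha, and target modules below are assumptions for illustration, not values taken from this run:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 8-bit, keeping compute in bfloat16
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # assumed base checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.bfloat16,
)

# Attach LoRA adapters (hyperparameters here are illustrative)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()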

Speeds, Sizes, Times [optional]

  • epoch = 0.9995
  • total_flos = 516494476 GF
  • train_loss = 2.7275
  • train_runtime = 3:37:51.19
  • train_samples = 70222
  • train_samples_per_second = 5.372
  • train_steps_per_second = 0.084

Evaluation

  • epoch = 0.9995
  • eval_accuracy = 0.5318
  • eval_loss = 2.2674
  • eval_runtime = 0:05:49.70
  • eval_samples = 7803
  • eval_samples_per_second = 22.313
  • eval_steps_per_second = 2.791
  • perplexity = 9.6547
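
As a sanity check (a standard relation, not stated in the original card), perplexity is the exponential of the evaluation loss: exp(2.2674) ≈ 9.65, which matches the reported value.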

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 1x A100 SXM4
  • Hours used: ~3.6 (3 hours 38 minutes)
  • Cloud Provider: vast.ai
  • Compute Region: US
  • Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[email protected]
