MounikaAithagoni committed · Commit 5ad2472 · verified · 1 parent: 23619ce

Update README.md


Arabic and English Translator with Next Token Prediction
This project implements a neural network-based language model for next token prediction in both English and Arabic. It explores natural language processing tasks using RNNs or LSTMs for text generation.

Project Overview
Languages: English and Arabic
Model Architecture: Recurrent Neural Networks (RNNs) or Long Short-Term Memory networks (LSTMs)
Features:
Next token prediction
Text generation
Checkpoint saving
Perplexity score tracking
Dataset: Based on the multilingual Alpaca dataset.
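
As a rough illustration of the next token prediction feature listed above, a language model of this kind is usually trained on input/target pairs where the target is the input shifted by one position. The sketch below assumes a plain whitespace tokenizer and a hypothetical helper named `make_next_token_pairs`; the notebook's actual tokenizer and batching may differ.

```python
def make_next_token_pairs(token_ids, context_len):
    """Slide a window over token_ids; each target is its input shifted by one token."""
    pairs = []
    for start in range(len(token_ids) - context_len):
        inputs = token_ids[start:start + context_len]
        targets = token_ids[start + 1:start + context_len + 1]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical whitespace tokenization; shown only to make the example self-contained.
tokens = "the cat sat on the mat".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

pairs = make_next_token_pairs(ids, context_len=3)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

The same windowing applies to Arabic text once it has been mapped to token IDs, since the model only ever sees integer sequences.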
Dataset
The dataset used for this project is derived from the [Multilingual Alpaca Dataset](https://huggingface.co/datasets/saillab/taco-datasets/tree/main/multilingual-instruction-tuning-dataset/multilingual-alpaca-52k-gpt-4), which contains multilingual instruction-tuning examples generated using GPT-4.
It includes high-quality, diverse text samples in multiple languages, including English and Arabic, which makes it well suited to next-token prediction tasks.

Files
Q8_Evaluation_Arabic_and_English_Translator.ipynb: Jupyter notebook containing the implementation.
README.md: Project description and usage details.
models/: Pre-trained models and checkpoints (if included).
data/: Training and evaluation datasets.
Installation
Clone this repository:
git clone <repository-url>
cd <repository-folder>
Install dependencies:
pip install -r requirements.txt
How to Use
Open the Jupyter notebook:

jupyter notebook Q8_Evaluation_Arabic_and_English_Translator.ipynb

Follow the instructions in the notebook to:

Load the dataset
Train the model
Evaluate performance
Generate text in English and Arabic
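
The last step, text generation, typically comes down to a loop that repeatedly samples the next token from the model's predicted distribution and appends it to the sequence. The sketch below is an assumption-laden stand-in: a bigram count table replaces the RNN/LSTM so the example runs without any dependencies, and the names `train_bigram` and `generate` are hypothetical, not taken from the notebook.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token (stand-in for an RNN/LSTM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_new_tokens, rng):
    """Sample one token at a time, conditioning on the most recent token."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for this token
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out


corpus = "the cat sat on the mat".split()
model = train_bigram(corpus)
sample = generate(model, "the", max_new_tokens=5, rng=random.Random(0))
print(" ".join(sample))
```

A trained RNN or LSTM would replace the count lookup with a forward pass over the whole context, but the sampling loop keeps the same shape for both English and Arabic.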


Evaluation Metrics

Perplexity: Used to evaluate the model’s performance in predicting the next token.
Text Examples: Generated samples provided for both languages.
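
For reference, perplexity is conventionally defined as the exponential of the average negative log-probability the model assigns to each true next token; lower is better. The sketch below assumes the per-token probabilities have already been collected into a list, and the function name `perplexity` is illustrative. The notebook may instead derive the score from a cross-entropy loss, which gives the same quantity.

```python
import math


def perplexity(target_probs):
    """Perplexity = exp(mean negative log-probability of the true next tokens)."""
    if not target_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)


# A model guessing uniformly over 4 tokens is "as confused as" a 4-way choice.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
# A model that assigns high probability to the true tokens scores much lower.
confident = perplexity([0.9, 0.8, 0.95])
print(uniform, confident)
```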
Checkpoints
Model checkpoints are saved during training so that training can be resumed or the model re-evaluated later:

Example: checkpoints/model_epoch_{epoch}.pth
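
To resume from the naming pattern above, one option is to build paths from the epoch number and pick the highest-numbered file on disk. The sketch below covers only the path handling; the helpers `checkpoint_path` and `latest_checkpoint` are hypothetical, and how the `.pth` file itself is written (for example with `torch.save`) is an assumption about the notebook, not something confirmed here.

```python
import re
from pathlib import Path

CHECKPOINT_PATTERN = "model_epoch_{epoch}.pth"


def checkpoint_path(directory, epoch):
    """Build the checkpoint path for a given epoch."""
    return Path(directory) / CHECKPOINT_PATTERN.format(epoch=epoch)


def latest_checkpoint(directory):
    """Return (epoch, path) for the highest-numbered checkpoint, or None if none exist."""
    best = None
    for path in Path(directory).glob("model_epoch_*.pth"):
        match = re.fullmatch(r"model_epoch_(\d+)\.pth", path.name)
        if match:
            epoch = int(match.group(1))
            # Compare epochs numerically so that epoch 10 beats epoch 2.
            if best is None or epoch > best[0]:
                best = (epoch, path)
    return best


print(checkpoint_path("checkpoints", 3).as_posix())
```

Comparing epochs as integers rather than sorting filenames avoids the common pitfall where `model_epoch_10.pth` sorts before `model_epoch_2.pth`.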


Results
Generated Text: Examples of text outputs in both languages.
Perplexity Scores: Performance evaluation over training epochs.
Hugging Face Integration
The trained model is hosted on Hugging Face. You can download and test it using the link:

Hugging Face Model Repository

Contributing
Feel free to contribute by submitting pull requests or reporting issues.

License
This project is licensed under the MIT License. See the LICENSE file for details.

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -9,4 +9,5 @@ base_model:
  - microsoft/Phi-3.5-mini-instruct
  datasets:
  - sieu-n/alpaca_eval_multilingual
+ pipeline_tag: translation
  ---