nickmalhotra
committed on
Create SFT/prepare_sft_dataset.py
SFT/prepare_sft_dataset.py
ADDED
@@ -0,0 +1,9 @@
+'''
+File Name: prepare_sft_dataset.py Author: Nikhil Malhotra
+Date: 21/7/2024
+Purpose: Create a high-quality SFT dataset for Project Indus.
+The source dataset is obtained from Hugging Face, which provides high-quality SFT data.
+The dataset is then translated into the requisite dialects, as supported by Google.
+The dataset is also split into train and test sets, and the requisite files are created.
+Each file name carries the source, the dialect it was translated into, and the split type.
+'''
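The commit adds only the docstring, so the pipeline it describes is not shown. As a rough illustration of the split and file-naming steps, here is a minimal sketch using only the standard library. Every name in it (`split_records`, `output_name`, `translate`, `prepare`), the `<source>_<dialect>_<split>.jsonl` pattern, the 80/20 split, and the fixed seed are assumptions for illustration, not the author's implementation. The `translate` function is a placeholder that returns its input unchanged; the real script presumably calls a Google translation service at that point.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def output_name(source, dialect, split):
    """Hypothetical naming scheme: <source>_<dialect>_<split>.jsonl."""
    safe_source = source.replace("/", "_")  # dataset ids may contain a slash
    return f"{safe_source}_{dialect}_{split}.jsonl"


def translate(text, dialect):
    """Placeholder only: returns the text unchanged (no real translation)."""
    return text


def prepare(records, source, dialect, test_fraction=0.2, seed=0):
    """Translate every record, split the result, and map file names to rows."""
    translated = [translate(r, dialect) for r in records]
    train, test = split_records(translated, test_fraction, seed)
    return {
        output_name(source, dialect, "train"): train,
        output_name(source, dialect, "test"): test,
    }
```

Calling `prepare(rows, "org/dataset", "hi")` with these assumed names returns a dictionary keyed by `org_dataset_hi_train.jsonl` and `org_dataset_hi_test.jsonl`; writing each list to disk is left out to keep the sketch short.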