nickmalhotra committed on
Commit
19871d5
·
verified ·
1 Parent(s): a510b5a

Create SFT/prepare_sft_dataset.py

Files changed (1)
  1. SFT/prepare_sft_dataset.py +9 -0
SFT/prepare_sft_dataset.py ADDED
@@ -0,0 +1,9 @@
+ '''
+ File Name: prepare_sft_dataset.py Author: Nikhil Malhotra
+ Date: 21/7/2024
+ Purpose: Create a high-quality SFT dataset for Project Indus.
+ The source dataset is obtained from Hugging Face.
+ The dataset is then translated into the requisite dialects, as supported by Google.
+ The dataset is also split into train and test sets, and the corresponding files are created.
+ Each file name carries the source, the dialect it was translated into, and the split type.
+ '''
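This commit adds only the docstring, so the pipeline it describes is not shown. The following is a minimal sketch of the split and file-naming steps, under stated assumptions: the function names, the `jsonl` extension, the `source_dialect_split` name pattern, the 10% default test fraction, and the pluggable `translate` callable are all hypothetical and are not taken from the author's code. Translation is left as a caller-supplied function rather than a call to any particular Google API.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def split_train_test(
    records: List[Record], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Shuffle deterministically and split into (train, test).

    The default fraction and seed are assumptions for illustration.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def translate_records(
    records: List[Record], translate: Callable[[str], str]
) -> List[Record]:
    """Apply a caller-supplied translation function to every field value.

    `translate` is a placeholder; the real script presumably wraps a
    translation service, which is not shown in the commit.
    """
    return [{key: translate(value) for key, value in rec.items()} for rec in records]


def output_file_name(source: str, dialect: str, split: str, ext: str = "jsonl") -> str:
    """Build a file name carrying source, dialect, and split type.

    The exact pattern is hypothetical; the docstring only says the name
    carries these three pieces of information.
    """
    safe_source = source.replace("/", "_")  # dataset ids often contain "/"
    return f"{safe_source}_{dialect}_{split}.{ext}"
```

A possible usage, with a made-up dataset id and dialect code: `train, test = split_train_test(records)` followed by `output_file_name("org/dataset", "hi", "train")`, which yields `org_dataset_hi_train.jsonl`.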