IsmatS committed · verified
Commit 34f14ec · 1 Parent(s): ddccc5b

Upload folder using huggingface_hub

Files changed (1)
README.md: +15 −2
README.md CHANGED
@@ -12,13 +12,14 @@ This repository contains an implementation of a GPT (Generative Pre-trained Tran
  ├── collect_data.py    # Script for collecting Wikipedia articles
  ├── generate.py        # Text generation script using the trained model
  ├── prepare_data.py    # Data preprocessing and tokenizer training
+ ├── push_to_hf.py      # Script to upload the trained model to Hugging Face Model Hub
  ├── requirements.txt   # Project dependencies
  └── train.py           # GPT model training script
  ```

  ## Setup

- 1. Create and activate virtual environment:
+ 1. Create and activate a virtual environment:
  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
@@ -32,7 +33,7 @@ For Mac with Apple Silicon (M1/M2):
  pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

  # Install other required packages
- pip install transformers wikipedia-api beautifulsoup4 requests
+ pip install transformers wikipedia-api beautifulsoup4 requests huggingface_hub
  ```

  For other systems:
@@ -100,12 +101,24 @@ The `generate.py` script:
  - Generates text based on a user-provided prompt
  - Implements sampling strategies such as nucleus sampling and temperature scaling

+ ## Upload to Hugging Face Model Hub
+
+ Upload your trained model to the Hugging Face Model Hub:
+ ```bash
+ python push_to_hf.py
+ ```
+ The `push_to_hf.py` script:
+ - Authenticates with your Hugging Face account
+ - Creates a new repository for your model (if needed)
+ - Uploads the trained model, tokenizer, and any other relevant files
+
  ## Files Description

  - `collect_data.py`: Collects articles from Azerbaijani Wikipedia using categories like history, culture, literature, and geography
  - `prepare_data.py`: Preprocesses text and trains a BPE tokenizer
  - `train.py`: Contains GPT model implementation and training loop
  - `generate.py`: Generates text using the trained model and sampling strategies
+ - `push_to_hf.py`: Script for uploading the trained model to Hugging Face's Model Hub
  - `az_wiki_data.json`: Collected and preprocessed Wikipedia articles
  - `az_tokenizer.json`: Trained BPE tokenizer for Azerbaijani text
  - `best_model.pt`: Saved state of the best model during training
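
For context on the new upload step, here is a minimal sketch of what a `push_to_hf.py`-style script could look like using `huggingface_hub`, the library named in the commit message. The repository id is a placeholder and the exact file layout is an assumption rather than the author's actual implementation; the file names `best_model.pt` and `az_tokenizer.json` come from the Files Description above.

```python
# Hypothetical sketch of an upload script similar in spirit to push_to_hf.py.
# "your-username/azerbaijani-gpt" is a placeholder repo id, not the real one.
from huggingface_hub import HfApi, login

login()  # authenticate with your Hugging Face access token

api = HfApi()
repo_id = "your-username/azerbaijani-gpt"  # placeholder

# Create the model repository if it does not already exist
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

# Upload the trained weights and the BPE tokenizer listed in the README
api.upload_file(path_or_fileobj="best_model.pt",
                path_in_repo="best_model.pt", repo_id=repo_id)
api.upload_file(path_or_fileobj="az_tokenizer.json",
                path_in_repo="az_tokenizer.json", repo_id=repo_id)

# Or push the whole project folder in one call, as the commit title suggests
# api.upload_folder(folder_path=".", repo_id=repo_id)
```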
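
The generation section of the README mentions nucleus (top-p) sampling and temperature scaling. Below is a minimal sketch of those two strategies in PyTorch; the `temperature` and `top_p` defaults are illustrative values, not necessarily the ones used in `generate.py`.

```python
import torch

def sample_next_token(logits: torch.Tensor,
                      temperature: float = 0.8,
                      top_p: float = 0.9) -> torch.Tensor:
    """Sample one token id from 1-D logits with temperature and nucleus (top-p) sampling."""
    # Temperature scaling: values < 1 sharpen the distribution, > 1 flatten it
    probs = torch.softmax(logits / temperature, dim=-1)

    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize and sample from that set
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = (cumulative - sorted_probs) < top_p  # the top token is always kept
    truncated = sorted_probs * keep
    truncated = truncated / truncated.sum()

    choice = torch.multinomial(truncated, num_samples=1)
    return sorted_idx[choice]
```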