mrfakename committed on
Commit
aa59806
•
1 Parent(s): f514292

Sync from GitHub repo

This Space is synced from the GitHub repo: https://github.com/SWivid/F5-TTS. Please submit contributions to the Space there

.github/ISSUE_TEMPLATE/bug_report.yml ADDED
@@ -0,0 +1,50 @@
+ name: "Bug Report"
+ description: |
+   Please provide as much detail as possible to help address the issue, including logs and screenshots.
+ labels:
+   - bug
+ body:
+   - type: checkboxes
+     attributes:
+       label: Checks
+       description: "To ensure timely help, please confirm the following:"
+       options:
+         - label: This template is only for bug reports, usage problems go with 'Help Wanted'.
+           required: true
+         - label: I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
+           required: true
+         - label: I have searched for existing issues, including closed ones, and couldn't find a solution.
+           required: true
+         - label: I confirm that I am using English to submit this report in order to facilitate communication.
+           required: true
+   - type: textarea
+     attributes:
+       label: Environment Details
+       description: "Provide details such as OS, Python version, and any relevant software or dependencies."
+       placeholder: e.g., CentOS Linux 7, RTX 3090, Python 3.10, torch==2.3.0, cuda 11.8
+     validations:
+       required: true
+   - type: textarea
+     attributes:
+       label: Steps to Reproduce
+       description: |
+         Include detailed steps, screenshots, and logs. Use the correct markdown syntax for code blocks.
+       placeholder: |
+         1. Create a new conda environment.
+         2. Clone the repository, install as local editable and properly set up.
+         3. Run the command: `accelerate launch src/f5_tts/train/train.py`.
+         4. Have following error message... (attach logs).
+     validations:
+       required: true
+   - type: textarea
+     attributes:
+       label: ✔️ Expected Behavior
+       placeholder: Describe what you expected to happen.
+     validations:
+       required: false
+   - type: textarea
+     attributes:
+       label: ❌ Actual Behavior
+       placeholder: Describe what actually happened.
+     validations:
+       required: false
.github/ISSUE_TEMPLATE/feature_request.yml ADDED
@@ -0,0 +1,62 @@
+ name: "Feature Request"
+ description: |
+   Constructive suggestions and new ideas regarding the current repo.
+ labels:
+   - enhancement
+ body:
+   - type: checkboxes
+     attributes:
+       label: Checks
+       description: "To help us grasp quickly, please confirm the following:"
+       options:
+         - label: This template is only for feature requests.
+           required: true
+         - label: I have thoroughly reviewed the project documentation but couldn't find any relevant information that meets my needs.
+           required: true
+         - label: I have searched for existing issues, including closed ones, and found no discussion yet.
+           required: true
+         - label: I confirm that I am using English to submit this report in order to facilitate communication.
+           required: true
+   - type: textarea
+     attributes:
+       label: 1. Is this request related to a challenge you're experiencing? Tell us your story.
+       description: |
+         Describe the specific problem or scenario you're facing in detail. For example:
+         *"I was trying to use [feature] for [specific task], but encountered [issue]. This was frustrating because...."*
+       placeholder: Please describe the situation in as much detail as possible.
+     validations:
+       required: true
+
+   - type: textarea
+     attributes:
+       label: 2. What is your suggested solution?
+       description: |
+         Provide a clear description of the feature or enhancement you'd like to propose.
+         How would this feature solve your issue or improve the project?
+       placeholder: Describe your idea or proposed solution here.
+     validations:
+       required: true
+
+   - type: textarea
+     attributes:
+       label: 3. Additional context or comments
+       description: |
+         Any other relevant information, links, documents, or screenshots that provide clarity.
+         Use this section for anything not covered above.
+       placeholder: Add any extra details here.
+     validations:
+       required: false
+
+   - type: checkboxes
+     attributes:
+       label: 4. Can you help us with this feature?
+       description: |
+         Let us know if you're interested in contributing. This is not a commitment but a way to express interest in collaboration.
+       options:
+         - label: I am interested in contributing to this feature.
+           required: false
+
+   - type: markdown
+     attributes:
+       value: |
+         **Note:** Please submit only one request per issue to keep discussions focused and manageable.
.github/ISSUE_TEMPLATE/help_wanted.yml ADDED
@@ -0,0 +1,50 @@
+ name: "Help Wanted"
+ description: |
+   Please provide as much detail as possible to help address the issue, including logs and screenshots.
+ labels:
+   - help wanted
+ body:
+   - type: checkboxes
+     attributes:
+       label: Checks
+       description: "To ensure timely help, please confirm the following:"
+       options:
+         - label: This template is only for usage issues encountered.
+           required: true
+         - label: I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
+           required: true
+         - label: I have searched for existing issues, including closed ones, and couldn't find a solution.
+           required: true
+         - label: I confirm that I am using English to submit this report in order to facilitate communication.
+           required: true
+   - type: textarea
+     attributes:
+       label: Environment Details
+       description: "Provide details such as OS, Python version, and any relevant software or dependencies."
+       placeholder: e.g., macOS 13.5, Python 3.10, torch==2.3.0, Gradio 4.44.1
+     validations:
+       required: true
+   - type: textarea
+     attributes:
+       label: Steps to Reproduce
+       description: |
+         Include detailed steps, screenshots, and logs. Use the correct markdown syntax for code blocks.
+       placeholder: |
+         1. Create a new conda environment.
+         2. Clone the repository and install as a pip package.
+         3. Run the command: `f5-tts_infer-gradio` with no ref_text provided.
+         4. Stuck there with the following message... (attach logs and also error msg e.g. after ctrl-c).
+     validations:
+       required: true
+   - type: textarea
+     attributes:
+       label: ✔️ Expected Behavior
+       placeholder: Describe what you expected to happen, e.g. output a generated audio
+     validations:
+       required: false
+   - type: textarea
+     attributes:
+       label: ❌ Actual Behavior
+       placeholder: Describe what actually happened, failure messages, etc.
+     validations:
+       required: false
.github/ISSUE_TEMPLATE/question.yml ADDED
@@ -0,0 +1,26 @@
+ name: "Question"
+ description: |
+   Pure question or inquiry about the project; usage issues go with "help wanted".
+ labels:
+   - question
+ body:
+   - type: checkboxes
+     attributes:
+       label: Checks
+       description: "To help us grasp quickly, please confirm the following:"
+       options:
+         - label: This template is only for questions, not feature requests or bug reports.
+           required: true
+         - label: I have thoroughly reviewed the project documentation and read the related paper(s).
+           required: true
+         - label: I have searched for existing issues, including closed ones, and found no similar questions.
+           required: true
+         - label: I confirm that I am using English to submit this report in order to facilitate communication.
+           required: true
+   - type: textarea
+     attributes:
+       label: Question details
+       description: |
+         Question details, clearly stated using proper markdown syntax.
+     validations:
+       required: true
src/f5_tts/infer/README.md CHANGED
@@ -113,4 +113,77 @@ To test speech editing capabilities, use the following command:

  ```bash
  python src/f5_tts/infer/speech_edit.py
- ```
+ ```
+
+ ## Socket Realtime Client
+
+ To communicate with socket server you need to run
+ ```bash
+ python src/f5_tts/socket_server.py
+ ```
+
+ <details>
+ <summary>Then create client to communicate</summary>
+
+ ``` python
+ import socket
+ import numpy as np
+ import asyncio
+ import pyaudio
+
+ async def listen_to_voice(text, server_ip='localhost', server_port=9999):
+     client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+     client_socket.connect((server_ip, server_port))
+
+     async def play_audio_stream():
+         buffer = b''
+         p = pyaudio.PyAudio()
+         stream = p.open(format=pyaudio.paFloat32,
+                         channels=1,
+                         rate=24000,  # Ensure this matches the server's sampling rate
+                         output=True,
+                         frames_per_buffer=2048)
+
+         try:
+             while True:
+                 chunk = await asyncio.get_event_loop().run_in_executor(None, client_socket.recv, 1024)
+                 if not chunk:  # End of stream
+                     break
+                 if b"END_OF_AUDIO" in chunk:
+                     buffer += chunk.replace(b"END_OF_AUDIO", b"")
+                     if buffer:
+                         audio_array = np.frombuffer(buffer, dtype=np.float32).copy()  # Make a writable copy
+                         stream.write(audio_array.tobytes())
+                     break
+                 buffer += chunk
+                 if len(buffer) >= 4096:
+                     audio_array = np.frombuffer(buffer[:4096], dtype=np.float32).copy()  # Make a writable copy
+                     stream.write(audio_array.tobytes())
+                     buffer = buffer[4096:]
+         finally:
+             stream.stop_stream()
+             stream.close()
+             p.terminate()
+
+     try:
+         # Send only the text to the server
+         await asyncio.get_event_loop().run_in_executor(None, client_socket.sendall, text.encode('utf-8'))
+         await play_audio_stream()
+         print("Audio playback finished.")
+
+     except Exception as e:
+         print(f"Error in listen_to_voice: {e}")
+
+     finally:
+         client_socket.close()
+
+ # Example usage: Replace this with your actual server IP and port
+ async def main():
+     await listen_to_voice("my name is jenny..", server_ip='localhost', server_port=9998)
+
+ # Run the main async function
+ asyncio.run(main())
+ ```
+
+ </details>
+
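One caveat with the README client above: TCP does not preserve message boundaries, so `recv` may deliver the `END_OF_AUDIO` marker split across two reads, in which case the `b"END_OF_AUDIO" in chunk` check would not match. The following is a minimal sketch of one way to handle that, holding back a marker-sized tail until the marker can be ruled out. `SentinelSplitter` and `feed` are hypothetical names used for illustration and are not part of F5-TTS.

```python
END_MARKER = b"END_OF_AUDIO"


class SentinelSplitter:
    """Hypothetical helper: separates audio bytes from a trailing end marker."""

    def __init__(self, marker=END_MARKER):
        self.marker = marker
        self.pending = b""  # bytes not yet released (may hold a partial marker)
        self.done = False

    def feed(self, chunk):
        """Return the audio bytes that are safe to play after receiving `chunk`."""
        if self.done:
            return b""
        self.pending += chunk
        idx = self.pending.find(self.marker)
        if idx != -1:
            # Marker found: everything before it is audio, and the stream is over.
            audio, self.pending = self.pending[:idx], b""
            self.done = True
            return audio
        # Hold back the last len(marker) - 1 bytes: they could be a marker prefix.
        keep = len(self.marker) - 1
        if len(self.pending) <= keep:
            return b""
        audio, self.pending = self.pending[:-keep], self.pending[-keep:]
        return audio
```

The bytes returned by `feed` are not guaranteed to be a multiple of four, so a caller would still need to buffer them to a float32 boundary before calling `np.frombuffer`. The sketch also assumes the marker bytes never occur inside the audio payload, the same assumption the original client makes.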
src/f5_tts/infer/utils_infer.py CHANGED
@@ -218,6 +218,22 @@ def load_model(
218
  return model
219
 
220
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
221
  # preprocess reference audio and text
222
 
223
 
@@ -229,7 +245,7 @@ def preprocess_ref_audio_text(ref_audio_orig, ref_text, clip_short=True, show_in
229
  if clip_short:
230
  # 1. try to find long silence for clipping
231
  non_silent_segs = silence.split_on_silence(
232
- aseg, min_silence_len=1000, silence_thresh=-50, keep_silence=1000
233
  )
234
  non_silent_wave = AudioSegment.silent(duration=0)
235
  for non_silent_seg in non_silent_segs:
@@ -241,7 +257,7 @@ def preprocess_ref_audio_text(ref_audio_orig, ref_text, clip_short=True, show_in
241
  # 2. try to find short silence for clipping if 1. failed
242
  if len(non_silent_wave) > 15000:
243
  non_silent_segs = silence.split_on_silence(
244
- aseg, min_silence_len=100, silence_thresh=-40, keep_silence=1000
245
  )
246
  non_silent_wave = AudioSegment.silent(duration=0)
247
  for non_silent_seg in non_silent_segs:
@@ -257,6 +273,7 @@ def preprocess_ref_audio_text(ref_audio_orig, ref_text, clip_short=True, show_in
257
  aseg = aseg[:15000]
258
  show_info("Audio is over 15s, clipping short. (3)")
259
 
 
260
  aseg.export(f.name, format="wav")
261
  ref_audio = f.name
262
 
@@ -473,7 +490,9 @@ def infer_batch_process(
473
 
474
  def remove_silence_for_generated_wav(filename):
475
  aseg = AudioSegment.from_file(filename)
476
- non_silent_segs = silence.split_on_silence(aseg, min_silence_len=1000, silence_thresh=-50, keep_silence=500)
 
 
477
  non_silent_wave = AudioSegment.silent(duration=0)
478
  for non_silent_seg in non_silent_segs:
479
  non_silent_wave += non_silent_seg
 
218
  return model
219
 
220
 
221
+ def remove_silence_edges(audio, silence_threshold=-42):
222
+ # Remove silence from the start
223
+ non_silent_start_idx = silence.detect_leading_silence(audio, silence_threshold=silence_threshold)
224
+ audio = audio[non_silent_start_idx:]
225
+
226
+ # Remove silence from the end
227
+ non_silent_end_duration = audio.duration_seconds
228
+ for ms in reversed(audio):
229
+ if ms.dBFS > silence_threshold:
230
+ break
231
+ non_silent_end_duration -= 0.001
232
+ trimmed_audio = audio[: int(non_silent_end_duration * 1000)]
233
+
234
+ return trimmed_audio
235
+
236
+
237
  # preprocess reference audio and text
238
 
239
 
 
245
  if clip_short:
246
  # 1. try to find long silence for clipping
247
  non_silent_segs = silence.split_on_silence(
248
+ aseg, min_silence_len=1000, silence_thresh=-50, keep_silence=1000, seek_step=10
249
  )
250
  non_silent_wave = AudioSegment.silent(duration=0)
251
  for non_silent_seg in non_silent_segs:
 
257
  # 2. try to find short silence for clipping if 1. failed
258
  if len(non_silent_wave) > 15000:
259
  non_silent_segs = silence.split_on_silence(
260
+ aseg, min_silence_len=100, silence_thresh=-40, keep_silence=1000, seek_step=10
261
  )
262
  non_silent_wave = AudioSegment.silent(duration=0)
263
  for non_silent_seg in non_silent_segs:
 
273
  aseg = aseg[:15000]
274
  show_info("Audio is over 15s, clipping short. (3)")
275
 
276
+ aseg = remove_silence_edges(aseg) + AudioSegment.silent(duration=50)
277
  aseg.export(f.name, format="wav")
278
  ref_audio = f.name
279
 
 
490
 
491
  def remove_silence_for_generated_wav(filename):
492
  aseg = AudioSegment.from_file(filename)
493
+ non_silent_segs = silence.split_on_silence(
494
+ aseg, min_silence_len=1000, silence_thresh=-50, keep_silence=500, seek_step=10
495
+ )
496
  non_silent_wave = AudioSegment.silent(duration=0)
497
  for non_silent_seg in non_silent_segs:
498
  non_silent_wave += non_silent_seg
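The new `remove_silence_edges` helper skips leading silence and then walks backwards in 1 ms slices until it meets a slice louder than the threshold. The same idea can be sketched without pydub on a plain list of per-millisecond loudness values. This is an illustration only: `trim_silence_edges` is a hypothetical name, the list stands in for pydub's per-slice `dBFS`, and treating a value exactly at the threshold as silent is an assumption.

```python
def trim_silence_edges(levels_db, threshold_db=-42.0):
    """Return (start, end) indices bounding the non-silent part of levels_db.

    levels_db holds one loudness value (in dBFS) per millisecond of audio.
    """
    # Advance past leading slices that are at or below the threshold.
    start = 0
    while start < len(levels_db) and levels_db[start] <= threshold_db:
        start += 1
    # Walk back from the end until a slice louder than the threshold is found.
    end = len(levels_db)
    while end > start and levels_db[end - 1] <= threshold_db:
        end -= 1
    return start, end


levels = [-60.0, -55.0, -20.0, -18.0, -50.0, -70.0]
start, end = trim_silence_edges(levels)
print(levels[start:end])  # [-20.0, -18.0]
```

In the diff, 50 ms of silence is appended after trimming, so the reference clip ends with a short pause.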
src/f5_tts/socket_server.py ADDED
@@ -0,0 +1,163 @@
+ import socket
+ import struct
+ import torch
+ import torchaudio
+ from threading import Thread
+
+
+ import gc
+ import traceback
+
+
+ from infer.utils_infer import infer_batch_process, preprocess_ref_audio_text, load_vocoder, load_model
+ from model.backbones.dit import DiT
+
+
+ class TTSStreamingProcessor:
+     def __init__(self, ckpt_file, vocab_file, ref_audio, ref_text, device=None, dtype=torch.float32):
+         self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
+
+         # Load the model using the provided checkpoint and vocab files
+         self.model = load_model(
+             model_cls=DiT,
+             model_cfg=dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4),
+             ckpt_path=ckpt_file,
+             mel_spec_type="vocos",  # or "bigvgan" depending on vocoder
+             vocab_file=vocab_file,
+             ode_method="euler",
+             use_ema=True,
+             device=self.device,
+         ).to(self.device, dtype=dtype)
+
+         # Load the vocoder
+         self.vocoder = load_vocoder(is_local=False)
+
+         # Set sampling rate for streaming
+         self.sampling_rate = 24000  # Consistency with client
+
+         # Set reference audio and text
+         self.ref_audio = ref_audio
+         self.ref_text = ref_text
+
+         # Warm up the model
+         self._warm_up()
+
+     def _warm_up(self):
+         """Warm up the model with a dummy input to ensure it's ready for real-time processing."""
+         print("Warming up the model...")
+         ref_audio, ref_text = preprocess_ref_audio_text(self.ref_audio, self.ref_text)
+         audio, sr = torchaudio.load(ref_audio)
+         gen_text = "Warm-up text for the model."
+
+         # Pass the vocoder as an argument here
+         infer_batch_process((audio, sr), ref_text, [gen_text], self.model, self.vocoder, device=self.device)
+         print("Warm-up completed.")
+
+     def generate_stream(self, text, play_steps_in_s=0.5):
+         """Generate audio in chunks and yield them in real-time."""
+         # Preprocess the reference audio and text
+         ref_audio, ref_text = preprocess_ref_audio_text(self.ref_audio, self.ref_text)
+
+         # Load reference audio
+         audio, sr = torchaudio.load(ref_audio)
+
+         # Run inference for the input text
+         audio_chunk, final_sample_rate, _ = infer_batch_process(
+             (audio, sr),
+             ref_text,
+             [text],
+             self.model,
+             self.vocoder,
+             device=self.device,  # Pass vocoder here
+         )
+
+         # Break the generated audio into chunks and send them
+         chunk_size = int(final_sample_rate * play_steps_in_s)
+
+         for i in range(0, len(audio_chunk), chunk_size):
+             chunk = audio_chunk[i : i + chunk_size]
+
+             # Check if it's the final chunk
+             if i + chunk_size >= len(audio_chunk):
+                 chunk = audio_chunk[i:]
+
+             # Avoid sending empty or repeated chunks
+             if len(chunk) == 0:
+                 break
+
+             # Pack and send the audio chunk
+             packed_audio = struct.pack(f"{len(chunk)}f", *chunk)
+             yield packed_audio
+
+         # Ensure that no final word is repeated by not resending partial chunks
+         if len(audio_chunk) % chunk_size != 0:
+             remaining_chunk = audio_chunk[-(len(audio_chunk) % chunk_size) :]
+             packed_audio = struct.pack(f"{len(remaining_chunk)}f", *remaining_chunk)
+             yield packed_audio
+
+
+ def handle_client(client_socket, processor):
+     try:
+         while True:
+             # Receive data from the client
+             data = client_socket.recv(1024).decode("utf-8")
+             if not data:
+                 break
+
+             try:
+                 # The client sends the text input
+                 text = data.strip()
+
+                 # Generate and stream audio chunks
+                 for audio_chunk in processor.generate_stream(text):
+                     client_socket.sendall(audio_chunk)
+
+                 # Send end-of-audio signal
+                 client_socket.sendall(b"END_OF_AUDIO")
+
+             except Exception as inner_e:
+                 print(f"Error during processing: {inner_e}")
+                 traceback.print_exc()  # Print the full traceback to diagnose the issue
+                 break
+
+     except Exception as e:
+         print(f"Error handling client: {e}")
+         traceback.print_exc()
+     finally:
+         client_socket.close()
+
+
+ def start_server(host, port, processor):
+     server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+     server.bind((host, port))
+     server.listen(5)
+     print(f"Server listening on {host}:{port}")
+
+     while True:
+         client_socket, addr = server.accept()
+         print(f"Accepted connection from {addr}")
+         client_handler = Thread(target=handle_client, args=(client_socket, processor))
+         client_handler.start()
+
+
+ if __name__ == "__main__":
+     try:
+         # Load the model and vocoder using the provided files
+         ckpt_file = ""  # pointing your checkpoint "ckpts/model/model_1096.pt"
+         vocab_file = ""  # Add vocab file path if needed
+         ref_audio = ""  # add ref audio"./tests/ref_audio/reference.wav"
+         ref_text = ""
+
+         # Initialize the processor with the model and vocoder
+         processor = TTSStreamingProcessor(
+             ckpt_file=ckpt_file,
+             vocab_file=vocab_file,
+             ref_audio=ref_audio,
+             ref_text=ref_text,
+             dtype=torch.float32,
+         )
+
+         # Start the server
+         start_server("0.0.0.0", 9998, processor)
+     except KeyboardInterrupt:
+         gc.collect()
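A review note on `generate_stream` above: the `for` loop already yields the trailing partial chunk (the slice `audio_chunk[i:]`), and the block after the loop appears to yield that same remainder a second time whenever the audio length is not a multiple of `chunk_size`. Below is a minimal sketch of chunking that covers every sample exactly once, assuming a flat sequence of float samples. `iter_packed_chunks` is a hypothetical name and not the project's API.

```python
import struct


def iter_packed_chunks(samples, chunk_size):
    """Yield float32-packed byte chunks that cover `samples` exactly once."""
    for i in range(0, len(samples), chunk_size):
        # Slicing clamps at the end, so the last chunk is simply shorter.
        chunk = samples[i : i + chunk_size]
        yield struct.pack(f"{len(chunk)}f", *chunk)


samples = [0.0, 0.25, 0.5, 0.75, 1.0]
chunks = list(iter_packed_chunks(samples, chunk_size=2))
total_samples = sum(len(c) for c in chunks) // 4  # 4 bytes per float32
print(len(chunks), total_samples)  # 3 5
```

Because `range` stops before `len(samples)`, no empty chunk is produced and no separate tail handling is needed.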
src/f5_tts/train/finetune_cli.py CHANGED
@@ -55,7 +55,6 @@ def parse_args():
55
  default=None,
56
  help="Path to custom tokenizer vocab file (only used if tokenizer = 'custom')",
57
  )
58
-
59
  parser.add_argument(
60
  "--log_samples",
61
  type=bool,
@@ -63,6 +62,12 @@ def parse_args():
63
  help="Log inferenced samples per ckpt save steps",
64
  )
65
  parser.add_argument("--logger", type=str, default=None, choices=["wandb", "tensorboard"], help="logger")
 
 
 
 
 
 
66
 
67
  return parser.parse_args()
68
 
@@ -147,6 +152,7 @@ def main():
147
  wandb_resume_id=wandb_resume_id,
148
  log_samples=args.log_samples,
149
  last_per_steps=args.last_per_steps,
 
150
  )
151
 
152
  train_dataset = load_dataset(args.dataset_name, tokenizer, mel_spec_kwargs=mel_spec_kwargs)
 
55
  default=None,
56
  help="Path to custom tokenizer vocab file (only used if tokenizer = 'custom')",
57
  )
 
58
  parser.add_argument(
59
  "--log_samples",
60
  type=bool,
 
62
  help="Log inferenced samples per ckpt save steps",
63
  )
64
  parser.add_argument("--logger", type=str, default=None, choices=["wandb", "tensorboard"], help="logger")
65
+ parser.add_argument(
66
+ "--bnb_optimizer",
67
+ type=bool,
68
+ default=False,
69
+ help="Use 8-bit Adam optimizer from bitsandbytes",
70
+ )
71
 
72
  return parser.parse_args()
73
 
 
152
  wandb_resume_id=wandb_resume_id,
153
  log_samples=args.log_samples,
154
  last_per_steps=args.last_per_steps,
155
+ bnb_optimizer=args.bnb_optimizer,
156
  )
157
 
158
  train_dataset = load_dataset(args.dataset_name, tokenizer, mel_spec_kwargs=mel_spec_kwargs)
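A note on the `--bnb_optimizer` argument added above: with `type=bool`, argparse calls `bool()` on the raw string, so any non-empty value, including `"False"`, parses as `True`. The sketch below contrasts that with `action="store_true"`. The flag names `--as_bool` and `--as_flag` are hypothetical and only illustrate the behavior; they are not options of `finetune_cli.py`.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # type=bool calls bool() on the raw string, so any non-empty value is True.
    parser.add_argument("--as_bool", type=bool, default=False)
    # store_true makes the flag itself the switch: present -> True, absent -> False.
    parser.add_argument("--as_flag", action="store_true")
    return parser


args = build_parser().parse_args(["--as_bool", "False", "--as_flag"])
print(args.as_bool, args.as_flag)  # True True
```

With the `type=bool` form, the option can only be left off by omitting it entirely. The same applies to the existing `--log_samples` argument shown in the diff context.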
src/f5_tts/train/finetune_gradio.py CHANGED
@@ -1372,7 +1372,7 @@ def get_audio_select(file_sample):
1372
  with gr.Blocks() as app:
1373
  gr.Markdown(
1374
  """
1375
- # E2/F5 TTS AUTOMATIC FINETUNE
1376
 
1377
  This is a local web UI for F5 TTS with advanced batch processing support. This app supports the following TTS models:
1378
 
@@ -1381,35 +1381,35 @@ This is a local web UI for F5 TTS with advanced batch processing support. This a
1381
 
1382
  The checkpoints support English and Chinese.
1383
 
1384
- for tutorial and updates check here (https://github.com/SWivid/F5-TTS/discussions/143)
1385
  """
1386
  )
1387
 
1388
  with gr.Row():
1389
  projects, projects_selelect = get_list_projects()
1390
- tokenizer_type = gr.Radio(label="Tokenizer Type", choices=["pinyin", "char"], value="pinyin")
1391
- project_name = gr.Textbox(label="project name", value="my_speak")
1392
- bt_create = gr.Button("create new project")
1393
 
1394
  with gr.Row():
1395
  cm_project = gr.Dropdown(
1396
  choices=projects, value=projects_selelect, label="Project", allow_custom_value=True, scale=6
1397
  )
1398
- ch_refresh_project = gr.Button("refresh", scale=1)
1399
 
1400
  bt_create.click(fn=create_data_project, inputs=[project_name, tokenizer_type], outputs=[cm_project])
1401
 
1402
  with gr.Tabs():
1403
- with gr.TabItem("transcribe Data"):
1404
  gr.Markdown("""```plaintext
1405
  Skip this step if you have your dataset, metadata.csv, and a folder wavs with all the audio files.
1406
  ```""")
1407
 
1408
- ch_manual = gr.Checkbox(label="audio from path", value=False)
1409
 
1410
  mark_info_transcribe = gr.Markdown(
1411
  """```plaintext
1412
- Place your 'wavs' folder and 'metadata.csv' file in the {your_project_name}' directory.
1413
 
1414
  my_speak/
1415
  │
@@ -1421,10 +1421,10 @@ Skip this step if you have your dataset, metadata.csv, and a folder wavs with al
1421
  visible=False,
1422
  )
1423
 
1424
- audio_speaker = gr.File(label="voice", type="filepath", file_count="multiple")
1425
- txt_lang = gr.Text(label="Language", value="english")
1426
- bt_transcribe = bt_create = gr.Button("transcribe")
1427
- txt_info_transcribe = gr.Text(label="info", value="")
1428
  bt_transcribe.click(
1429
  fn=transcribe_all,
1430
  inputs=[cm_project, audio_speaker, txt_lang, ch_manual],
@@ -1432,7 +1432,7 @@ Skip this step if you have your dataset, metadata.csv, and a folder wavs with al
1432
  )
1433
  ch_manual.change(fn=check_user, inputs=[ch_manual], outputs=[audio_speaker, mark_info_transcribe])
1434
 
1435
- random_sample_transcribe = gr.Button("random sample")
1436
 
1437
  with gr.Row():
1438
  random_text_transcribe = gr.Text(label="Text")
@@ -1444,16 +1444,16 @@ Skip this step if you have your dataset, metadata.csv, and a folder wavs with al
1444
  outputs=[random_text_transcribe, random_audio_transcribe],
1445
  )
1446
 
1447
- with gr.TabItem("vocab check"):
1448
  gr.Markdown("""```plaintext
1449
- check the vocabulary for fine-tuning Emilia_ZH_EN to ensure all symbols are included. for finetune new language
1450
  ```""")
1451
 
1452
- check_button = gr.Button("check vocab")
1453
- txt_info_check = gr.Text(label="info", value="")
1454
 
1455
  gr.Markdown("""```plaintext
1456
- Using the extended model, you can fine-tune to a new language that is missing symbols in the vocab , this create a new model with a new vocabulary size and save it in your ckpts/project folder.
1457
  ```""")
1458
 
1459
  exp_name_extend = gr.Radio(label="Model", choices=["F5-TTS", "E2-TTS"], value="F5-TTS")
@@ -1465,10 +1465,10 @@ Using the extended model, you can fine-tune to a new language that is missing sy
1465
  placeholder="To add new symbols, make sure to use ',' for each symbol",
1466
  scale=6,
1467
  )
1468
- txt_count_symbol = gr.Textbox(label="new size vocab", value="", scale=1)
1469
 
1470
- extend_button = gr.Button("Extended")
1471
- txt_info_extend = gr.Text(label="info", value="")
1472
 
1473
  txt_extend.change(vocab_count, inputs=[txt_extend], outputs=[txt_count_symbol])
1474
  check_button.click(fn=vocab_check, inputs=[cm_project], outputs=[txt_info_check, txt_extend])
@@ -1476,18 +1476,18 @@ Using the extended model, you can fine-tune to a new language that is missing sy
1476
  fn=vocab_extend, inputs=[cm_project, txt_extend, exp_name_extend], outputs=[txt_info_extend]
1477
  )
1478
 
1479
- with gr.TabItem("prepare Data"):
1480
  gr.Markdown("""```plaintext
1481
- Skip this step if you have your dataset, raw.arrow , duraction.json and vocab.txt
1482
  ```""")
1483
 
1484
  gr.Markdown(
1485
  """```plaintext
1486
- place all your wavs folder and your metadata.csv file in {your name project}
1487
 
1488
- suport format for audio "wav", "mp3", "aac", "flac", "m4a", "alac", "ogg", "aiff", "wma", "amr"
1489
 
1490
- example wav format
1491
  my_speak/
1492
  │
1493
  ├── wavs/
@@ -1497,24 +1497,24 @@ Skip this step if you have your dataset, raw.arrow , duraction.json and vocab.tx
1497
  │
1498
  └── metadata.csv
1499
 
1500
- file format metadata.csv
1501
 
1502
  audio1|text1 or audio1.wav|text1 or your_path/audio1.wav|text1
1503
- audio2|text1 or audio2.wav|text1 or your_path/audio1.wav|text1
1504
  ...
1505
 
1506
  ```"""
1507
  )
1508
- ch_tokenizern = gr.Checkbox(label="create vocabulary", value=False, visible=False)
1509
- bt_prepare = bt_create = gr.Button("prepare")
1510
- txt_info_prepare = gr.Text(label="info", value="")
1511
- txt_vocab_prepare = gr.Text(label="vocab", value="")
1512
 
1513
  bt_prepare.click(
1514
  fn=create_metadata, inputs=[cm_project, ch_tokenizern], outputs=[txt_info_prepare, txt_vocab_prepare]
1515
  )
1516
 
1517
- random_sample_prepare = gr.Button("random sample")
1518
 
1519
  with gr.Row():
1520
  random_text_prepare = gr.Text(label="Tokenizer")
@@ -1524,20 +1524,20 @@ Skip this step if you have your dataset, raw.arrow , duraction.json and vocab.tx
1524
  fn=get_random_sample_prepare, inputs=[cm_project], outputs=[random_text_prepare, random_audio_prepare]
1525
  )
1526
 
1527
- with gr.TabItem("train Data"):
1528
  gr.Markdown("""```plaintext
1529
- The auto-setting is still experimental. Please make sure that the epochs , save per updates , and last per steps are set correctly, or change them manually as needed.
1530
  If you encounter a memory error, try reducing the batch size per GPU to a smaller number.
1531
  ```""")
1532
  with gr.Row():
1533
  bt_calculate = bt_create = gr.Button("Auto Settings")
1534
- lb_samples = gr.Label(label="samples")
1535
  batch_size_type = gr.Radio(label="Batch Size Type", choices=["frame", "sample"], value="frame")
1536
 
1537
  with gr.Row():
1538
- ch_finetune = bt_create = gr.Checkbox(label="finetune", value=True)
1539
  tokenizer_file = gr.Textbox(label="Tokenizer File", value="")
1540
- file_checkpoint_train = gr.Textbox(label="Path to the preetrain checkpoint ", value="")
1541
 
1542
  with gr.Row():
1543
  exp_name = gr.Radio(label="Model", choices=["F5TTS_Base", "E2TTS_Base"], value="F5TTS_Base")
@@ -1603,8 +1603,8 @@ If you encounter a memory error, try reducing the batch size per GPU to a smalle
1603
  mixed_precision.value = mixed_precisionv
1604
  cd_logger.value = cd_loggerv
1605
 
1606
- ch_stream = gr.Checkbox(label="stream output experiment.", value=True)
1607
- txt_info_train = gr.Text(label="info", value="")
1608
 
1609
  list_audios, select_audio = get_audio_project(projects_selelect, False)
1610
 
@@ -1619,18 +1619,18 @@ If you encounter a memory error, try reducing the batch size per GPU to a smalle
1619
  ch_list_audio = gr.Dropdown(
1620
  choices=list_audios,
1621
  value=select_audio,
1622
- label="audios",
1623
  allow_custom_value=True,
1624
  scale=6,
1625
  interactive=True,
1626
  )
1627
- bt_stream_audio = gr.Button("refresh", scale=1)
1628
  bt_stream_audio.click(fn=get_audio_project, inputs=[cm_project], outputs=[ch_list_audio])
1629
  cm_project.change(fn=get_audio_project, inputs=[cm_project], outputs=[ch_list_audio])
1630
 
1631
  with gr.Row():
1632
- audio_ref_stream = gr.Audio(label="original", type="filepath", value=select_audio_ref)
1633
- audio_gen_stream = gr.Audio(label="generate", type="filepath", value=select_audio_gen)
1634
 
1635
  ch_list_audio.change(
1636
  fn=get_audio_select,
@@ -1730,36 +1730,36 @@ If you encounter a memory error, try reducing the batch size per GPU to a smalle
1730
  outputs=outputs,
1731
  )
1732
 
1733
- with gr.TabItem("test model"):
1734
  gr.Markdown("""```plaintext
1735
- SOS : check the use_ema setting (True or False) for your model to see what works best for you.
1736
  ```""")
1737
  exp_name = gr.Radio(label="Model", choices=["F5-TTS", "E2-TTS"], value="F5-TTS")
1738
  list_checkpoints, checkpoint_select = get_checkpoints_project(projects_selelect, False)
1739
 
1740
- nfe_step = gr.Number(label="n_step", value=32)
1741
- ch_use_ema = gr.Checkbox(label="use ema", value=True)
1742
  with gr.Row():
1743
  cm_checkpoint = gr.Dropdown(
1744
- choices=list_checkpoints, value=checkpoint_select, label="checkpoints", allow_custom_value=True
1745
  )
1746
- bt_checkpoint_refresh = gr.Button("refresh")
1747
 
1748
- random_sample_infer = gr.Button("random sample")
1749
 
1750
- ref_text = gr.Textbox(label="ref text")
1751
- ref_audio = gr.Audio(label="audio ref", type="filepath")
1752
- gen_text = gr.Textbox(label="gen text")
1753
 
1754
  random_sample_infer.click(
1755
  fn=get_random_sample_infer, inputs=[cm_project], outputs=[ref_text, gen_text, ref_audio]
1756
  )
1757
 
1758
  with gr.Row():
1759
- txt_info_gpu = gr.Textbox("", label="device")
1760
- check_button_infer = gr.Button("infer")
1761
 
1762
- gen_audio = gr.Audio(label="audio gen", type="filepath")
1763
 
1764
  check_button_infer.click(
1765
  fn=infer,
@@ -1770,22 +1770,22 @@ SOS : check the use_ema setting (True or False) for your model to see what works
1770
  bt_checkpoint_refresh.click(fn=get_checkpoints_project, inputs=[cm_project], outputs=[cm_checkpoint])
1771
  cm_project.change(fn=get_checkpoints_project, inputs=[cm_project], outputs=[cm_checkpoint])
1772
 
1773
- with gr.TabItem("reduse checkpoint"):
1774
  gr.Markdown("""```plaintext
1775
- Reduce the model size from 5GB to 1.3GB. The new checkpoint can be used for inference or fine-tuning afterward, but it cannot be used to continue training..
1776
  ```""")
1777
- txt_path_checkpoint = gr.Text(label="path checkpoint :")
1778
- txt_path_checkpoint_small = gr.Text(label="path output :")
1779
- ch_safetensors = gr.Checkbox(label="safetensors", value="")
1780
- txt_info_reduse = gr.Text(label="info", value="")
1781
- reduse_button = gr.Button("reduse")
1782
  reduse_button.click(
1783
  fn=extract_and_save_ema_model,
1784
  inputs=[txt_path_checkpoint, txt_path_checkpoint_small, ch_safetensors],
1785
  outputs=[txt_info_reduse],
1786
  )
1787
 
1788
- with gr.TabItem("system info"):
1789
  output_box = gr.Textbox(label="GPU and CPU Information", lines=20)
1790
 
1791
  def update_stats():
 
1372
  with gr.Blocks() as app:
1373
  gr.Markdown(
1374
  """
1375
+ # E2/F5 TTS Automatic Finetune
1376
 
1377
  This is a local web UI for F5 TTS with advanced batch processing support. This app supports the following TTS models:
1378
 
 
1381
 
1382
  The checkpoints support English and Chinese.
1383
 
1384
+ For tutorial and updates check here (https://github.com/SWivid/F5-TTS/discussions/143)
1385
  """
1386
  )
1387
 
1388
  with gr.Row():
1389
  projects, projects_selelect = get_list_projects()
1390
+ tokenizer_type = gr.Radio(label="Tokenizer Type", choices=["pinyin", "char", "custom"], value="pinyin")
1391
+ project_name = gr.Textbox(label="Project Name", value="my_speak")
1392
+ bt_create = gr.Button("Create a New Project")
1393
 
1394
  with gr.Row():
1395
  cm_project = gr.Dropdown(
1396
  choices=projects, value=projects_selelect, label="Project", allow_custom_value=True, scale=6
1397
  )
1398
+ ch_refresh_project = gr.Button("Refresh", scale=1)
1399
 
1400
  bt_create.click(fn=create_data_project, inputs=[project_name, tokenizer_type], outputs=[cm_project])
1401
 
1402
  with gr.Tabs():
1403
+ with gr.TabItem("Transcribe Data"):
1404
  gr.Markdown("""```plaintext
1405
  Skip this step if you have your dataset, metadata.csv, and a folder wavs with all the audio files.
1406
  ```""")
1407
 
1408
+ ch_manual = gr.Checkbox(label="Audio from Path", value=False)
1409
 
1410
  mark_info_transcribe = gr.Markdown(
1411
  """```plaintext
1412
+ Place your 'wavs' folder and 'metadata.csv' file in the '{your_project_name}' directory.
1413
 
1414
  my_speak/
1415
  │
 
1421
  visible=False,
1422
  )
1423
 
1424
+ audio_speaker = gr.File(label="Voice", type="filepath", file_count="multiple")
1425
+ txt_lang = gr.Text(label="Language", value="English")
1426
+ bt_transcribe = bt_create = gr.Button("Transcribe")
1427
+ txt_info_transcribe = gr.Text(label="Info", value="")
1428
  bt_transcribe.click(
1429
  fn=transcribe_all,
1430
  inputs=[cm_project, audio_speaker, txt_lang, ch_manual],
 
1432
  )
1433
  ch_manual.change(fn=check_user, inputs=[ch_manual], outputs=[audio_speaker, mark_info_transcribe])
1434
 
1435
+ random_sample_transcribe = gr.Button("Random Sample")
1436
 
1437
  with gr.Row():
1438
  random_text_transcribe = gr.Text(label="Text")
 
1444
  outputs=[random_text_transcribe, random_audio_transcribe],
1445
  )
1446
 
1447
+ with gr.TabItem("Vocab Check"):
1448
  gr.Markdown("""```plaintext
1449
+ Check the vocabulary for fine-tuning Emilia_ZH_EN to ensure all symbols are included. For fine-tuning a new language.
1450
  ```""")
1451
 
1452
+ check_button = gr.Button("Check Vocab")
1453
+ txt_info_check = gr.Text(label="Info", value="")
1454
 
1455
  gr.Markdown("""```plaintext
1456
+ Using the extended model, you can finetune to a new language that is missing symbols in the vocab. This creates a new model with a new vocabulary size and saves it in your ckpts/project folder.
1457
  ```""")
1458
 
1459
  exp_name_extend = gr.Radio(label="Model", choices=["F5-TTS", "E2-TTS"], value="F5-TTS")
 
1465
  placeholder="To add new symbols, make sure to use ',' for each symbol",
1466
  scale=6,
1467
  )
1468
+ txt_count_symbol = gr.Textbox(label="New Vocab Size", value="", scale=1)
1469
 
1470
+ extend_button = gr.Button("Extend")
1471
+ txt_info_extend = gr.Text(label="Info", value="")
1472
 
1473
  txt_extend.change(vocab_count, inputs=[txt_extend], outputs=[txt_count_symbol])
1474
  check_button.click(fn=vocab_check, inputs=[cm_project], outputs=[txt_info_check, txt_extend])
 
1476
  fn=vocab_extend, inputs=[cm_project, txt_extend, exp_name_extend], outputs=[txt_info_extend]
1477
  )
1478
 
1479
+ with gr.TabItem("Prepare Data"):
1480
  gr.Markdown("""```plaintext
1481
+ Skip this step if you have your dataset, raw.arrow, duration.json, and vocab.txt
1482
  ```""")
1483
 
1484
  gr.Markdown(
1485
  """```plaintext
1486
+ Place all your "wavs" folder and your "metadata.csv" file in your project name directory.
1487
 
1488
+ Supported audio formats: "wav", "mp3", "aac", "flac", "m4a", "alac", "ogg", "aiff", "wma", "amr"
1489
 
1490
+ Example wav format:
1491
  my_speak/
1492
  │
1493
  ├── wavs/
 
1497
  │
1498
  └── metadata.csv
1499
 
1500
+ File format metadata.csv:
1501
 
1502
  audio1|text1 or audio1.wav|text1 or your_path/audio1.wav|text1
1503
+ audio2|text1 or audio2.wav|text1 or your_path/audio2.wav|text1
1504
  ...
1505
 
1506
  ```"""
1507
  )
1508
+ ch_tokenizern = gr.Checkbox(label="Create Vocabulary", value=False, visible=False)
1509
+ bt_prepare = bt_create = gr.Button("Prepare")
1510
+ txt_info_prepare = gr.Text(label="Info", value="")
1511
+ txt_vocab_prepare = gr.Text(label="Vocab", value="")
1512
 
1513
  bt_prepare.click(
1514
  fn=create_metadata, inputs=[cm_project, ch_tokenizern], outputs=[txt_info_prepare, txt_vocab_prepare]
1515
  )
1516
 
1517
+ random_sample_prepare = gr.Button("Random Sample")
1518
 
1519
  with gr.Row():
1520
  random_text_prepare = gr.Text(label="Tokenizer")
 
1524
  fn=get_random_sample_prepare, inputs=[cm_project], outputs=[random_text_prepare, random_audio_prepare]
1525
  )
1526
 
1527
+ with gr.TabItem("Train Data"):
1528
  gr.Markdown("""```plaintext
1529
+ The auto-setting is still experimental. Please make sure that the epochs, save per updates, and last per steps are set correctly, or change them manually as needed.
1530
  If you encounter a memory error, try reducing the batch size per GPU to a smaller number.
1531
  ```""")
1532
  with gr.Row():
1533
  bt_calculate = bt_create = gr.Button("Auto Settings")
1534
+ lb_samples = gr.Label(label="Samples")
1535
  batch_size_type = gr.Radio(label="Batch Size Type", choices=["frame", "sample"], value="frame")
1536
 
1537
  with gr.Row():
1538
+ ch_finetune = bt_create = gr.Checkbox(label="Finetune", value=True)
1539
  tokenizer_file = gr.Textbox(label="Tokenizer File", value="")
1540
+ file_checkpoint_train = gr.Textbox(label="Path to the Pretrained Checkpoint", value="")
1541
 
1542
  with gr.Row():
1543
  exp_name = gr.Radio(label="Model", choices=["F5TTS_Base", "E2TTS_Base"], value="F5TTS_Base")
 
1603
  mixed_precision.value = mixed_precisionv
1604
  cd_logger.value = cd_loggerv
1605
 
1606
+ ch_stream = gr.Checkbox(label="Stream Output Experiment", value=True)
1607
+ txt_info_train = gr.Text(label="Info", value="")
1608
 
1609
  list_audios, select_audio = get_audio_project(projects_selelect, False)
1610
 
 
1619
  ch_list_audio = gr.Dropdown(
1620
  choices=list_audios,
1621
  value=select_audio,
1622
+ label="Audios",
1623
  allow_custom_value=True,
1624
  scale=6,
1625
  interactive=True,
1626
  )
1627
+ bt_stream_audio = gr.Button("Refresh", scale=1)
1628
  bt_stream_audio.click(fn=get_audio_project, inputs=[cm_project], outputs=[ch_list_audio])
1629
  cm_project.change(fn=get_audio_project, inputs=[cm_project], outputs=[ch_list_audio])
1630
 
1631
  with gr.Row():
1632
+ audio_ref_stream = gr.Audio(label="Original", type="filepath", value=select_audio_ref)
1633
+ audio_gen_stream = gr.Audio(label="Generate", type="filepath", value=select_audio_gen)
1634
 
1635
  ch_list_audio.change(
1636
  fn=get_audio_select,
 
1730
  outputs=outputs,
1731
  )
1732
 
1733
+ with gr.TabItem("Test Model"):
1734
  gr.Markdown("""```plaintext
1735
+ SOS: Check the use_ema setting (True or False) for your model to see what works best for you.
1736
  ```""")
1737
  exp_name = gr.Radio(label="Model", choices=["F5-TTS", "E2-TTS"], value="F5-TTS")
1738
  list_checkpoints, checkpoint_select = get_checkpoints_project(projects_selelect, False)
1739
 
1740
+ nfe_step = gr.Number(label="NFE Step", value=32)
1741
+ ch_use_ema = gr.Checkbox(label="Use EMA", value=True)
1742
  with gr.Row():
1743
  cm_checkpoint = gr.Dropdown(
1744
+ choices=list_checkpoints, value=checkpoint_select, label="Checkpoints", allow_custom_value=True
1745
  )
1746
+ bt_checkpoint_refresh = gr.Button("Refresh")
1747
 
1748
+ random_sample_infer = gr.Button("Random Sample")
1749
 
1750
+ ref_text = gr.Textbox(label="Ref Text")
1751
+ ref_audio = gr.Audio(label="Audio Ref", type="filepath")
1752
+ gen_text = gr.Textbox(label="Gen Text")
1753
 
1754
  random_sample_infer.click(
1755
  fn=get_random_sample_infer, inputs=[cm_project], outputs=[ref_text, gen_text, ref_audio]
1756
  )
1757
 
1758
  with gr.Row():
1759
+ txt_info_gpu = gr.Textbox("", label="Device")
1760
+ check_button_infer = gr.Button("Infer")
1761
 
1762
+ gen_audio = gr.Audio(label="Audio Gen", type="filepath")
1763
 
1764
  check_button_infer.click(
1765
  fn=infer,
 
1770
  bt_checkpoint_refresh.click(fn=get_checkpoints_project, inputs=[cm_project], outputs=[cm_checkpoint])
1771
  cm_project.change(fn=get_checkpoints_project, inputs=[cm_project], outputs=[cm_checkpoint])
1772
 
1773
+ with gr.TabItem("Reduce Checkpoint"):
1774
  gr.Markdown("""```plaintext
1775
+ Reduce the model size from 5GB to 1.3GB. The new checkpoint can be used for inference or fine-tuning afterward, but it cannot be used to continue training.
1776
  ```""")
1777
+ txt_path_checkpoint = gr.Text(label="Path to Checkpoint:")
1778
+ txt_path_checkpoint_small = gr.Text(label="Path to Output:")
1779
+ ch_safetensors = gr.Checkbox(label="Safetensors", value="")
1780
+ txt_info_reduse = gr.Text(label="Info", value="")
1781
+ reduse_button = gr.Button("Reduce")
1782
  reduse_button.click(
1783
  fn=extract_and_save_ema_model,
1784
  inputs=[txt_path_checkpoint, txt_path_checkpoint_small, ch_safetensors],
1785
  outputs=[txt_info_reduse],
1786
  )
1787
 
1788
+ with gr.TabItem("System Info"):
1789
  output_box = gr.Textbox(label="GPU and CPU Information", lines=20)
1790
 
1791
  def update_stats():