jacktol committed on
Commit 73ee96d · 1 Parent(s): 07ced72

edited readme

Files changed (1)
  1. README.md +54 -2
README.md CHANGED
@@ -1,9 +1,61 @@
  ---
  title: ATC Transcription Assistant
  emoji: ✈️
- colorFrom: yellow
- colorTo: blue
+ colorFrom: purple
+ colorTo: red
  sdk: docker
  pinned: false
  ---
 
+ # ATC Transcription Assistant
+
+ ## Overview
+
+ Welcome to the **ATC Transcription Assistant**, a tool for transcribing **Air Traffic Control (ATC)** audio. The app uses OpenAI’s **Whisper medium.en** model, fine-tuned specifically for ATC communications. Fine-tuning significantly improves transcription accuracy on aviation speech, which makes the tool useful for researchers, enthusiasts, and professionals who analyze ATC communications.
+
+ This project is part of a broader research initiative aimed at improving Automatic Speech Recognition (ASR) accuracy in high-stakes aviation environments.
+
+ ## Features
+
+ - **Transcription Model**: The app uses a fine-tuned version of the **Whisper medium.en** model.
+ - **Audio Formats**: Supports **MP3** and **WAV** files containing ATC audio.
+ - **Transcription Output**: Converts uploaded audio into text and displays it in a readable format.
+ - **Enhanced Accuracy**: The fine-tuned model achieves a **Word Error Rate (WER)** of **15.08%**, compared with **94.59% WER** for the non-fine-tuned model.
+
+ ## Performance
+
+ - **Fine-tuned Whisper medium.en WER**: 15.08%
+ - **Non-fine-tuned Whisper medium.en WER**: 94.59%
+ - **Relative Improvement**: 84.06%
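
The relative improvement figure follows from the two WER values listed above: it is the WER reduction expressed as a share of the baseline WER. A minimal sketch of that arithmetic (the function name is illustrative and not taken from the project):

```python
def relative_wer_improvement(baseline_wer: float, fine_tuned_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - fine_tuned_wer) / baseline_wer * 100.0


# WER values reported in the README, in percent.
improvement = relative_wer_improvement(94.59, 15.08)
print(f"{improvement:.2f}%")  # 84.06%
```

Note that this is a relative reduction; the absolute reduction is 94.59 - 15.08 = 79.51 percentage points.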
+
+ > While the fine-tuned model provides substantial improvements, transcription accuracy is not guaranteed.
+
+ For more details on the fine-tuning process and model performance, see the [blog post](https://jacktol.net/posts/fine-tuning_whisper_on_atc_data) or the [project repository](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data).
+
+ ## How It Works
+
+ 1. **Upload ATC Audio**: Upload an audio file containing ATC communications in **MP3** or **WAV** format.
+ 2. **View Transcription**: The app will transcribe the audio and display the text on the screen.
+ 3. **Transcribe More Audio**: To transcribe another file, click **New Chat** in the top-right corner of the app.
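
The upload step above can be sketched as a format check followed by a call into the model. This is only an illustration of the flow, not the app's actual code: `is_supported_audio`, `handle_upload`, and the `transcribe` callable are hypothetical names, and the real app may validate uploads differently.

```python
from pathlib import Path
from typing import Callable

# The two formats the README says are supported.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def is_supported_audio(filename: str) -> bool:
    """Check the file extension, ignoring case."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def handle_upload(filename: str, transcribe: Callable[[str], str]) -> str:
    """Reject unsupported files, otherwise hand the path to the transcriber.

    `transcribe` is a placeholder for whatever function wraps the
    fine-tuned Whisper model (an assumption for this sketch).
    """
    if not is_supported_audio(filename):
        raise ValueError(f"Unsupported audio format: {filename}")
    return transcribe(filename)


# Stand-in transcriber so the sketch runs without a model.
text = handle_upload("tower_recording.WAV", lambda path: f"<transcript of {path}>")
print(text)
```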
+
+ ## Fine-Tuning Process
+
+ The Whisper model was fine-tuned on a custom ATC dataset built from publicly available resources, including:
+
+ - The **ATCO2 test subset** (871 audio-transcription pairs).
+ - The **UWB-ATCC corpus** (11.3k rows in the training set and 2.82k rows in the test set).
+
+ After data preprocessing, dynamic data augmentation was applied during fine-tuning to simulate challenging conditions. The model was trained for 10 epochs on two A100 GPUs and achieved an average **WER of 15.08%**.
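
"Dynamic" augmentation generally means the perturbation is re-sampled each time a training example is loaded, so every epoch sees a different variant of the same clip. The README does not say which augmentations were used; the sketch below assumes additive Gaussian noise at a random signal-to-noise ratio purely as an illustration, and the function names and SNR range are hypothetical.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to roughly the requested SNR in dB."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def augment(samples, rng, snr_range_db=(5.0, 30.0)):
    """Draw a fresh SNR on every call, so repeated loads of a clip differ."""
    snr_db = rng.uniform(*snr_range_db)
    return add_noise(samples, snr_db, rng)


rng = random.Random(0)  # seeded only so the sketch is reproducible
# 0.1 s of a 440 Hz tone at 16 kHz stands in for a real ATC clip.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
epoch_1 = augment(clean, rng)
epoch_2 = augment(clean, rng)
print(len(epoch_1) == len(clean), epoch_1 != epoch_2)
```

Applying the perturbation at load time rather than writing augmented files to disk keeps the dataset size fixed while still exposing the model to varied conditions.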
+
+ ## Limitations
+
+ - **Word Error Rate (WER)**: WER is a standard evaluation metric, but it ignores meaning and how close an incorrect word is to the correct one, which can make it an overly rigid measure.
+ - **Transcription Accuracy**: In real-world applications, minor errors may occur, but these often do not significantly impact communication.
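
To make the WER limitation concrete, here is a minimal sketch of the standard word-level definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with edit distance. The project's own evaluation code and text normalization are not shown in this README, so treat this as a generic illustration; the example phrases are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# A harmless slip and a changed flight level both count as one substitution.
harmless = word_error_rate("cleared to land runway two seven",
                           "clear to land runway two seven")
critical = word_error_rate("descend flight level one zero zero",
                           "descend flight level one one zero")
print(f"{harmless:.3f} {critical:.3f}")  # 0.167 0.167
```

Both hypotheses score the same WER even though only the second changes the instruction, which is the rigidity described above.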
+
+ ## Get in Touch
+
+ If you have any questions or suggestions, feel free to contact me at [[email protected]](mailto:[email protected]).
+
+ ## License
+
+ This project is licensed under the [MIT License](LICENSE).