edited readme
README.md
CHANGED
@@ -1,9 +1,61 @@
|
|
1 |
---
|
2 |
title: ATC Transcription Assistant
|
3 |
emoji: ✈️
|
4 |
-
colorFrom:
|
5 |
-
colorTo:
|
6 |
sdk: docker
|
7 |
pinned: false
|
8 |
---
|
9 |
|
1 |
---
|
2 |
title: ATC Transcription Assistant
|
3 |
emoji: ✈️
|
4 |
+
colorFrom: purple
|
5 |
+
colorTo: red
|
6 |
sdk: docker
|
7 |
pinned: false
|
8 |
---
|
9 |
|
10 |
+
# ATC Transcription Assistant
|
11 |
+
|
12 |
+
## Overview
|
13 |
+
|
14 |
+
The **ATC Transcription Assistant** transcribes **Air Traffic Control (ATC)** audio. It uses OpenAI’s **Whisper medium.en** model, fine-tuned specifically for ATC communications. Fine-tuning lowers the Word Error Rate (WER) from 94.59% to 15.08%, which makes the tool useful for researchers, enthusiasts, and professionals who analyze ATC communications.
|
15 |
+
|
16 |
+
This project is part of a broader research initiative to improve Automatic Speech Recognition (ASR) accuracy in high-stakes aviation environments.
|
17 |
+
|
18 |
+
## Features
|
19 |
+
|
20 |
+
- **Transcription Model**: The app uses a fine-tuned version of the **Whisper medium.en** model.
|
21 |
+
- **Audio Formats**: Supports **MP3** and **WAV** files containing ATC audio.
|
22 |
+
- **Transcription Output**: Converts uploaded audio into text and displays it in an easily readable format.
|
23 |
+
- **Enhanced Accuracy**: The fine-tuned model offers a **Word Error Rate (WER)** of **15.08%**, a significant improvement over the **94.59% WER** of the non-fine-tuned model.
|
24 |
+
|
25 |
+
## Performance
|
26 |
+
|
27 |
+
- **Fine-tuned Whisper medium.en WER**: 15.08%
|
28 |
+
- **Non-fine-tuned Whisper medium.en WER**: 94.59%
|
29 |
+
- **Relative Improvement**: 84.06%
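
The relative improvement figure follows from the two WER values above. As a minimal sketch (the function name `relative_wer_improvement` is made up for illustration and is not part of the project), the arithmetic can be reproduced like this:

```python
def relative_wer_improvement(baseline_wer: float, fine_tuned_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - fine_tuned_wer) / baseline_wer * 100


# WER values taken from the list above, both in percent.
improvement = relative_wer_improvement(94.59, 15.08)
print(f"{improvement:.2f}%")  # prints 84.06%
```

The result is relative to the baseline: the absolute drop is 79.51 percentage points, which is 84.06% of the original 94.59%.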
|
30 |
+
|
31 |
+
> While the fine-tuned model provides substantial improvements, please note that transcription accuracy is not guaranteed.
|
32 |
+
|
33 |
+
For more details on the fine-tuning process and model performance, see the [blog post](https://jacktol.net/posts/fine-tuning_whisper_on_atc_data), or check out the [project repository](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data).
|
34 |
+
|
35 |
+
## How It Works
|
36 |
+
|
37 |
+
1. **Upload ATC Audio**: Upload an audio file containing ATC communications in **MP3** or **WAV** format.
|
38 |
+
2. **View Transcription**: The app will transcribe the audio and display the text on the screen.
|
39 |
+
3. **Transcribe More Audio**: To transcribe another file, click **New Chat** in the top-right corner of the app.
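
Before uploading, it can help to confirm that a file is in one of the two supported formats. The snippet below is a hypothetical helper, not code from the app, and it only checks the file extension, not the audio data itself:

```python
from pathlib import Path

# Formats named in this README; the helper itself is an illustrative assumption.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is .mp3 or .wav (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("tower_recording.MP3"))
print(is_supported_audio("notes.txt"))
```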
|
40 |
+
|
41 |
+
## Fine-Tuning Process
|
42 |
+
|
43 |
+
The Whisper model was fine-tuned on a custom ATC dataset created from publicly available resources, such as:
|
44 |
+
|
45 |
+
- The **ATCO2 test subset** (871 audio-transcription pairs).
|
46 |
+
- The **UWB-ATCC corpus** (11.3k rows in the training set and 2.82k rows in the test set).
|
47 |
+
|
48 |
+
After data preprocessing, dynamic data augmentation was applied to simulate challenging conditions during fine-tuning. The model was trained for 10 epochs on two A100 GPUs and achieved an average **WER of 15.08%**.
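
"Dynamic" augmentation generally means that each training sample is perturbed anew every time it is drawn, so the model rarely sees the exact same audio twice. This README does not state which augmentations were used. The random gain and additive Gaussian noise below are assumptions chosen only to illustrate the idea, and the function `augment` is hypothetical:

```python
import random


def augment(samples, noise_level=0.005, gain_range=(0.8, 1.2), rng=None):
    """Return a randomly perturbed copy of a clip given as floats in [-1.0, 1.0].

    Assumed augmentations (not confirmed by the project): one random gain
    per clip plus independent Gaussian noise per sample.
    """
    if rng is None:
        rng = random.Random()
    gain = rng.uniform(*gain_range)
    augmented = []
    for sample in samples:
        noisy = sample * gain + rng.gauss(0.0, noise_level)
        # Clip to the valid amplitude range.
        augmented.append(max(-1.0, min(1.0, noisy)))
    return augmented


clip = [0.0, 0.25, -0.5, 0.75]
first_pass = augment(clip, rng=random.Random(0))
second_pass = augment(clip, rng=random.Random(1))
print(first_pass != second_pass)  # different random draws give different audio
```

Applying the perturbation at load time rather than precomputing it keeps the dataset on disk unchanged while varying what the model sees in each epoch.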
|
49 |
+
|
50 |
+
## Limitations
|
51 |
+
|
52 |
+
- **Word Error Rate (WER)**: WER is a standard evaluation metric, but it does not account for meaning or for how close an incorrect word is to the correct one, which makes it a rigid measure of transcription quality.
|
53 |
+
- **Transcription Accuracy**: In real-world use, minor transcription errors may occur, but they often do not significantly affect how the communication is understood.
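
To make the WER limitation concrete, here is a minimal sketch of the standard word-level definition: edit distance between reference and hypothesis, divided by the number of reference words. It is not the project's evaluation code, and the example phrases are invented for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference words
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


reference = "descend flight level three two zero"
near_miss = "descend flight level three to zero"    # homophone, intent still clear
wrong_level = "descend flight level three six zero"  # different cleared level

print(word_error_rate(reference, near_miss) == word_error_rate(reference, wrong_level))  # prints True
```

Both hypotheses contain one substitution in six words, so they receive the same score even though only the second changes the instruction.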
|
54 |
+
|
55 |
+
## Get in Touch
|
56 |
+
|
57 |
+
If you have any questions or suggestions, feel free to contact me at [[email protected]](mailto:[email protected]).
|
58 |
+
|
59 |
+
## License
|
60 |
+
|
61 |
+
This project is licensed under the [MIT License](LICENSE).
|