ameerazam08 committed on
Commit 6931c7b · verified · 1 Parent(s): aa7a0c6

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50)
  1. .gitattributes +6 -0
  2. .gitignore +2 -0
  3. MultiTalk_dataset/README.md +96 -0
  4. MultiTalk_dataset/annotations/arabic.json +0 -0
  5. MultiTalk_dataset/annotations/catalan.json +0 -0
  6. MultiTalk_dataset/annotations/croatian.json +0 -0
  7. MultiTalk_dataset/annotations/czech.json +0 -0
  8. MultiTalk_dataset/annotations/dutch.json +0 -0
  9. MultiTalk_dataset/annotations/english.json +0 -0
  10. MultiTalk_dataset/annotations/french.json +0 -0
  11. MultiTalk_dataset/annotations/german.json +0 -0
  12. MultiTalk_dataset/annotations/greek.json +0 -0
  13. MultiTalk_dataset/annotations/hindi.json +0 -0
  14. MultiTalk_dataset/annotations/italian.json +0 -0
  15. MultiTalk_dataset/annotations/japanese.json +0 -0
  16. MultiTalk_dataset/annotations/mandarin.json +0 -0
  17. MultiTalk_dataset/annotations/polish.json +0 -0
  18. MultiTalk_dataset/annotations/portuguese.json +0 -0
  19. MultiTalk_dataset/annotations/russian.json +0 -0
  20. MultiTalk_dataset/annotations/spanish.json +0 -0
  21. MultiTalk_dataset/annotations/thai.json +0 -0
  22. MultiTalk_dataset/annotations/turkish.json +0 -0
  23. MultiTalk_dataset/annotations/ukrainian.json +0 -0
  24. MultiTalk_dataset/dataset.sh +4 -0
  25. MultiTalk_dataset/download_and_process.py +147 -0
  26. README.md +140 -0
  27. RUN/multi/MultiTalk_s2/test-20240707_000247.log +15 -0
  28. RUN/multi/MultiTalk_s2/test-20240707_000302.log +110 -0
  29. RUN/multi/MultiTalk_s2/test-20240707_000539.log +107 -0
  30. RUN/multi/MultiTalk_s2/test-20240707_000731.log +93 -0
  31. RUN/vocaset/MultiTalk_s2/test-20240707_000820.log +90 -0
  32. assets/statistic.png +0 -0
  33. assets/teaser.png +3 -0
  34. base/__init__.py +1 -0
  35. base/__pycache__/__init__.cpython-38.pyc +0 -0
  36. base/__pycache__/baseTrainer.cpython-38.pyc +0 -0
  37. base/__pycache__/base_model.cpython-38.pyc +0 -0
  38. base/__pycache__/config.cpython-38.pyc +0 -0
  39. base/__pycache__/utilities.cpython-38.pyc +0 -0
  40. base/baseTrainer.py +66 -0
  41. base/base_model.py +30 -0
  42. base/config.py +165 -0
  43. base/utilities.py +66 -0
  44. checkpoints/FLAME_sample.ply +0 -0
  45. checkpoints/stage1.pth.tar +3 -0
  46. checkpoints/stage2.pth.tar +3 -0
  47. checkpoints/templates.pkl +3 -0
  48. config/multi/demo.yaml +47 -0
  49. config/multi/stage1.yaml +79 -0
  50. config/multi/stage2.yaml +97 -0
.gitattributes CHANGED
@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
+ demo/input/English_WTT5UTZQ9K8_8.wav filter=lfs diff=lfs merge=lfs -text
+ demo/input/French_JATq1mUhfiA_8.wav filter=lfs diff=lfs merge=lfs -text
+ demo/input/Italian_72pdx3tZwto_4.wav filter=lfs diff=lfs merge=lfs -text
+ sample_dataset/wav/Greek_0_38_FbWPEz8NFS8.wav filter=lfs diff=lfs merge=lfs -text
+ sample_dataset/wav/Spanish_xyVZDmzt6HY_6.wav filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+ /MultiTalk_dataset/raw_video
+ /MultiTalk_dataset/multitalk_dataset
MultiTalk_dataset/README.md ADDED
@@ -0,0 +1,96 @@
+ ## Overview
+ The MultiTalk dataset is a new multilingual 2D video dataset featuring over 420 hours of talking videos across 20 languages.
+ It contains 293,812 clips with a resolution of 512x512, a frame rate of 25 fps, and an average duration of 5.19 seconds per clip.
+ The dataset shows a balanced distribution across languages, with each language representing between 2.0% and 9.7% of the total.
+
+ <img alt="statistic" src="../assets/statistic.png" width=560>
+
+
+ <details><summary><b>Detailed statistics</b></summary><p>
+
+ | Language | Total Duration (h) | #Clips | Avg. Duration (s) | Annotation |
+ |:---:|:---:|:---:|:---:|:---:|
+ | Arabic | 10.32 | 9048 | 4.11 | [arabic.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/arabic.json) |
+ | Catalan | 41.0 | 29232 | 5.05 | [catalan.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/catalan.json) |
+ | Croatian | 41.0 | 25465 | 5.80 | [croatian.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/croatian.json) |
+ | Czech | 18.9 | 11228 | 6.06 | [czech.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/czech.json) |
+ | Dutch | 17.05 | 14187 | 4.33 | [dutch.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/dutch.json) |
+ | English | 15.49 | 11082 | 5.03 | [english.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/english.json) |
+ | French | 13.17 | 11576 | 4.10 | [french.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/french.json) |
+ | German | 16.25 | 10856 | 5.39 | [german.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/german.json) |
+ | Greek | 17.53 | 12698 | 4.97 | [greek.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/greek.json) |
+ | Hindi | 24.41 | 16120 | 5.45 | [hindi.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/hindi.json) |
+ | Italian | 13.59 | 9753 | 5.02 | [italian.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/italian.json) |
+ | Japanese | 8.36 | 5990 | 5.03 | [japanese.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/japanese.json) |
+ | Mandarin | 8.73 | 6096 | 5.15 | [mandarin.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/mandarin.json) |
+ | Polish | 21.58 | 15181 | 5.12 | [polish.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/polish.json) |
+ | Portuguese | 41.0 | 25321 | 5.83 | [portuguese.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/portuguese.json) |
+ | Russian | 26.32 | 17811 | 5.32 | [russian.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/russian.json) |
+ | Spanish | 23.65 | 18758 | 4.54 | [spanish.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/spanish.json) |
+ | Thai | 10.95 | 7595 | 5.19 | [thai.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/thai.json) |
+ | Turkish | 12.9 | 11165 | 4.16 | [turkish.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/turkish.json) |
+ | Ukrainian | 41.0 | 24650 | 5.99 | [ukrainian.json](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/annotations/ukrainian.json) |
+ </p></details>
+
+ ## Download
+
+ ### Usage
+ **Prepare the environment:**
+ ```bash
+ pip install pytube
+ pip install opencv-python
+ ```
+
+ **Run the script:**
+ ```bash
+ cd MultiTalk_dataset
+ ```
+ You can pass the languages you want to download as arguments to the script. If you want to download all 20 languages, run the following:
+ ```bash
+ sh dataset.sh arabic catalan croatian czech dutch english french german greek hindi italian japanese mandarin polish portuguese russian spanish thai turkish ukrainian
+ ```
+
+ After downloading, the folder structure will be as below. Each language folder contains the .mp4 videos.
+ You can change the ${ROOT} folder in the [code](https://github.com/postech-ami/MultiTalk/tree/main/MultiTalk_dataset/download_and_process.py).
+ ```
+ ${ROOT}
+ ├── multitalk_dataset              # MultiTalk dataset
+ │   ├── arabic
+ │   │   ├── O-VJXuHb390_0.mp4
+ │   │   ├── O-VJXuHb390_1.mp4
+ │   │   ├── ...
+ │   │   └── ...
+ │   ├── catalan
+ │   ├── ...
+ │   └── ...
+ └── raw_video                      # Original videos (you can remove this directory after downloading)
+     ├── arabic
+     ├── catalan
+     ├── ...
+     └── ...
+ ```
+
+ ### JSON File Structure
+ ```javascript
+ {
+   "QrDZjUeiUwc_0":                                                            // clip 1
+   {
+     "youtube_id": "QrDZjUeiUwc",                                              // YouTube ID
+     "duration": {"start_sec": 302.0, "end_sec": 305.56},                      // start and end times in the original video
+     "bbox": {"top": 0.0, "bottom": 0.8167, "left": 0.4484, "right": 0.9453},  // bounding box
+     "language": "czech",                                                      // language
+     "transcript": "já jsem v podstatě obnovil svůj list z minulého roku"      // transcript
+   },
+   "QrDZjUeiUwc_1":                                                            // clip 2
+   {
+     "youtube_id": "QrDZjUeiUwc",
+     "duration": {"start_sec": 0.12, "end_sec": 4.12},
+     "bbox": {"top": 0.0097, "bottom": 0.55, "left": 0.3406, "right": 0.6398},
+     "language": "czech",
+     "transcript": "ahoj tady anička a vítejte u dalšího easycheck videa"
+   },
+   "..."
+   "..."
+
+ }
+ ```
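A minimal sketch of reading one of these annotation files against the schema documented above; the choice of `czech.json` and the five-clip limit are arbitrary, and the path assumes the annotations folder from this repository:

```python
import json

# Load one annotation file (any of the files under MultiTalk_dataset/annotations/).
with open("MultiTalk_dataset/annotations/czech.json") as f:
    clips = json.load(f)

# Each key is a clip id; each value follows the schema shown above.
for clip_id, info in list(clips.items())[:5]:
    length = info["duration"]["end_sec"] - info["duration"]["start_sec"]
    print(f'{clip_id}: {info["language"]}, {length:.2f}s, "{info["transcript"]}"')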
MultiTalk_dataset/annotations/arabic.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/catalan.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/croatian.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/czech.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/dutch.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/english.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/french.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/german.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/greek.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/hindi.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/italian.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/japanese.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/mandarin.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/polish.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/portuguese.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/russian.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/spanish.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/thai.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/turkish.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/annotations/ukrainian.json ADDED
The diff for this file is too large to render. See raw diff
 
MultiTalk_dataset/dataset.sh ADDED
@@ -0,0 +1,4 @@
+ for language in "$@"
+ do
+     python download_and_process.py --language "$language"
+ done
MultiTalk_dataset/download_and_process.py ADDED
@@ -0,0 +1,147 @@
+ import os
+ import json
+ import cv2
+ import argparse
+ from pytube import Playlist, YouTube
+ from pytube.exceptions import VideoUnavailable
+ import os
+ import shutil
+ import subprocess
+
+ def downloadYouTube(yt, videourl, path):
+     video_stream = yt.streams.filter(progressive=False, file_extension='mp4').order_by('resolution').desc().first()
+     audio_stream = yt.streams.filter(only_audio=True).order_by('abr').desc().first()
+     if video_stream.fps >= 25:
+         video_id = videourl.split('=')[-1]
+         video_path = os.path.join(path, f"{video_id}_video.mp4")
+         audio_path = os.path.join(path, f"{video_id}_audio.mp4")
+         final_path = os.path.join(path, f"{video_id}.mp4")
+
+         print("Downloading video...")
+         video_stream.download(filename=video_path)
+         print("Downloading audio...")
+         audio_stream.download(filename=audio_path)
+
+         print("Merging video and audio...")
+         subprocess.run([
+             'ffmpeg', '-i', video_path, '-i', audio_path, '-r', '25',
+             '-c:v', 'copy', '-c:a', 'aac', '-strict', 'experimental',
+             final_path, '-y'
+         ])
+
+         os.remove(video_path)
+         os.remove(audio_path)
+         return True
+
+     else:
+         return False
+
+ def process_ffmpeg(raw_vid_path, save_folder, save_vid_name,
+                    bbox, time):
+     """
+     raw_vid_path:
+     save_folder:
+     save_vid_name:
+     bbox: format: top, bottom, left, right. the values are normalized to 0~1
+     time: begin_sec, end_sec
+     """
+
+     def secs_to_timestr(secs):
+         hrs = secs // (60 * 60)
+         min = (secs - hrs * 3600) // 60
+         sec = secs % 60
+         end = (secs - int(secs)) * 100
+         return "{:02d}:{:02d}:{:02d}.{:02d}".format(int(hrs), int(min),
+                                                     int(sec), int(end))
+
+     def expand(bbox, ratio):
+         top, bottom = max(bbox[0] - ratio, 0), min(bbox[1] + ratio, 1)
+         left, right = max(bbox[2] - ratio, 0), min(bbox[3] + ratio, 1)
+
+         return top, bottom, left, right
+
+     def to_square(bbox):
+         top, bottom, left, right = bbox
+         h = bottom - top
+         w = right - left
+         c = min(h, w) // 2
+         c_h = (top + bottom) / 2
+         c_w = (left + right) / 2
+
+         top, bottom = c_h - c, c_h + c
+         left, right = c_w - c, c_w + c
+         return top, bottom, left, right
+
+     def denorm(bbox, height, width):
+         top, bottom, left, right = \
+             round(bbox[0] * height), \
+             round(bbox[1] * height), \
+             round(bbox[2] * width), \
+             round(bbox[3] * width)
+
+         return top, bottom, left, right
+
+     out_path = os.path.join(save_folder, save_vid_name)
+
+     cap = cv2.VideoCapture(raw_vid_path)
+     width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+     height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+     top, bottom, left, right = to_square(
+         denorm(expand(bbox, 0.02), height, width))
+     start_sec, end_sec = time
+     cmd = f"ffmpeg -i {raw_vid_path} -r 25 -vf crop=w={right-left}:h={bottom-top}:x={left}:y={top},scale=512:512 -ss {start_sec} -to {end_sec} -loglevel error {out_path}"
+     os.system(cmd)
+
+
+ def load_data(file_path):
+     with open(file_path) as f:
+         data_dict = json.load(f)
+
+     for key, val in data_dict.items():
+         save_name = key + ".mp4"
+         ytb_id = val['youtube_id']
+         time = val['duration']['start_sec'], val['duration']['end_sec']
+
+         bbox = [val['bbox']['top'], val['bbox']['bottom'],
+                 val['bbox']['left'], val['bbox']['right']]
+         language = val['language']
+         yield ytb_id, save_name, time, bbox, language
+
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser()
+     parser.add_argument('--language', type=str, default="dutch", help='Language')
+     args = parser.parse_args()
+
+     # you can change the root folder
+     root = './'
+     processed_vid_root = os.path.join(root, 'multitalk_dataset')  # processed video path
+     raw_vid_root = os.path.join(root, 'raw_video')  # downloaded raw video path
+     os.makedirs(processed_vid_root, exist_ok=True)
+     os.makedirs(raw_vid_root, exist_ok=True)
+
+     json_path = os.path.join('./annotations', f'{args.language}.json')  # json file path
+
+     for vid_id, save_vid_name, time, bbox, language in load_data(json_path):
+         processed_vid_dir = os.path.join(processed_vid_root, language)
+         raw_vid_dir = os.path.join(raw_vid_root, language)
+         raw_vid_path = os.path.join(raw_vid_dir, vid_id + ".mp4")
+
+         os.makedirs(processed_vid_dir, exist_ok=True)
+         os.makedirs(raw_vid_dir, exist_ok=True)
+
+         url = 'https://www.youtube.com/watch?v=' + vid_id
+         success = True
+         if not os.path.isfile(raw_vid_path):
+             while True:
+                 try:
+                     yt = YouTube(url, use_oauth=True)
+                     success = downloadYouTube(yt, url, raw_vid_dir)
+                     break
+                 except:
+                     continue
+         if success:
+             process_ffmpeg(raw_vid_path, processed_vid_dir, save_vid_name, bbox, time)
+
+     # you can remove this directory after downloading
+     # shutil.rmtree(raw_vid_root)
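The crop arguments that `process_ffmpeg` passes to ffmpeg come from three small steps: expand the normalized bbox, convert it to pixels, then square it around the face center. A self-contained sketch of the same arithmetic, using the bbox from the JSON example above and a made-up 1280x720 source frame:

```python
def expand(bbox, ratio):
    # Pad the normalized bbox on all sides, clamped to [0, 1].
    top, bottom = max(bbox[0] - ratio, 0), min(bbox[1] + ratio, 1)
    left, right = max(bbox[2] - ratio, 0), min(bbox[3] + ratio, 1)
    return top, bottom, left, right

def denorm(bbox, height, width):
    # Normalized coordinates -> pixel coordinates.
    return (round(bbox[0] * height), round(bbox[1] * height),
            round(bbox[2] * width), round(bbox[3] * width))

def to_square(bbox):
    # Shrink the longer side so the crop becomes square around the bbox center.
    top, bottom, left, right = bbox
    c = min(bottom - top, right - left) // 2
    c_h, c_w = (top + bottom) / 2, (left + right) / 2
    return c_h - c, c_h + c, c_w - c, c_w + c

bbox = (0.0, 0.8167, 0.4484, 0.9453)  # top, bottom, left, right from the czech.json example
top, bottom, left, right = to_square(denorm(expand(bbox, 0.02), 720, 1280))
print(f"crop=w={right - left}:h={bottom - top}:x={left}:y={top}")  # value fed to the ffmpeg -vf crop filter
```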
README.md ADDED
@@ -0,0 +1,140 @@
+ # MultiTalk (INTERSPEECH 2024)
+
+ ### [Project Page](https://multi-talk.github.io/) | [Paper](https://arxiv.org/abs/2406.14272) | [Dataset](https://github.com/postech-ami/MultiTalk/blob/main/MultiTalk_dataset/README.md)
+ This repository contains a PyTorch implementation of the INTERSPEECH 2024 paper, [MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset](https://multi-talk.github.io/). MultiTalk generates 3D talking heads with enhanced multilingual performance.<br><br>
+
+ <img width="700" alt="teaser" src="./assets/teaser.png">
+
+ ## Getting started
+ This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.3 and PyTorch 1.12.0. Later versions should work, but have not been tested.
+
+ ### Installation
+ Create and activate a virtual environment to work in:
+ ```
+ conda create --name multitalk python=3.8
+ conda activate multitalk
+ ```
+
+ Install [PyTorch](https://pytorch.org/). For CUDA 11.3 and ffmpeg, this would look like:
+ ```
+ pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
+ conda install -c conda-forge ffmpeg
+ ```
+
+ Install the remaining requirements with pip:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ Compile and install the `psbody-mesh` package:
+ [MPI-IS/mesh](https://github.com/MPI-IS/mesh)
+ ```
+ BOOST_INCLUDE_DIRS=/usr/lib/x86_64-linux-gnu make all
+ ```
+
+
+ ### Download models
+ To run MultiTalk, you need the stage 1 and stage 2 models and the template file of the mean face in FLAME topology.
+ Download the [stage1 model](https://drive.google.com/file/d/1jI9feFcUuhXst1pM1_xOMvqE8cgUzP_t/view?usp=sharing) | [stage2 model](https://drive.google.com/file/d/1zqhzfF-vO_h_0EpkmBS7nO36TRNV4BCr/view?usp=sharing) | [template](https://drive.google.com/file/d/1WuZ87kljz6EK1bAzEKSyBsZ9IlUmiI-i/view?usp=sharing), and download FLAME_sample.ply from [voca](https://github.com/TimoBolkart/voca/tree/master/template).
+
+ After downloading the models, place them in `./checkpoints`.
+ ```
+ ./checkpoints/stage1.pth.tar
+ ./checkpoints/stage2.pth.tar
+ ./checkpoints/FLAME_sample.ply
+ ```
+
+ ## Demo
+ Run the command below to run the demo.
+ We provide sample audios in **./demo/input**.
+ ```
+ sh scripts/demo.sh multi
+ ```
+
+ To use the wav2vec model `facebook/wav2vec2-large-xlsr-53`, open `/path/to/conda_environment/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py` and change the code as below.
+ ```
+ L105: tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
+ to
+ L105: tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/wav2vec2-base-960h", **kwargs)
+ ```
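If editing the installed `transformers` sources is not desirable, an equivalent processor can be assembled in user code instead. This is only a sketch of that alternative, not the repository's documented route, and it assumes the data loader could be adapted to accept a ready-made processor rather than calling `Wav2Vec2Processor.from_pretrained` itself:

```python
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2Processor

# xlsr-53 ships a feature extractor but no CTC tokenizer, so borrow the English
# tokenizer, exactly as the in-place patch above does.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
tokenizer = Wav2Vec2CTCTokenizer.from_pretrained("facebook/wav2vec2-base-960h")
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)
```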
+
+ ## MultiTalk Dataset
+ Please follow the instructions in [MultiTalk_dataset/README.md](https://github.com/postech-ami/MultiTalk/blob/main/MultiTalk_dataset/README.md).
+
+ ## Training and testing
+
+ ### Training for Discrete Motion Prior
+ ```
+ sh scripts/train_multi.sh MultiTalk_s1 config/multi/stage1.yaml multi s1
+ ```
+
+ ### Training for Speech-Driven Motion Synthesis
+ Make sure the paths of the pre-trained models are correct, i.e., `vqvae_pretrained_path` and `wav2vec2model_path` in `config/multi/stage2.yaml`.
+ ```
+ sh scripts/train_multi.sh MultiTalk_s2 config/multi/stage2.yaml multi s2
+ ```
+
+ ### Testing
+ #### Lip Vertex Error (LVE)
+ To evaluate the lip vertex error, run the command below.
+
+ ```
+ sh scripts/test.sh MultiTalk_s2 config/multi/stage2.yaml vocaset s2
+ ```
+
+ #### Audio-Visual Lip Reading (AVLR)
+ To evaluate lip readability with a pre-trained Audio-Visual Speech Recognition (AVSR) model, download the language-specific checkpoint, dictionary, and tokenizer from [muavic](https://github.com/facebookresearch/muavic).
+ Place them in `./avlr/${language}/checkpoints/${language}_avlr`.
+ ```
+ # e.g., "Arabic"
+ ./avlr/ar/checkpoints/ar_avsr/checkpoint_best.pt
+ ./avlr/ar/checkpoints/ar_avsr/dict.ar.txt
+ ./avlr/ar/checkpoints/ar_avsr/tokenizer.model
+ ```
+ Then place the rendered videos in `./avlr/${language}/inputs/MultiTalk` and the corresponding wav files in `./avlr/${language}/inputs/wav`.
+ ```
+ # e.g., "Arabic"
+ ./avlr/ar/inputs/MultiTalk
+ ./avlr/ar/inputs/wav
+ ```
+
+ Run the command below to evaluate lip readability.
+ ```
+ python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert --work-dir ./avlr --language ${language} --model-name MultiTalk --exp-name ${exp_name}
+ ```
+
+
+
+ [//]: # (## **Citation**)
+
+ [//]: # ()
+ [//]: # (If you find the code useful for your work, please star this repo and consider citing:)
+
+ [//]: # ()
+ [//]: # (```)
+
+ [//]: # (@inproceedings{xing2023codetalker,)
+
+ [//]: # ( title={Codetalker: Speech-driven 3d facial animation with discrete motion prior},)
+
+ [//]: # ( author={Xing, Jinbo and Xia, Menghan and Zhang, Yuechen and Cun, Xiaodong and Wang, Jue and Wong, Tien-Tsin},)
+
+ [//]: # ( booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},)
+
+ [//]: # ( pages={12780--12790},)
+
+ [//]: # ( year={2023})
+
+ [//]: # (})
+
+ [//]: # (```)
+
+ ## **Notes**
+ 1. Although our codebase allows training with multiple GPUs, we have not tested it, and the training batch size is hardcoded to one. You may need to change the `data_loader` if needed.
+
+
+ ## **Acknowledgement**
+ We heavily borrow code from
+ [CodeTalker](https://doubiiu.github.io/projects/codetalker/).
+ We sincerely appreciate those authors.
+
RUN/multi/MultiTalk_s2/test-20240707_000247.log ADDED
@@ -0,0 +1,15 @@
1
+ Traceback (most recent call last):
2
+ File "main/test_multi_pred.py", line 12, in <module>
3
+ import librosa
4
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/librosa/__init__.py", line 211, in <module>
5
+ from . import core
6
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/librosa/core/__init__.py", line 9, in <module>
7
+ from .constantq import * # pylint: disable=wildcard-import
8
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/librosa/core/constantq.py", line 1059, in <module>
9
+ dtype=np.complex,
10
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
11
+ raise AttributeError(__former_attrs__[attr])
12
+ AttributeError: module 'numpy' has no attribute 'complex'.
13
+ `np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
14
+ The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
15
+ https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
RUN/multi/MultiTalk_s2/test-20240707_000302.log ADDED
@@ -0,0 +1,110 @@
1
+ [2024-07-07 00:03:03,357 INFO test_multi_pred.py line 63 3703560]=>INaffine: False
2
+ StepLR: False
3
+ adaptive_lr: False
4
+ arch: stage2
5
+ autoencoder: stage1_vocaset
6
+ base_lr: 0.0001
7
+ batch_size: 1
8
+ batch_size_val: 1
9
+ data_root: sample_dataset
10
+ dataset: multi
11
+ device: cuda
12
+ dist_backend: nccl
13
+ dist_url: tcp://127.0.0.1:6701
14
+ epochs: 100
15
+ eval_freq: 5
16
+ evaluate: True
17
+ face_quan_num: 16
18
+ factor: 0.3
19
+ feature_dim: 1024
20
+ gamma: 0.5
21
+ gt_save_folder: demo/gt
22
+ hidden_size: 1024
23
+ in_dim: 15069
24
+ intermediate_size: 1536
25
+ log_dir: None
26
+ loss: MSE
27
+ manual_seed: 131
28
+ measure_lve: False
29
+ model_path: checkpoints/stage2.pth.tar
30
+ momentum: 0.9
31
+ motion_weight: 1.0
32
+ multiprocessing_distributed: True
33
+ n_embed: 256
34
+ n_head: 4
35
+ neg: 0.2
36
+ num_attention_heads: 8
37
+ num_hidden_layers: 6
38
+ num_layers: 6
39
+ patience: 3
40
+ period: 25
41
+ poly_lr: False
42
+ power: 0.9
43
+ print_freq: 10
44
+ quant_factor: 0
45
+ rank: 0
46
+ read_audio: True
47
+ reg_weight: 1.0
48
+ resume: None
49
+ save: True
50
+ save_folder: demo/output
51
+ save_freq: 1
52
+ save_path: None
53
+ start_epoch: 0
54
+ step_size: 100
55
+ style_emb_method: nnemb
56
+ sync_bn: False
57
+ teacher_forcing: True
58
+ template_file: templates.pkl
59
+ test_batch_size: 1
60
+ test_gpu: [0]
61
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
62
+ test_workers: 0
63
+ threshold: 0.0001
64
+ train_gpu: [0]
65
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
66
+ use_sgd: False
67
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
68
+ vertice_dim: 15069
69
+ vertices_path: npy
70
+ visualize_mesh: True
71
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
72
+ warmup_steps: 1
73
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
74
+ wav_path: wav
75
+ weight: None
76
+ weight_decay: 0.002
77
+ window_size: 1
78
+ workers: 10
79
+ world_size: 1
80
+ zquant_dim: 64
81
+ [2024-07-07 00:03:03,357 INFO test_multi_pred.py line 64 3703560]=>=> creating model ...
82
+
83
+
84
+ Some weights of the model checkpoint at facebook/wav2vec2-large-xlsr-53 were not used when initializing Wav2Vec2Model: ['quantizer.weight_proj.bias', 'project_hid.bias', 'project_q.bias', 'quantizer.weight_proj.weight', 'project_q.weight', 'quantizer.codevectors', 'project_hid.weight']
85
+ - This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
86
+ - This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
87
+ [2024-07-07 00:04:21,701 INFO test_multi_pred.py line 69 3703560]=>=> loading checkpoint 'checkpoints/stage2.pth.tar'
88
+ [2024-07-07 00:04:22,268 INFO test_multi_pred.py line 72 3703560]=>=> loaded checkpoint 'checkpoints/stage2.pth.tar'
89
+ Loading data...
90
+
91
+ Traceback (most recent call last):
92
+ File "main/test_multi_pred.py", line 144, in <module>
93
+ main()
94
+ File "main/test_multi_pred.py", line 78, in main
95
+ dataset = get_dataloaders(cfg, test_config=True)
96
+ File "/home/rnd/Documents/Ameer/Dream/MultiTalk/dataset/data_loader_multi.py", line 142, in get_dataloaders
97
+ train_data, valid_data, test_data, subjects_dict = read_data(args, test_config)
98
+ File "/home/rnd/Documents/Ameer/Dream/MultiTalk/dataset/data_loader_multi.py", line 66, in read_data
99
+ processor = Wav2Vec2Processor.from_pretrained(args.wav2vec2model_path)
100
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 105, in from_pretrained
101
+ tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
102
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1708, in from_pretrained
103
+ raise EnvironmentError(msg)
104
+ OSError: Can't load tokenizer for 'facebook/wav2vec2-large-xlsr-53'. Make sure that:
105
+
106
+ - 'facebook/wav2vec2-large-xlsr-53' is a correct model identifier listed on 'https://huggingface.co/models'
107
+
108
+ - or 'facebook/wav2vec2-large-xlsr-53' is the correct path to a directory containing relevant tokenizer files
109
+
110
+
RUN/multi/MultiTalk_s2/test-20240707_000539.log ADDED
@@ -0,0 +1,107 @@
1
+ [2024-07-07 00:05:39,843 INFO test_multi_pred.py line 63 3704273]=>INaffine: False
2
+ StepLR: False
3
+ adaptive_lr: False
4
+ arch: stage2
5
+ autoencoder: stage1_vocaset
6
+ base_lr: 0.0001
7
+ batch_size: 1
8
+ batch_size_val: 1
9
+ data_root: sample_dataset
10
+ dataset: multi
11
+ device: cuda
12
+ dist_backend: nccl
13
+ dist_url: tcp://127.0.0.1:6701
14
+ epochs: 100
15
+ eval_freq: 5
16
+ evaluate: True
17
+ face_quan_num: 16
18
+ factor: 0.3
19
+ feature_dim: 1024
20
+ gamma: 0.5
21
+ gt_save_folder: demo/gt
22
+ hidden_size: 1024
23
+ in_dim: 15069
24
+ intermediate_size: 1536
25
+ log_dir: None
26
+ loss: MSE
27
+ manual_seed: 131
28
+ measure_lve: False
29
+ model_path: checkpoints/stage2.pth.tar
30
+ momentum: 0.9
31
+ motion_weight: 1.0
32
+ multiprocessing_distributed: True
33
+ n_embed: 256
34
+ n_head: 4
35
+ neg: 0.2
36
+ num_attention_heads: 8
37
+ num_hidden_layers: 6
38
+ num_layers: 6
39
+ patience: 3
40
+ period: 25
41
+ poly_lr: False
42
+ power: 0.9
43
+ print_freq: 10
44
+ quant_factor: 0
45
+ rank: 0
46
+ read_audio: True
47
+ reg_weight: 1.0
48
+ resume: None
49
+ save: True
50
+ save_folder: demo/output
51
+ save_freq: 1
52
+ save_path: None
53
+ start_epoch: 0
54
+ step_size: 100
55
+ style_emb_method: nnemb
56
+ sync_bn: False
57
+ teacher_forcing: True
58
+ template_file: templates.pkl
59
+ test_batch_size: 1
60
+ test_gpu: [0]
61
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
62
+ test_workers: 0
63
+ threshold: 0.0001
64
+ train_gpu: [0]
65
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
66
+ use_sgd: False
67
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
68
+ vertice_dim: 15069
69
+ vertices_path: npy
70
+ visualize_mesh: True
71
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
72
+ warmup_steps: 1
73
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
74
+ wav_path: wav
75
+ weight: None
76
+ weight_decay: 0.002
77
+ window_size: 1
78
+ workers: 10
79
+ world_size: 1
80
+ zquant_dim: 64
81
+ [2024-07-07 00:05:39,843 INFO test_multi_pred.py line 64 3704273]=>=> creating model ...
82
+ Some weights of the model checkpoint at facebook/wav2vec2-large-xlsr-53 were not used when initializing Wav2Vec2Model: ['quantizer.codevectors', 'project_q.weight', 'project_hid.bias', 'project_hid.weight', 'project_q.bias', 'quantizer.weight_proj.weight', 'quantizer.weight_proj.bias']
83
+ - This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
84
+ - This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
85
+ [2024-07-07 00:05:43,925 INFO test_multi_pred.py line 69 3704273]=>=> loading checkpoint 'checkpoints/stage2.pth.tar'
86
+ [2024-07-07 00:05:44,516 INFO test_multi_pred.py line 72 3704273]=>=> loaded checkpoint 'checkpoints/stage2.pth.tar'
87
+ Loading data...
88
+ Traceback (most recent call last):
89
+ File "main/test_multi_pred.py", line 144, in <module>
90
+ main()
91
+ File "main/test_multi_pred.py", line 78, in main
92
+ dataset = get_dataloaders(cfg, test_config=True)
93
+ File "/home/rnd/Documents/Ameer/Dream/MultiTalk/dataset/data_loader_multi.py", line 142, in get_dataloaders
94
+ train_data, valid_data, test_data, subjects_dict = read_data(args, test_config)
95
+ File "/home/rnd/Documents/Ameer/Dream/MultiTalk/dataset/data_loader_multi.py", line 66, in read_data
96
+ processor = Wav2Vec2Processor.from_pretrained(args.wav2vec2model_path)
97
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py", line 105, in from_pretrained
98
+ tokenizer = Wav2Vec2CTCTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
99
+ File "/home/rnd/miniconda3/envs/multitalk/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1708, in from_pretrained
100
+ raise EnvironmentError(msg)
101
+ OSError: Can't load tokenizer for 'facebook/wav2vec2-large-xlsr-53'. Make sure that:
102
+
103
+ - 'facebook/wav2vec2-large-xlsr-53' is a correct model identifier listed on 'https://huggingface.co/models'
104
+
105
+ - or 'facebook/wav2vec2-large-xlsr-53' is the correct path to a directory containing relevant tokenizer files
106
+
107
+
RUN/multi/MultiTalk_s2/test-20240707_000731.log ADDED
@@ -0,0 +1,93 @@
1
+ [2024-07-07 00:07:32,456 INFO test_multi_pred.py line 63 3704520]=>INaffine: False
2
+ StepLR: False
3
+ adaptive_lr: False
4
+ arch: stage2
5
+ autoencoder: stage1_vocaset
6
+ base_lr: 0.0001
7
+ batch_size: 1
8
+ batch_size_val: 1
9
+ data_root: sample_dataset
10
+ dataset: multi
11
+ device: cuda
12
+ dist_backend: nccl
13
+ dist_url: tcp://127.0.0.1:6701
14
+ epochs: 100
15
+ eval_freq: 5
16
+ evaluate: True
17
+ face_quan_num: 16
18
+ factor: 0.3
19
+ feature_dim: 1024
20
+ gamma: 0.5
21
+ gt_save_folder: demo/gt
22
+ hidden_size: 1024
23
+ in_dim: 15069
24
+ intermediate_size: 1536
25
+ log_dir: None
26
+ loss: MSE
27
+ manual_seed: 131
28
+ measure_lve: False
29
+ model_path: checkpoints/stage2.pth.tar
30
+ momentum: 0.9
31
+ motion_weight: 1.0
32
+ multiprocessing_distributed: True
33
+ n_embed: 256
34
+ n_head: 4
35
+ neg: 0.2
36
+ num_attention_heads: 8
37
+ num_hidden_layers: 6
38
+ num_layers: 6
39
+ patience: 3
40
+ period: 25
41
+ poly_lr: False
42
+ power: 0.9
43
+ print_freq: 10
44
+ quant_factor: 0
45
+ rank: 0
46
+ read_audio: True
47
+ reg_weight: 1.0
48
+ resume: None
49
+ save: True
50
+ save_folder: demo/output
51
+ save_freq: 1
52
+ save_path: None
53
+ start_epoch: 0
54
+ step_size: 100
55
+ style_emb_method: nnemb
56
+ sync_bn: False
57
+ teacher_forcing: True
58
+ template_file: templates.pkl
59
+ test_batch_size: 1
60
+ test_gpu: [0]
61
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
62
+ test_workers: 0
63
+ threshold: 0.0001
64
+ train_gpu: [0]
65
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
66
+ use_sgd: False
67
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
68
+ vertice_dim: 15069
69
+ vertices_path: npy
70
+ visualize_mesh: True
71
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
72
+ warmup_steps: 1
73
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
74
+ wav_path: wav
75
+ weight: None
76
+ weight_decay: 0.002
77
+ window_size: 1
78
+ workers: 10
79
+ world_size: 1
80
+ zquant_dim: 64
81
+ [2024-07-07 00:07:32,456 INFO test_multi_pred.py line 64 3704520]=>=> creating model ...
82
+ Some weights of the model checkpoint at facebook/wav2vec2-large-xlsr-53 were not used when initializing Wav2Vec2Model: ['project_hid.bias', 'project_q.weight', 'quantizer.weight_proj.weight', 'project_hid.weight', 'quantizer.codevectors', 'quantizer.weight_proj.bias', 'project_q.bias']
83
+ - This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
84
+ - This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
85
+ [2024-07-07 00:07:36,887 INFO test_multi_pred.py line 69 3704520]=>=> loading checkpoint 'checkpoints/stage2.pth.tar'
86
+ [2024-07-07 00:07:37,452 INFO test_multi_pred.py line 72 3704520]=>=> loaded checkpoint 'checkpoints/stage2.pth.tar'
87
+ Loading data...
88
+
89
+
90
+
91
+
92
  0%| | 0/4 [00:00<?, ?it/s]
93
  50%|█████ | 2/4 [00:00<00:00, 7.68it/s]
94
+ Loaded data: Train-0, Val-0, Test-2
95
+ Lip Vertex Error on test set: 9.532534e-06
RUN/vocaset/MultiTalk_s2/test-20240707_000820.log ADDED
@@ -0,0 +1,90 @@
1
+ [2024-07-07 00:08:21,269 INFO test_multi_pred.py line 63 3705029]=>INaffine: False
2
+ StepLR: False
3
+ adaptive_lr: False
4
+ arch: stage2
5
+ autoencoder: stage1_vocaset
6
+ base_lr: 0.0001
7
+ batch_size: 1
8
+ batch_size_val: 1
9
+ data_root: sample_dataset
10
+ dataset: multi
11
+ device: cuda
12
+ dist_backend: nccl
13
+ dist_url: tcp://127.0.0.1:6701
14
+ epochs: 100
15
+ eval_freq: 5
16
+ evaluate: True
17
+ face_quan_num: 16
18
+ factor: 0.3
19
+ feature_dim: 1024
20
+ gamma: 0.5
21
+ gt_save_folder: demo/gt
22
+ hidden_size: 1024
23
+ in_dim: 15069
24
+ intermediate_size: 1536
25
+ log_dir: None
26
+ loss: MSE
27
+ manual_seed: 131
28
+ measure_lve: False
29
+ model_path: checkpoints/stage2.pth.tar
30
+ momentum: 0.9
31
+ motion_weight: 1.0
32
+ multiprocessing_distributed: True
33
+ n_embed: 256
34
+ n_head: 4
35
+ neg: 0.2
36
+ num_attention_heads: 8
37
+ num_hidden_layers: 6
38
+ num_layers: 6
39
+ patience: 3
40
+ period: 25
41
+ poly_lr: False
42
+ power: 0.9
43
+ print_freq: 10
44
+ quant_factor: 0
45
+ rank: 0
46
+ read_audio: True
47
+ reg_weight: 1.0
48
+ resume: None
49
+ save: True
50
+ save_folder: demo/output
51
+ save_freq: 1
52
+ save_path: None
53
+ start_epoch: 0
54
+ step_size: 100
55
+ style_emb_method: nnemb
56
+ sync_bn: False
57
+ teacher_forcing: True
58
+ template_file: templates.pkl
59
+ test_batch_size: 1
60
+ test_gpu: [0]
61
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
62
+ test_workers: 0
63
+ threshold: 0.0001
64
+ train_gpu: [0]
65
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
66
+ use_sgd: False
67
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
68
+ vertice_dim: 15069
69
+ vertices_path: npy
70
+ visualize_mesh: True
71
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
72
+ warmup_steps: 1
73
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
74
+ wav_path: wav
75
+ weight: None
76
+ weight_decay: 0.002
77
+ window_size: 1
78
+ workers: 10
79
+ world_size: 1
80
+ zquant_dim: 64
81
+ [2024-07-07 00:08:21,269 INFO test_multi_pred.py line 64 3705029]=>=> creating model ...
82
+ Some weights of the model checkpoint at facebook/wav2vec2-large-xlsr-53 were not used when initializing Wav2Vec2Model: ['project_q.weight', 'project_hid.bias', 'quantizer.weight_proj.weight', 'quantizer.weight_proj.bias', 'project_q.bias', 'quantizer.codevectors', 'project_hid.weight']
83
+ - This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
84
+ - This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
85
+ [2024-07-07 00:08:25,356 INFO test_multi_pred.py line 69 3705029]=>=> loading checkpoint 'checkpoints/stage2.pth.tar'
86
+ [2024-07-07 00:08:25,922 INFO test_multi_pred.py line 72 3705029]=>=> loaded checkpoint 'checkpoints/stage2.pth.tar'
87
+ Loading data...
88
+
89
  0%| | 0/4 [00:00<?, ?it/s]
90
  50%|█████ | 2/4 [00:00<00:00, 7.70it/s]
91
+ Loaded data: Train-0, Val-0, Test-2
92
+ Lip Vertex Error on test set: 9.532534e-06
assets/statistic.png ADDED
assets/teaser.png ADDED

Git LFS Details

  • SHA256: cde9bd420e60fabd411355e094ef782495295cc75e49c0b8eac84320181da899
  • Pointer size: 132 Bytes
  • Size of remote file: 1.48 MB
base/__init__.py ADDED
@@ -0,0 +1 @@
1
+ from .base_model import *
base/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (176 Bytes). View file
 
base/__pycache__/baseTrainer.cpython-38.pyc ADDED
Binary file (2.23 kB). View file
 
base/__pycache__/base_model.cpython-38.pyc ADDED
Binary file (1.53 kB). View file
 
base/__pycache__/config.cpython-38.pyc ADDED
Binary file (4.58 kB). View file
 
base/__pycache__/utilities.cpython-38.pyc ADDED
Binary file (2.43 kB). View file
 
base/baseTrainer.py ADDED
@@ -0,0 +1,66 @@
1
+ #!/usr/bin/env python
2
+ import torch
3
+ from os.path import join
4
+ import torch.distributed as dist
5
+ from .utilities import check_makedirs
6
+ from collections import OrderedDict
7
+ from torch.nn.parallel import DataParallel, DistributedDataParallel
8
+
9
+
10
+ def step_learning_rate(base_lr, epoch, step_epoch, multiplier=0.1):
11
+ lr = base_lr * (multiplier ** (epoch // step_epoch))
12
+ return lr
13
+
14
+
15
+ def poly_learning_rate(base_lr, curr_iter, max_iter, power=0.9):
16
+ """poly learning rate policy"""
17
+ lr = base_lr * (1 - float(curr_iter) / max_iter) ** power
18
+ return lr
19
+
20
+
21
+ def adjust_learning_rate(optimizer, lr):
22
+ for param_group in optimizer.param_groups:
23
+ param_group['lr'] = lr
24
+
25
+
26
+ def save_checkpoint(model, other_state={}, sav_path='', filename='model.pth.tar', stage=1):
27
+ if isinstance(model, (DistributedDataParallel, DataParallel)):
28
+ weight = model.module.state_dict()
29
+ elif isinstance(model, torch.nn.Module):
30
+ weight = model.state_dict()
31
+ else:
32
+ raise ValueError('model must be nn.Module or nn.DataParallel!')
33
+ check_makedirs(sav_path)
34
+
35
+ if stage == 2: # remove vqvae part
36
+ for k in list(weight.keys()):
37
+ if 'autoencoder' in k:
38
+ weight.pop(k)
39
+
40
+ other_state['state_dict'] = weight
41
+ filename = join(sav_path, filename)
42
+ torch.save(other_state, filename)
43
+
44
+
45
+
46
+ def load_state_dict(model, state_dict, strict=True):
47
+ if isinstance(model, (DistributedDataParallel, DataParallel)):
48
+ model.module.load_state_dict(state_dict, strict=strict)
49
+ else:
50
+ model.load_state_dict(state_dict, strict=strict)
51
+
52
+
53
+ def state_dict_remove_module(state_dict):
54
+ new_state_dict = OrderedDict()
55
+ for k, v in state_dict.items():
56
+ # name = k[7:] # remove 'module.' of dataparallel
57
+ name = k.replace('module.', '')
58
+ new_state_dict[name] = v
59
+ return new_state_dict
60
+
61
+
62
+ def reduce_tensor(tensor, args):
63
+ rt = tensor.clone()
64
+ dist.all_reduce(rt, op=dist.ReduceOp.SUM)
65
+ rt /= args.world_size
66
+ return rt
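A minimal usage sketch for `save_checkpoint` above; the model, extra state, and save path are placeholders. With `stage=2`, any state-dict keys containing `autoencoder` are dropped, so a stage 2 checkpoint does not re-save the VQ-VAE weights loaded from `vqvae_pretrained_path`.

```python
import torch.nn as nn
from base.baseTrainer import save_checkpoint

model = nn.Linear(4, 4)  # placeholder standing in for the stage 2 predictor

save_checkpoint(
    model,
    other_state={'epoch': 10},                # saved alongside the weights under extra keys
    sav_path='RUN/multi/MultiTalk_s2/model',  # placeholder output directory
    filename='model_10.pth.tar',
    stage=2,                                  # strips keys containing 'autoencoder' before saving
)
```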
base/base_model.py ADDED
@@ -0,0 +1,30 @@
1
+ import torch.nn as nn
2
+ import numpy as np
3
+
4
+ class BaseModel(nn.Module):
5
+ """
6
+ Base class for all models
7
+ """
8
+
9
+ def __init__(self):
10
+ super(BaseModel, self).__init__()
11
+ # self.logger = logging.getLogger(self.__class__.__name__)
12
+
13
+ def forward(self, *x):
14
+ """
15
+ Forward pass logic
16
+
17
+ :return: Model output
18
+ """
19
+ raise NotImplementedError
20
+
21
+ def summary(self, logger, writer):
22
+ """
23
+ Model summary
24
+ """
25
+ model_parameters = filter(lambda p: p.requires_grad, self.parameters())
26
+ params = sum([np.prod(p.size()) for p in model_parameters]) / 1e6 # Unit is Mega
27
+ logger.info(self)
28
+ logger.info('===>Trainable parameters: %.3f M' % params)
29
+ if writer is not None:
30
+ writer.add_text('Model Summary', 'Trainable parameters: %.3f M' % params)
base/config.py ADDED
@@ -0,0 +1,165 @@
1
+ # -----------------------------------------------------------------------------
2
+ # Functions for parsing args
3
+ # -----------------------------------------------------------------------------
4
+ import yaml
5
+ import os
6
+ from ast import literal_eval
7
+ import copy
8
+
9
+
10
+ class CfgNode(dict):
11
+ """
12
+ CfgNode represents an internal node in the configuration tree. It's a simple
13
+ dict-like container that allows for attribute-based access to keys.
14
+ """
15
+
16
+ def __init__(self, init_dict=None, key_list=None, new_allowed=False):
17
+ # Recursively convert nested dictionaries in init_dict into CfgNodes
18
+ init_dict = {} if init_dict is None else init_dict
19
+ key_list = [] if key_list is None else key_list
20
+ for k, v in init_dict.items():
21
+ if type(v) is dict:
22
+ # Convert dict to CfgNode
23
+ init_dict[k] = CfgNode(v, key_list=key_list + [k])
24
+ super(CfgNode, self).__init__(init_dict)
25
+
26
+ def __getattr__(self, name):
27
+ if name in self:
28
+ return self[name]
29
+ else:
30
+ raise AttributeError(name)
31
+
32
+ def __setattr__(self, name, value):
33
+ self[name] = value
34
+
35
+ def __str__(self):
36
+ def _indent(s_, num_spaces):
37
+ s = s_.split("\n")
38
+ if len(s) == 1:
39
+ return s_
40
+ first = s.pop(0)
41
+ s = [(num_spaces * " ") + line for line in s]
42
+ s = "\n".join(s)
43
+ s = first + "\n" + s
44
+ return s
45
+
46
+ r = ""
47
+ s = []
48
+ for k, v in sorted(self.items()):
49
+ seperator = "\n" if isinstance(v, CfgNode) else " "
50
+ attr_str = "{}:{}{}".format(str(k), seperator, str(v))
51
+ attr_str = _indent(attr_str, 2)
52
+ s.append(attr_str)
53
+ r += "\n".join(s)
54
+ return r
55
+
56
+ def __repr__(self):
57
+ return "{}({})".format(self.__class__.__name__, super(CfgNode, self).__repr__())
58
+
59
+
60
+ def load_cfg_from_cfg_file(file):
61
+ cfg = {}
62
+ assert os.path.isfile(file) and file.endswith('.yaml'), \
63
+ '{} is not a yaml file'.format(file)
64
+
65
+ with open(file, 'r') as f:
66
+ cfg_from_file = yaml.safe_load(f)
67
+
68
+ for key in cfg_from_file:
69
+ for k, v in cfg_from_file[key].items():
70
+ cfg[k] = v
71
+
72
+ cfg = CfgNode(cfg)
73
+ return cfg
74
+
75
+
76
+ def merge_cfg_from_list(cfg, cfg_list):
77
+ new_cfg = copy.deepcopy(cfg)
78
+ assert len(cfg_list) % 2 == 0
79
+ for full_key, v in zip(cfg_list[0::2], cfg_list[1::2]):
80
+ subkey = full_key.split('.')[-1]
81
+ assert subkey in cfg, 'Non-existent key: {}'.format(full_key)
82
+ value = _decode_cfg_value(v)
83
+ value = _check_and_coerce_cfg_value_type(
84
+ value, cfg[subkey], subkey, full_key
85
+ )
86
+ setattr(new_cfg, subkey, value)
87
+
88
+ return new_cfg
89
+
90
+
91
+ def _decode_cfg_value(v):
92
+ """Decodes a raw config value (e.g., from a yaml config files or command
93
+ line argument) into a Python object.
94
+ """
95
+ # All remaining processing is only applied to strings
96
+ if not isinstance(v, str):
97
+ return v
98
+ # Try to interpret `v` as a:
99
+ # string, number, tuple, list, dict, boolean, or None
100
+ try:
101
+ v = literal_eval(v)
102
+ # The following two excepts allow v to pass through when it represents a
103
+ # string.
104
+ #
105
+ # Longer explanation:
106
+ # The type of v is always a string (before calling literal_eval), but
107
+ # sometimes it *represents* a string and other times a data structure, like
108
+ # a list. In the case that v represents a string, what we got back from the
109
+ # yaml parser is 'foo' *without quotes* (so, not '"foo"'). literal_eval is
110
+ # ok with '"foo"', but will raise a ValueError if given 'foo'. In other
111
+ # cases, like paths (v = 'foo/bar' and not v = '"foo/bar"'), literal_eval
112
+ # will raise a SyntaxError.
113
+ except ValueError:
114
+ pass
115
+ except SyntaxError:
116
+ pass
117
+ return v
118
+
119
+
120
+ def _check_and_coerce_cfg_value_type(replacement, original, key, full_key):
121
+ """Checks that `replacement`, which is intended to replace `original` is of
122
+ the right type. The type is correct if it matches exactly or is one of a few
123
+ cases in which the type can be easily coerced.
124
+ """
125
+ original_type = type(original)
126
+ replacement_type = type(replacement)
127
+
128
+ # The types must match (with some exceptions)
129
+ if replacement_type == original_type or original is None:
130
+ return replacement
131
+
132
+ # Cast replacement from from_type to to_type if the replacement and original
133
+ # types match from_type and to_type
134
+ def conditional_cast(from_type, to_type):
135
+ if replacement_type == from_type and original_type == to_type:
136
+ return True, to_type(replacement)
137
+ else:
138
+ return False, None
139
+
140
+ # Conditionally casts
141
+ # list <-> tuple
142
+ casts = [(tuple, list), (list, tuple)]
143
+ # For py2: allow converting from str (bytes) to a unicode string
144
+ try:
145
+ casts.append((str, unicode)) # noqa: F821
146
+ except Exception:
147
+ pass
148
+
149
+ for (from_type, to_type) in casts:
150
+ converted, converted_value = conditional_cast(from_type, to_type)
151
+ if converted:
152
+ return converted_value
153
+
154
+ raise ValueError(
155
+ "Type mismatch ({} vs. {}) with values ({} vs. {}) for config "
156
+ "key: {}".format(
157
+ original_type, replacement_type, original, replacement, full_key
158
+ )
159
+ )
160
+
161
+
162
+ def _assert_with_logging(cond, msg):
163
+ if not cond:
164
+ logger.debug(msg)
165
+ assert cond, msg
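A short sketch of how this config module is driven, assuming it is run from the repository root. `load_cfg_from_cfg_file` flattens the YAML sections (DATA, NETWORK, TRAIN, ...) into one flat `CfgNode`, which is why the test logs above print flat keys such as `base_lr` and `wav2vec2model_path`; `merge_cfg_from_list` then applies command-line overrides.

```python
from base.config import load_cfg_from_cfg_file, merge_cfg_from_list

# Section headers in the YAML are discarded; every leaf key becomes a top-level attribute.
cfg = load_cfg_from_cfg_file('config/multi/stage2.yaml')
print(cfg.base_lr, cfg.wav2vec2model_path)

# Overrides arrive as alternating KEY, VALUE pairs (see the `opts` argument in base/utilities.py).
cfg = merge_cfg_from_list(cfg, ['batch_size', '2', 'save_folder', 'demo/output'])
print(cfg.batch_size, cfg.save_folder)
```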
base/utilities.py ADDED
@@ -0,0 +1,66 @@
1
+ #!/usr/bin/env python
2
+ import argparse
3
+ import os
4
+ import random
5
+ import time
6
+ import logging
7
+ import numpy as np
8
+ from base import config
9
+
10
+
11
+ def get_parser():
12
+ parser = argparse.ArgumentParser(description=' ')
13
+ parser.add_argument('--config', type=str, default='**.yaml', help='config file')
14
+ parser.add_argument('opts', help=' ', default=None,
15
+ nargs=argparse.REMAINDER)
16
+ args = parser.parse_args()
17
+ assert args.config is not None
18
+ cfg = config.load_cfg_from_cfg_file(args.config)
19
+ if args.opts is not None:
20
+ cfg = config.merge_cfg_from_list(cfg, args.opts)
21
+ return cfg
22
+
23
+
24
+ def get_logger():
25
+ logger_name = "main-logger"
26
+ logger = logging.getLogger(logger_name)
27
+ logger.setLevel(logging.INFO)
28
+ handler = logging.StreamHandler()
29
+ fmt = "[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d]=>%(message)s"
30
+ handler.setFormatter(logging.Formatter(fmt))
31
+ logger.addHandler(handler)
32
+ return logger
33
+
34
+
35
+ class AverageMeter(object):
36
+ """Computes and stores the average and current value"""
37
+
38
+ def __init__(self):
39
+ self.reset()
40
+
41
+ def reset(self):
42
+ self.val = 0
43
+ self.avg = 0
44
+ self.sum = 0
45
+ self.count = 0
46
+
47
+ def update(self, val, n=1):
48
+ self.val = val
49
+ self.sum += val * n
50
+ self.count += n
51
+ self.avg = self.sum / self.count
52
+
53
+
54
+ def check_mkdir(dir_name):
55
+ if not os.path.exists(dir_name):
56
+ os.mkdir(dir_name)
57
+
58
+
59
+ def check_makedirs(dir_name):
60
+ if not os.path.exists(dir_name):
61
+ os.makedirs(dir_name)
62
+
63
+
64
+ def main_process(args):
65
+ return not args.multiprocessing_distributed or (
66
+ args.multiprocessing_distributed and args.rank % args.ngpus_per_node == 0)
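A minimal sketch of the two helpers that produce the bookkeeping seen in the logs above, `get_logger` and `AverageMeter`; the loss values are dummies.

```python
from base.utilities import AverageMeter, get_logger

logger = get_logger()
loss_meter = AverageMeter()

# Track per-batch loss and report the running average, as a training loop would.
for step, batch_loss in enumerate([0.9, 0.7, 0.6], start=1):  # dummy per-batch losses
    loss_meter.update(batch_loss, n=1)
    logger.info('step %d: loss %.4f (avg %.4f)' % (step, loss_meter.val, loss_meter.avg))
```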
checkpoints/FLAME_sample.ply ADDED
Binary file (190 kB). View file
 
checkpoints/stage1.pth.tar ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:204f79b355080035e05ef25b2a69d46a9404aa66ed70c9fc0064eeac4ad95fa2
3
+ size 567634587
checkpoints/stage2.pth.tar ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0777b00137c88384d72c48f58b5f5e929aca7fdbee3c32e08b45aaa099379693
3
+ size 1703000029
checkpoints/templates.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8c9fff77287b68808ce18a76e31dfc2d793a6746abbb7620eb4424d1d73919fa
3
+ size 60437
config/multi/demo.yaml ADDED
@@ -0,0 +1,47 @@
1
+ DATA:
2
+ dataset: multi
3
+ data_root: sample_dataset
4
+ wav_path: wav
5
+ vertices_path: npy
6
+ template_file: templates.pkl
7
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
8
+
9
+ NETWORK:
10
+ arch: stage2
11
+ in_dim: 15069
12
+ hidden_size: 1024
13
+ num_hidden_layers: 6
14
+ num_attention_heads: 8
15
+ intermediate_size: 1536
16
+ window_size: 1
17
+ quant_factor: 0
18
+ face_quan_num: 16
19
+ neg: 0.2
20
+ autoencoder: stage1_vocaset
21
+ INaffine: False
22
+ style_emb_method: nnemb # onehot or nnemb
23
+
24
+ VQuantizer:
25
+ n_embed: 256
26
+ zquant_dim: 64
27
+
28
+ PREDICTOR:
29
+ feature_dim: 1024
30
+ vertice_dim: 15069
31
+ device: cuda
32
+ period: 25
33
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
34
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
35
+ teacher_forcing: True
36
+ num_layers: 6
37
+ n_head: 4 # not used
38
+
39
+ DEMO:
40
+ model_path: checkpoints/stage2.pth.tar
41
+ #condition: False #if false, the waveform file has the cue for the type of language
42
+ condition: English
43
+ subject: id
44
+ demo_wav_dir_path: demo/input/
45
+ demo_output_path: demo/output/
46
+ fps: 25
47
+ background_black: True # chose the background color of your rendered video
config/multi/stage1.yaml ADDED
@@ -0,0 +1,79 @@
1
+ DATA:
2
+ dataset: multi
3
+ data_root: sample_dataset
4
+ wav_path: wav
5
+ vertices_path: npy
6
+ template_file: templates.pkl
7
+ read_audio: False
8
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
9
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
10
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
11
+
12
+
13
+ LOSS:
14
+ quant_loss_weight: 1.0
15
+
16
+ NETWORK:
17
+ arch: stage1_vocaset
18
+ in_dim: 15069
19
+ hidden_size: 1024
20
+ num_hidden_layers: 6
21
+ num_attention_heads: 8
22
+ intermediate_size: 1536
23
+ window_size: 1
24
+ quant_factor: 0
25
+ face_quan_num: 16
26
+ neg: 0.2
27
+ INaffine: False
28
+
29
+ VQuantizer:
30
+ n_embed: 256
31
+ zquant_dim: 64
32
+
33
+ TRAIN:
34
+ use_sgd: False
35
+ sync_bn: False # adopt sync_bn or not
36
+ train_gpu: [0]
37
+ workers: 10 # data loader workers
38
+ batch_size: 1 # batch size for training
39
+ batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
40
+ base_lr: 0.0001
41
+ StepLR: True
42
+ warmup_steps: 1
43
+ adaptive_lr: False
44
+ factor: 0.3
45
+ patience: 3
46
+ threshold: 0.0001
47
+ poly_lr: False
48
+ epochs: 200
49
+ step_size: 20
50
+ gamma: 0.5
51
+ start_epoch: 0
52
+ power: 0.9
53
+ momentum: 0.9
54
+ weight_decay: 0.002
55
+ manual_seed: 131
56
+ print_freq: 10
57
+ save_freq: 1
58
+ save_path:
59
+ # weight:
60
+ weight:
61
+ resume:
62
+ evaluate: True # evaluate on validation set, extra gpu memory needed and small batch_size_val is recommend
63
+ eval_freq: 10
64
+
65
+ Distributed:
66
+ dist_url: tcp://127.0.0.1:6701
67
+ dist_backend: 'nccl'
68
+ multiprocessing_distributed: True
69
+ world_size: 1
70
+ rank: 0
71
+
72
+
73
+ TEST:
74
+ test_workers: 0
75
+ test_gpu: [0]
76
+ test_batch_size: 1
77
+ save: True
78
+ model_path:
79
+ save_folder:
config/multi/stage2.yaml ADDED
@@ -0,0 +1,97 @@
1
+ DATA:
2
+ dataset: multi
3
+ data_root: sample_dataset
4
+ wav_path: wav
5
+ vertices_path: npy
6
+ template_file: templates.pkl
7
+ read_audio: True
8
+ train_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
9
+ val_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
10
+ test_subjects: Arabic English French German Greek Italian Portuguese Russian Spanish Korean Mandarin Japanese
11
+ log_dir:
12
+
13
+ LOSS:
14
+ loss: MSE
15
+ motion_weight: 1.0
16
+ reg_weight: 1.0
17
+
18
+
19
+ NETWORK:
20
+ arch: stage2
21
+ in_dim: 15069
22
+ hidden_size: 1024
23
+ num_hidden_layers: 6
24
+ num_attention_heads: 8
25
+ intermediate_size: 1536
26
+ window_size: 1
27
+ quant_factor: 0
28
+ face_quan_num: 16
29
+ neg: 0.2
30
+ autoencoder: stage1_vocaset
31
+ INaffine: False
32
+ style_emb_method: nnemb # onehot or nnemb
33
+
34
+ VQuantizer:
35
+ n_embed: 256
36
+ zquant_dim: 64
37
+
38
+ PREDICTOR:
39
+ feature_dim: 1024
40
+ vertice_dim: 15069
41
+ device: cuda
42
+ period: 25
43
+ vqvae_pretrained_path: checkpoints/stage1.pth.tar
44
+ wav2vec2model_path: facebook/wav2vec2-large-xlsr-53
45
+ teacher_forcing: True
46
+ num_layers: 6
47
+ n_head: 4 # not used
48
+
49
+ TRAIN:
50
+ use_sgd: False
51
+ sync_bn: False # adopt sync_bn or not
52
+ train_gpu: [0]
53
+ workers: 10 # data loader workers
54
+ batch_size: 1 # batch size for training
55
+ batch_size_val: 1 # batch size for validation during training, memory and speed tradeoff
56
+ base_lr: 0.0001
57
+ StepLR: False
58
+ warmup_steps: 1
59
+ adaptive_lr: False
60
+ factor: 0.3
61
+ patience: 3
62
+ threshold: 0.0001
63
+ poly_lr: False
64
+ epochs: 100
65
+ step_size: 100
66
+ gamma: 0.5
67
+ start_epoch: 0
68
+ power: 0.9
69
+ momentum: 0.9
70
+ weight_decay: 0.002
71
+ manual_seed: 131
72
+ print_freq: 10
73
+ save_freq: 1
74
+ save_path:
75
+ # weight:
76
+ weight:
77
+ resume:
78
+ evaluate: True # evaluate on validation set, extra gpu memory needed and small batch_size_val is recommend
79
+ eval_freq: 5
80
+
81
+ Distributed:
82
+ dist_url: tcp://127.0.0.1:6701
83
+ dist_backend: 'nccl'
84
+ multiprocessing_distributed: True
85
+ world_size: 1
86
+ rank: 0
87
+
88
+ TEST:
89
+ test_workers: 0
90
+ test_gpu: [0]
91
+ test_batch_size: 1
92
+ save: True
93
+ model_path: checkpoints/stage2.pth.tar
94
+ save_folder: demo/output
95
+ gt_save_folder: demo/gt
96
+ measure_lve : False
97
+ visualize_mesh : True