# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)
[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)

This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation", accepted at CVPR 2024.

## 🔥 What's New

- (2024.03.14) Our checkpoints are available to the public!
- (2024.03.12) Our code is available to the public 🌲!
- (2024.02.27) Our paper (COMBO) was accepted by CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.
<!-- ## 🪵 TODO List -->
## 🛠️ Getting Started
### 1. Environments
- Linux or macOS with Python ≥ 3.6
```shell
# recommended
pip install -r requirements.txt
pip install soundfile
# build MSDeformAttention
cd model/modeling/pixel_decoder/ops
sh make.sh
```
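
If you are starting from a clean machine, a minimal environment sketch might look like the following (assuming conda; the environment name `combo` and the version choices are illustrative, not prescribed by the authors, beyond the Python ≥ 3.6 requirement above):

```shell
# hypothetical setup: create and activate a fresh environment first
conda create -n combo python=3.8 -y
conda activate combo
# install a CUDA-enabled PyTorch build appropriate for your system
pip install torch torchvision
```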
- Preprocessing for detectron2

To use the Siam-Encoder Module (SEM), we modify one line of detectron2.

The file that needs this change is located at:

`conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py`
(replace `xxx` with your own environment name and Python version)

Commenting out the following line at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) allows the code to run without errors:
```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```
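
If you prefer to apply the patch non-interactively, here is a sketch using GNU sed (this assumes the `raise` statement appears verbatim as above, and the path placeholder follows the same `xxx` convention):

```shell
# hypothetical one-liner: prefix the raise with a comment marker, preserving indentation
sed -i 's/^\(\s*\)raise ValueError("Cannot match one checkpoint key/\1# raise ValueError("Cannot match one checkpoint key/' \
  conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py
```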
- Install Semantic-SAM (Optional)
```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```

Find out more at [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM).
### 2. Datasets

Please refer to [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench) to download the datasets. You can put the data under the `data` folder or use a folder of your own; remember to modify the paths in the config files accordingly. The `data` directory is laid out as below:
```
|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
```
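
As a quick sanity check that the layout matches, you can list each subset (the `data` prefix assumes the folder name suggested above; adjust it if you renamed the folder):

```shell
# each of these should list the downloaded subset contents rather than erroring out
ls data/AVS_dataset/AVSBench_semantic/
ls data/AVS_dataset/AVSBench_object/Multi-sources/
ls data/AVS_dataset/AVSBench_object/Single-source/
```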
### 3. Download Pre-Trained Models

- The pretrained backbones are available from the AVSBench benchmark pretrained backbones [TODO].
```
|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth
```
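
The ResNet-50 and VGGish weights are publicly hosted, so the commands below sketch one way to fetch them (the URLs are the standard detectron2 model zoo and torchvggish release locations, but worth double-checking; the converted `d2_pvt_v2_b5.pkl` is not covered here):

```shell
mkdir -p pretrained/detectron2
# ImageNet-pretrained ResNet-50 from the detectron2 model zoo
wget -P pretrained/detectron2 https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl
# VGGish audio backbone weights from the torchvggish release
wget -P pretrained https://github.com/harritaylor/torchvggish/releases/download/v0.1/vggish-10086976.pth
wget -P pretrained https://github.com/harritaylor/torchvggish/releases/download/v0.1/vggish_pca_params-970ea276.pth
```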

### 4. Maskiges Pre-generation
- Generate class-agnostic masks (Optional)
```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```
- Generate Maskiges (Optional)
```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```

- Move the Maskiges to the following folder.

  Note: for convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets at the TODO Hugging Face link.
```
|--AVS_dataset
   |--AVSBench_semantic/pre_SAM_mask/
   |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
   |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```
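
For example, if the generated S4 Maskiges ended up in a local `pre_SAM_mask/` folder, moving them into place might look like this (the source path is hypothetical; it depends on where the generation scripts wrote their output):

```shell
# hypothetical move: place the generated masks into the dataset tree shown above
mv pre_SAM_mask data/AVS_dataset/AVSBench_object/Single-source/s4_data/pre_SAM_mask
```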
### 5. Train
```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```
### 6. Test
```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```
```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```
## 🤝 Citing COMBO
```bibtex
@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```