YannQi committed dd86488 (verified) · 1 parent: 1cd65d4

Upload README.md

Files changed (1): README.md (+146, −3; the previous revision contained only the `license: apache-2.0` YAML front matter)
# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)

[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)

This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation", accepted at CVPR 2024.
## 🔥 What's New

- (2024.03.14) Our checkpoints are available to the public!
- (2024.03.12) Our code is available to the public 🌲!
- (2024.02.27) Our paper (COMBO) was accepted by CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.

<!-- ## 🪵 TODO List -->

## 🛠️ Getting Started

### 1. Environments

- Linux or macOS with Python ≥ 3.6

```shell
# Recommended: install the pinned dependencies
pip install -r requirements.txt
pip install soundfile
# Build the MSDeformAttention ops
cd model/modeling/pixel_decoder/ops
sh make.sh
```
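
As a quick sanity check after the steps above, you can verify that the key Python modules are importable. This is an illustrative sketch, not part of the repository: the `missing_modules` helper is hypothetical, and `MultiScaleDeformableAttention` is assumed to be the module name the compiled op registers under.

```python
import importlib.util

# Modules the environment setup above is expected to provide.
# "MultiScaleDeformableAttention" is an assumption about the name
# the compiled MSDeformAttention op registers under.
REQUIRED = ["torch", "soundfile", "MultiScaleDeformableAttention"]

def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

If anything is reported missing, re-run the corresponding install step before moving on.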
- Preprocessing for detectron2

To use the Siam-Encoder Module (SEM), we modify one line of detectron2.

The file that needs this change is located at:

`conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py`
(replace `xxx` with your own environment)

Commenting out the following line at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) allows the code to run without errors:

```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```
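
The one-line edit can also be applied programmatically. The sketch below is illustrative and not part of the repository: `comment_out_raise` performs the pure string transformation, and `patch_installed_detectron2` locates the installed copy of `c2_model_loading.py` via `importlib` (assuming detectron2 is importable in the active environment).

```python
import importlib.util
from pathlib import Path

TARGET = 'raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")'

def comment_out_raise(source: str) -> str:
    """Comment out the ValueError raise, preserving indentation; idempotent."""
    out = []
    for line in source.splitlines(keepends=True):
        if TARGET in line and not line.lstrip().startswith("#"):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + "# " + line.lstrip())
        else:
            out.append(line)
    return "".join(out)

def patch_installed_detectron2() -> None:
    """Find c2_model_loading.py in the active environment and patch it in place."""
    spec = importlib.util.find_spec("detectron2")
    if spec is None or spec.origin is None:
        raise RuntimeError("detectron2 is not installed in this environment")
    path = Path(spec.origin).parent / "checkpoint" / "c2_model_loading.py"
    path.write_text(comment_out_raise(path.read_text()))
```

Editing the file by hand as described above works just as well; this only saves the manual lookup of the environment path.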
- Install Semantic-SAM (Optional)

```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```

Find out more at [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM).
### 2. Datasets

Please refer to [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench) to download the datasets. You can put the data under the `data` folder or use your own folder name. Remember to modify the paths in the config files accordingly. The `data` directory is as below:

```
|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
```
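
Before training, it may help to confirm that your download matches the tree above. A minimal sketch, not part of the repository: the `check_layout` helper and the `data/AVS_dataset` root are illustrative assumptions.

```python
from pathlib import Path

# Subdirectories expected under AVS_dataset, per the layout above.
EXPECTED = [
    "AVSBench_semantic",
    "AVSBench_object/Multi-sources",
    "AVSBench_object/Single-source",
]

def check_layout(root):
    """Return the expected subdirectories that are missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED if not (root / d).is_dir()]

if __name__ == "__main__":
    # Adjust the root to wherever you placed the datasets.
    missing = check_layout("data/AVS_dataset")
    if missing:
        print("Missing:", missing)
```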
### 3. Download Pre-Trained Models

- The pretrained backbones are available from the AVSBench pretrained backbones [TODO].

```
|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth
```
### 4. Maskiges Pregeneration

- Generate class-agnostic masks (Optional)

```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```
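
The per-split invocations above can be generated in one loop. A sketch, assuming the `ms3`/`avss` scripts follow the same `pre_mask_semantic_sam_<subset>.sh` naming pattern as the `s4` one shown; the `mask_commands` helper is hypothetical.

```python
# Build the mask-pregeneration commands for every subset/split combination.
SUBSETS = ["s4", "ms3", "avss"]
SPLITS = ["train", "val", "test"]

def mask_commands(subsets=SUBSETS, splits=SPLITS):
    """Return one shell command per (subset, split) pair."""
    return [
        f"sh avs_tools/pre_mask/pre_mask_semantic_sam_{sub}.sh {split}"
        for sub in subsets
        for split in splits
    ]

if __name__ == "__main__":
    for cmd in mask_commands():
        print(cmd)  # or run via subprocess once the script names are confirmed
```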
- Generate Maskiges (Optional)

```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```
- Move the Maskiges to the following folders

Note: For convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets at the TODO Hugging Face link.

```
|--AVS_dataset
   |--AVSBench_semantic/pre_SAM_mask/
   |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
   |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```
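
A small helper can resolve the destination directory for each subset, mirroring the tree above. This is an illustrative sketch: `pre_sam_dir` and the subset keys (`s4`, `ms3`, `avss`) are assumptions, not repository code.

```python
from pathlib import Path

# Destination of the pre-generated Maskiges for each subset,
# mirroring the directory layout above.
PRE_SAM_DIRS = {
    "avss": "AVSBench_semantic/pre_SAM_mask",
    "ms3": "AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask",
    "s4": "AVSBench_object/Single-source/s4_data/pre_SAM_mask",
}

def pre_sam_dir(root, subset):
    """Return the pre_SAM_mask directory for `subset` under the dataset root."""
    if subset not in PRE_SAM_DIRS:
        raise ValueError(f"unknown subset: {subset!r}")
    return (Path(root) / PRE_SAM_DIRS[subset]).as_posix()
```

For example, `pre_sam_dir("AVS_dataset", "ms3")` yields the MS3 destination path, which is where the generated (or downloaded) Maskiges should be moved.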
### 5. Train

```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```
### 6. Test

```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```
## 🤝 Citing COMBO

```bibtex
@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```