Bump autrainer Version to 0.4.0

#3, opened by ramppdev
README.md CHANGED
````diff
@@ -1,11 +1,44 @@
 ---
 license: cc-by-4.0
+metrics:
+- accuracy
+- f1-micro
+- f1-macro
+- f1-weighted
+pipeline_tag: audio-classification
+tags:
+- audio
+- audio-classification
+- ecoacoustic-tagging
+- autrainer
+library_name: autrainer
+model-index:
+- name: edansa-2019-cnn10-32k-t
+  results:
+  - task:
+      type: audio-classification
+      name: Ecoacoustic Tagging
+    metrics:
+    - type: accuracy
+      name: Accuracy
+      value: 0.902352418996893
+    - type: f1-micro
+      name: Micro F1
+      value: 0.902352418996893
+    - type: f1-macro
+      name: Macro F1
+      value: 0.8717470319118755
+    - type: f1-weighted
+      name: Weighted F1
+      value: 0.8999580955196549
 ---
 
 # ABGS Ecoacoustic Tagging Model
+
 Model that tags audio files as belonging to one or more of the following labels: anthropophony (A), biophony (B), geophony (G), or silence (S).
 
 ## Installation
+
 To use the model, install autrainer, e.g. via pip:
 
 ```bash
@@ -13,7 +46,8 @@ pip install autrainer
 ```
 
 ## Usage
-The model can be applied to all WAV files present in a folder (<data-root>); the predictions are stored in another folder (<output-root>):
+
+The model can be applied to all WAV files present in a folder (`<data-root>`); the predictions are stored in another folder (`<output-root>`):
 
 ```bash
 autrainer inference hf:autrainer/edansa-2019-cnn10-32k-t <data-root> <output-root>
@@ -22,19 +56,25 @@ autrainer inference hf:autrainer/edansa-2019-cnn10-32k-t <data-root> <output-root>
 ## Training
 
 ### Pretraining
+
 The model was originally trained on AudioSet by Kong et al.
 
 ### Dataset
+
 The model has been further trained (fine-tuned) on the training set of the EDANSA2019 dataset. The dataset was collected on the North Slope of Alaska, at latitudes between 64° and 70° N and longitudes between 139° and 150° W, from a total of 40 devices, each placed in a different location, separated by ca. 20 km from the other locations. A subset of the entire dataset has been annotated for 28 labels (tags), of which only the four highest-level categories were used: anthropophony, biophony, geophony, and silence. The sampling rate was 48 kHz.
 
 ### Features
+
 The EDANSA2019 dataset was resampled to 32 kHz, as this is the sampling rate at which the model was originally trained on AudioSet. Log mel spectrograms were then extracted with torchlibrosa, using the parameters the upstream model was trained with.
 
 ### Training process
+
 The model was trained for 30 epochs. At the end of each epoch, the model was evaluated on the official validation set. We release the state that achieved the best performance on this validation set. All training hyperparameters can be found in `conf/config.yaml` inside the model folder.
 
 ### Evaluation
+
 The model has only been evaluated on in-domain data. The performance on the official test set reached a weighted F1-score of 0.9.
 
 ## Acknowledgments
-Please acknowledge the work that produced the original model and the EDANSA2019 dataset. We would also appreciate an acknowledgment of autrainer.
+
+Please acknowledge the work that produced the original model and the EDANSA2019 dataset. We would also appreciate an acknowledgment of autrainer.
````
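For reference, the log mel front end described in the Features section (and configured in `preprocess_pipeline.yaml` further down) can be approximated with torchlibrosa directly. A minimal sketch, assuming the standard PANN-style `Spectrogram` + `LogmelFilterBank` pairing; the exact `PannMel` transform in autrainer may differ in details:

```python
import torch
from torchlibrosa.stft import Spectrogram, LogmelFilterBank

# Parameters copied from preprocess_pipeline.yaml (PannMel==0.4.0).
window_size, hop_size, sample_rate = 1024, 320, 32000
fmin, fmax, mel_bins = 50, 14000, 64

# STFT power spectrogram, as used by the PANN models.
spectrogram = Spectrogram(
    n_fft=window_size,
    hop_length=hop_size,
    win_length=window_size,
    freeze_parameters=True,
)

# Log mel filter bank applied on top of the spectrogram.
logmel = LogmelFilterBank(
    sr=sample_rate,
    n_fft=window_size,
    n_mels=mel_bins,
    fmin=fmin,
    fmax=fmax,
    ref=1.0,
    amin=1e-10,
    top_db=None,
    freeze_parameters=True,
)

# One second of mono audio at 32 kHz: (batch, samples).
waveform = torch.randn(1, sample_rate)
features = logmel(spectrogram(waveform))  # (batch, 1, time_steps, mel_bins)
```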
file_handler.yaml CHANGED

```diff
@@ -1 +1 @@
-$autrainer.datasets.utils.file_handlers.NumpyFileHandler==0.3.0: {}
+$autrainer.datasets.utils.file_handlers.NumpyFileHandler==0.4.0: {}
```
inference_transform.yaml CHANGED

```diff
@@ -1,2 +1,2 @@
-$autrainer.transforms.smart_compose.SmartCompose==0.3.0:
+$autrainer.transforms.smart_compose.SmartCompose==0.4.0:
   transforms: []
```
model.yaml CHANGED

```diff
@@ -1,7 +1,5 @@
-$autrainer.models.cnn_10.Cnn10==0.3.0:
+$autrainer.models.cnn_10.Cnn10==0.4.0:
   output_dim: 4
-  sigmoid_output: false
-  sigmoid_predictions: true
   segmentwise: false
   in_channels: 1
   transfer: https://zenodo.org/records/3987831/files/Cnn10_mAP%3D0.380.pth
```
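As a rough illustration of what this config resolves to, the model could be constructed directly from these values. A minimal sketch, assuming `Cnn10` is importable from `autrainer.models.cnn_10` as the config key suggests and that the constructor accepts these keyword arguments:

```python
from autrainer.models.cnn_10 import Cnn10  # module path taken from model.yaml

# Values copied from model.yaml (Cnn10==0.4.0); `transfer` points to the
# pretrained PANN Cnn10 checkpoint on Zenodo used for transfer learning.
model = Cnn10(
    output_dim=4,
    segmentwise=False,
    in_channels=1,
    transfer="https://zenodo.org/records/3987831/files/Cnn10_mAP%3D0.380.pth",
)
```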
preprocess_file_handler.yaml CHANGED

```diff
@@ -1,2 +1,2 @@
-$autrainer.datasets.utils.file_handlers.AudioFileHandler==0.3.0:
+$autrainer.datasets.utils.file_handlers.AudioFileHandler==0.4.0:
   target_sample_rate: 32000
```
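This handler loads audio and resamples it to `target_sample_rate` before feature extraction. A minimal sketch of the equivalent step using torchaudio; the actual `AudioFileHandler` implementation may differ, and the file path is hypothetical:

```python
import torchaudio
import torchaudio.functional as F

target_sample_rate = 32000  # from preprocess_file_handler.yaml

# Load a clip and resample it to 32 kHz if needed.
waveform, sample_rate = torchaudio.load("clip.wav")
if sample_rate != target_sample_rate:
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=target_sample_rate)
```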
preprocess_pipeline.yaml CHANGED

```diff
@@ -1,15 +1,15 @@
-$autrainer.transforms.smart_compose.SmartCompose==0.3.0:
+$autrainer.transforms.smart_compose.SmartCompose==0.4.0:
   transforms:
-  - $autrainer.transforms.specific_transforms.StereoToMono==0.3.0:
-      order: -95
-  - $autrainer.transforms.specific_transforms.PannMel==0.3.0:
-      window_size: 1024
-      hop_size: 320
-      sample_rate: 32000
-      fmin: 50
-      fmax: 14000
-      mel_bins: 64
-      ref: 1.0
-      amin: 1.0e-10
-      top_db: null
-      order: -90
+  - $autrainer.transforms.specific_transforms.StereoToMono==0.4.0:
+      order: -95
+  - $autrainer.transforms.specific_transforms.PannMel==0.4.0:
+      window_size: 1024
+      hop_size: 320
+      sample_rate: 32000
+      fmin: 50
+      fmax: 14000
+      mel_bins: 64
+      ref: 1.0
+      amin: 1.0e-10
+      top_db: null
+      order: -90
```
target_transform.yaml CHANGED

```diff
@@ -1,7 +1,7 @@
-$autrainer.datasets.utils.target_transforms.MultiLabelEncoder==0.3.0:
+$autrainer.datasets.utils.target_transforms.MultiLabelEncoder==0.4.0:
   threshold: 0.5
   labels:
-  - Anth
-  - Bio
-  - Geo
-  - Sil
+  - Anth
+  - Bio
+  - Geo
+  - Sil
```
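For intuition, the decoding step this encoder performs amounts to thresholding the model's per-label sigmoid outputs at 0.5, which is how a clip can receive more than one of the A/B/G/S tags. A minimal sketch of that logic, not the actual autrainer implementation:

```python
import torch

# From target_transform.yaml: label set and decision threshold.
labels = ["Anth", "Bio", "Geo", "Sil"]
threshold = 0.5

# Hypothetical per-label sigmoid outputs for one clip.
probabilities = torch.tensor([0.91, 0.73, 0.12, 0.04])

# Multi-label decoding: keep every label whose probability clears the threshold.
predicted = [label for label, p in zip(labels, probabilities) if p >= threshold]
print(predicted)  # ['Anth', 'Bio'] — a clip containing both human and animal sound
```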