Bump autrainer Version to 0.4.0

#3, opened by ramppdev
README.md CHANGED
````diff
@@ -1,11 +1,44 @@
 ---
 license: cc-by-4.0
+metrics:
+- accuracy
+- f1-micro
+- f1-macro
+- f1-weighted
+pipeline_tag: audio-classification
+tags:
+- audio
+- audio-classification
+- ecoacoustic-tagging
+- autrainer
+library_name: autrainer
+model-index:
+- name: edansa-2019-cnn10-32k-t
+  results:
+  - task:
+      type: audio-classification
+      name: Ecoacoustic Tagging
+    metrics:
+    - type: accuracy
+      name: Accuracy
+      value: 0.902352418996893
+    - type: f1-micro
+      name: Micro F1
+      value: 0.902352418996893
+    - type: f1-macro
+      name: Macro F1
+      value: 0.8717470319118755
+    - type: f1-weighted
+      name: Weighted F1
+      value: 0.8999580955196549
 ---
 
 # ABGS Ecoacoustic Tagging Model
+
 Model that tags audio files as belonging to one or more of the following labels: anthropophony (A), biophony (B), geophony (G), or silence (S).
 
 ## Installation
+
 To use the model, install autrainer, e.g. via pip:
 
 ```bash
@@ -13,7 +46,8 @@ pip install autrainer
 ```
 
 ## Usage
-The model can be applied to all WAV files present in a folder (<data-root>); the predictions are stored in another folder (<output-root>):
+
+The model can be applied to all WAV files present in a folder (`<data-root>`); the predictions are stored in another folder (`<output-root>`):
 
 ```bash
 autrainer inference hf:autrainer/edansa-2019-cnn10-32k-t <data-root> <output-root>
@@ -22,19 +56,25 @@ autrainer inference hf:autrainer/edansa-2019-cnn10-32k-t <data-root> <output-root>
 ## Training
 
 ### Pretraining
+
 The model was originally trained on AudioSet by Kong et al.
 
 ### Dataset
+
 The model has been further trained (fine-tuned) on the training set of the EDANSA2019 dataset. The dataset was collected on the North Slope of Alaska, at latitudes between 64° and 70° N and longitudes between 139° and 150° W, from a total of 40 devices, each placed in a different location, separated by ca. 20 km from the other locations. A subset of the entire dataset has been annotated for 28 labels (tags), of which only the four highest-level categories were used: anthropophony, biophony, geophony, and silence. The sampling rate was 48 kHz.
 
 ### Features
+
 The EDANSA2019 dataset was resampled to 32 kHz, as this is the sampling rate at which the model was originally trained on AudioSet. Log mel spectrograms were then extracted with torchlibrosa, using the parameters the upstream model was trained with.
 
 ### Training process
+
 The model was trained for 30 epochs. At the end of each epoch, the model was evaluated on the official validation set. We release the state that achieved the best performance on this validation set. All training hyperparameters can be found in `conf/config.yaml` inside the model folder.
 
 ### Evaluation
+
 The model has only been evaluated on in-domain data. The performance on the official test set reached a weighted F1-score of 0.9.
 
 ## Acknowledgments
-Please acknowledge the work that produced the original model and the EDANSA2019 dataset. We would also appreciate an acknowledgment of autrainer.
+
+Please acknowledge the work that produced the original model and the EDANSA2019 dataset. We would also appreciate an acknowledgment of autrainer.
````
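For reference, the log mel front end described in the Features section (and configured in `preprocess_pipeline.yaml` further down) can be approximated with torchlibrosa directly. A minimal sketch, assuming the standard PANN-style `Spectrogram` + `LogmelFilterBank` pairing; the exact `PannMel` transform in autrainer may differ in details:

```python
import torch
from torchlibrosa.stft import Spectrogram, LogmelFilterBank

# Parameters copied from preprocess_pipeline.yaml (PannMel==0.4.0).
window_size, hop_size, sample_rate = 1024, 320, 32000
fmin, fmax, mel_bins = 50, 14000, 64

# STFT power spectrogram, as used by the PANN models.
spectrogram = Spectrogram(
    n_fft=window_size,
    hop_length=hop_size,
    win_length=window_size,
    freeze_parameters=True,
)

# Log mel filter bank applied on top of the spectrogram.
logmel = LogmelFilterBank(
    sr=sample_rate,
    n_fft=window_size,
    n_mels=mel_bins,
    fmin=fmin,
    fmax=fmax,
    ref=1.0,
    amin=1e-10,
    top_db=None,
    freeze_parameters=True,
)

# One second of mono audio at 32 kHz: (batch, samples).
waveform = torch.randn(1, sample_rate)
features = logmel(spectrogram(waveform))  # (batch, 1, time_steps, mel_bins)
```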
file_handler.yaml CHANGED

```diff
@@ -1 +1 @@
-$autrainer.datasets.utils.file_handlers.NumpyFileHandler==0.3.0: {}
+$autrainer.datasets.utils.file_handlers.NumpyFileHandler==0.4.0: {}
```
inference_transform.yaml CHANGED

```diff
@@ -1,2 +1,2 @@
-$autrainer.transforms.smart_compose.SmartCompose==0.3.0:
+$autrainer.transforms.smart_compose.SmartCompose==0.4.0:
   transforms: []
```
model.yaml CHANGED

```diff
@@ -1,7 +1,5 @@
-$autrainer.models.cnn_10.Cnn10==0.3.0:
+$autrainer.models.cnn_10.Cnn10==0.4.0:
   output_dim: 4
-  sigmoid_output: false
-  sigmoid_predictions: true
   segmentwise: false
   in_channels: 1
   transfer: https://zenodo.org/records/3987831/files/Cnn10_mAP%3D0.380.pth
```
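As a rough illustration of what this config resolves to, the model could be constructed directly from these values. A minimal sketch, assuming `Cnn10` is importable from `autrainer.models.cnn_10` as the config key suggests and that the constructor accepts these keyword arguments:

```python
from autrainer.models.cnn_10 import Cnn10  # module path taken from model.yaml

# Values copied from model.yaml (Cnn10==0.4.0); `transfer` points to the
# pretrained PANN Cnn10 checkpoint on Zenodo used for transfer learning.
model = Cnn10(
    output_dim=4,
    segmentwise=False,
    in_channels=1,
    transfer="https://zenodo.org/records/3987831/files/Cnn10_mAP%3D0.380.pth",
)
```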
preprocess_file_handler.yaml CHANGED

```diff
@@ -1,2 +1,2 @@
-$autrainer.datasets.utils.file_handlers.AudioFileHandler==0.3.0:
+$autrainer.datasets.utils.file_handlers.AudioFileHandler==0.4.0:
   target_sample_rate: 32000
```
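This handler loads audio and resamples it to `target_sample_rate` before feature extraction. A minimal sketch of the equivalent step using torchaudio; the actual `AudioFileHandler` implementation may differ, and the file path is hypothetical:

```python
import torchaudio
import torchaudio.functional as F

target_sample_rate = 32000  # from preprocess_file_handler.yaml

# Load a clip and resample it to 32 kHz if needed.
waveform, sample_rate = torchaudio.load("clip.wav")
if sample_rate != target_sample_rate:
    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=target_sample_rate)
```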
preprocess_pipeline.yaml CHANGED

```diff
@@ -1,15 +1,15 @@
-$autrainer.transforms.smart_compose.SmartCompose==0.3.0:
+$autrainer.transforms.smart_compose.SmartCompose==0.4.0:
   transforms:
-  - $autrainer.transforms.specific_transforms.StereoToMono==0.3.0:
-      order: -95
-  - $autrainer.transforms.specific_transforms.PannMel==0.3.0:
-      window_size: 1024
-      hop_size: 320
-      sample_rate: 32000
-      fmin: 50
-      fmax: 14000
-      mel_bins: 64
-      ref: 1.0
-      amin: 1.0e-10
-      top_db: null
-      order: -90
+  - $autrainer.transforms.specific_transforms.StereoToMono==0.4.0:
+      order: -95
+  - $autrainer.transforms.specific_transforms.PannMel==0.4.0:
+      window_size: 1024
+      hop_size: 320
+      sample_rate: 32000
+      fmin: 50
+      fmax: 14000
+      mel_bins: 64
+      ref: 1.0
+      amin: 1.0e-10
+      top_db: null
+      order: -90
```
target_transform.yaml CHANGED

```diff
@@ -1,7 +1,7 @@
-$autrainer.datasets.utils.target_transforms.MultiLabelEncoder==0.3.0:
+$autrainer.datasets.utils.target_transforms.MultiLabelEncoder==0.4.0:
   threshold: 0.5
   labels:
-  - Anth
-  - Bio
-  - Geo
-  - Sil
+  - Anth
+  - Bio
+  - Geo
+  - Sil
```
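For intuition, the decoding step this encoder performs amounts to thresholding the model's per-label sigmoid outputs at 0.5, which is how a clip can receive more than one of the A/B/G/S tags. A minimal sketch of that logic, not the actual autrainer implementation:

```python
import torch

# From target_transform.yaml: label set and decision threshold.
labels = ["Anth", "Bio", "Geo", "Sil"]
threshold = 0.5

# Hypothetical per-label sigmoid outputs for one clip.
probabilities = torch.tensor([0.91, 0.73, 0.12, 0.04])

# Multi-label decoding: keep every label whose probability clears the threshold.
predicted = [label for label, p in zip(labels, probabilities) if p >= threshold]
print(predicted)  # ['Anth', 'Bio'] — a clip containing both human and animal sound
```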