
Configuration

This model card outlines the setup of a speaker diarization model fine-tuned on synthetic medical audio data.

Before starting, please ensure the following requirements are met:

  1. Install pyannote.audio 3.1 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.1 user conditions
  4. Create an access token at hf.co/settings/tokens
  5. Download the pytorch_model.bin and config.yaml files into your local directory (a download sketch follows this list)
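
For step 5, both files can also be fetched programmatically with huggingface_hub. A minimal sketch, where the repository ID and token are placeholders you must replace with the actual gated repository and your own token:

from huggingface_hub import hf_hub_download

# NOTE: repo_id is a hypothetical placeholder; substitute the gated repository hosting this model
for filename in ["pytorch_model.bin", "config.yaml"]:
    hf_hub_download(
        repo_id="your-org/pyannote-sd-medical",  # hypothetical repo ID
        filename=filename,
        local_dir="models/pyannote_sd_normal",
        token="hf_...",  # your access token from hf.co/settings/tokens
    )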

Usage

Load fine-tuned segmentation model

import torch
from pyannote.audio import Model

# Load the base segmentation architecture. use_auth_token=True uses the token
# saved by huggingface-cli login; alternatively, pass your token string here.
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded files
model_path = "models/pyannote_sd_normal"

# Load fine-tuned weights from the pytorch_model.bin file
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin"))
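
As a quick sanity check, the loaded model can be run on a dummy waveform. A sketch assuming the 10-second, 16 kHz mono chunks that pyannote segmentation models expect:

model.eval()
with torch.no_grad():
    # (batch, channel, samples): 10 s of 16 kHz mono audio
    dummy_waveform = torch.randn(1, 1, 160000)
    activations = model(dummy_waveform)

# Frame-level activations: (batch, frames, classes)
print(activations.shape)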

Load fine-tuned speaker diarization pipeline

from pyannote.audio import Pipeline
from pyannote.metrics.diarization import DiarizationErrorRate
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pipeline. use_auth_token=True uses the token
# saved by huggingface-cli login; alternatively, pass your token string here.
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True)

finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,  # "klustering" is the attribute name in pyannote.audio
)

# Load fine-tuned params into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")

GPU usage

if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))

Visualise diarization output

# Run the fine-tuned pipeline on an audio file
diarization = finetuned_pipeline("path/to/audio.wav")

# In a notebook, the returned Annotation renders as a colour-coded timeline
diarization
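
If the number of speakers is known in advance (for example, two speakers in a doctor-patient recording; an assumption, not a property stated by this card), it can be passed to the pipeline to constrain clustering:

# Exact speaker count, if known (2 is an assumed example)
diarization = finetuned_pipeline("path/to/audio.wav", num_speakers=2)

# Alternatively, provide lower/upper bounds
diarization = finetuned_pipeline("path/to/audio.wav", min_speakers=1, max_speakers=4)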

View speaker turns, speaker ID, and time

for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")

Citations

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}