How to diarize an audio file

Prerequisites

Before you start, you’ll need:

A pyannoteAI account with credit or an active subscription
An API key from your dashboard
A publicly accessible audio file URL

For help creating an account and getting your API key, see the quickstart guide.

1. Diarize API request

Send a POST request to the diarize endpoint with your audio file URL. Our example uses a sample audio file hosted on pyannoteAI servers: a 79-second recording with two speakers. You can use this URL to test the API: https://files.pyannote.ai/marklex1min.wav

The URL must be a direct link to a publicly accessible audio file. Make sure it points directly to the file (e.g., ends with .wav, .mp3, etc.) and is accessible without authentication. Typically, you’ll use a signed URL from a cloud storage service such as AWS S3. We also offer our own file upload solution. For details on uploading audio files to our servers, see:

How to upload an audio file
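As a rough pre-flight check for the "direct link" requirement above, the sketch below tests whether a URL's path ends with an audio extension, ignoring any query string (as found on signed URLs). The helper name `looks_like_direct_audio_url` and the extension list are assumptions made for illustration, not part of the pyannoteAI API, and the check does not verify that the file is actually reachable without authentication.

```python
from urllib.parse import urlparse

# Assumed extension list, for illustration only; consult the API
# documentation for the audio formats that are actually supported.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".ogg", ".m4a")


def looks_like_direct_audio_url(url: str) -> bool:
    """Heuristic: http(s) URL whose path ends with an audio extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is inspected, so signed-URL query parameters are ignored.
    return parsed.path.lower().endswith(AUDIO_EXTENSIONS)


print(looks_like_direct_audio_url("https://files.pyannote.ai/marklex1min.wav"))
```

A URL that passes this check can still fail if the host requires authentication, so treat it as a quick sanity test only.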

diarize.py

import requests

url = "https://api.pyannote.ai/v1/diarize"
api_key = "YOUR_API_KEY"  # In production, use environment variables: os.getenv("PYANNOTE_API_KEY")

headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
data = {"url": "https://files.pyannote.ai/marklex1min.wav"}

response = requests.post(url, headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code} - {response.text}")
else:
    print(response.json())

The response will include a jobId that you can use to track the diarization job progress:

Example response

{
  "jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d",
  "status": "created"
}
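To carry the jobId into the next step, you might pull it out of the parsed response and fail early if it is missing. This is a minimal sketch: `extract_job_id` is a hypothetical helper, and the payload shape is taken from the example response above.

```python
def extract_job_id(payload: dict) -> str:
    """Return the jobId from a diarize response, or raise if it is absent."""
    job_id = payload.get("jobId")
    if not job_id:
        raise ValueError(f"No jobId in response: {payload!r}")
    return job_id


# Example response from above; in diarize.py this would be response.json().
example = {"jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d", "status": "created"}
job_id = extract_job_id(example)
print(job_id)
```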

2. Get diarization result

Once you have a jobId, you can retrieve the results using either polling or webhooks.

Job results are automatically deleted after 24 hours, for all endpoints. Make sure to save your results in your own database.
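Because results expire, it can help to persist each job payload as soon as you receive it. The sketch below writes the payload to a local JSON file as a stand-in for a real database; the helper name `save_job_result` and the `results` directory are assumptions for illustration.

```python
import json
from pathlib import Path


def save_job_result(job_id: str, job: dict, directory: str = "results") -> Path:
    """Write a job payload to <directory>/<job_id>.json and return the path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{job_id}.json"
    # Stand-in for a database insert; swap in your own storage layer.
    path.write_text(json.dumps(job, indent=2), encoding="utf-8")
    return path
```

You would call it with the jobId and the parsed job payload once the job has succeeded, for example `save_job_result(job_id, data)`.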

Polling

Poll the get job endpoint to check job status and retrieve results when complete.

Be cautious of rate limits when polling. Excessive requests can lead to rate limiting. In production, we strongly recommend using webhooks instead.

polling.py

import time

import requests

api_key = "YOUR_API_KEY"  # In production, use environment variables: os.getenv("PYANNOTE_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
job_id = "YOUR_JOB_ID"  # The jobId returned by the diarize request

while True:
    response = requests.get(
        f"https://api.pyannote.ai/v1/jobs/{job_id}", headers=headers
    )

    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        break

    data = response.json()
    status = data["status"]

    if status in ["succeeded", "failed", "canceled"]:
        if status == "succeeded":
            print("Job completed successfully!")
            print(data["output"])
        else:
            print(f"Job {status}")
        break

    print(f"Job status: {status}, waiting...")
    time.sleep(10)  # Wait 10 seconds before polling again
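One way to reduce the risk of rate limiting while polling is to grow the wait between requests instead of sleeping a fixed 10 seconds. The generator below is a sketch of capped exponential backoff; the name `backoff_delays` and the default values are illustrative assumptions, not limits documented by pyannoteAI.

```python
import itertools


def backoff_delays(initial: float = 2.0, factor: float = 2.0, max_delay: float = 60.0):
    """Yield wait times that grow by `factor` and never exceed `max_delay`."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


# Show the first few delays produced with the default settings.
print(list(itertools.islice(backoff_delays(), 7)))
```

In the polling loop, you could create `delays = backoff_delays()` before the `while` statement and call `time.sleep(next(delays))` in place of the fixed sleep.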

Webhook

Specify a webhook URL when creating the diarization job to receive updates automatically when the job reaches a terminal status.

Webhooks are sent for terminal statuses only: succeeded, failed, and canceled. They are not sent for pending, created, or running. For failed and canceled jobs, payloads include jobId and status (without output).

1. Specify your webhook URL

Add the webhook parameter to your diarization request payload. If you only need status updates (useful for smaller payloads), set webhookStatusOnly to true (default is false):

diarize_with_webhook.py

data = {
    "url": "https://files.pyannote.ai/marklex1min.wav",
    "webhook": "https://your-server.com/webhook"
}
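The payload above does not show webhookStatusOnly. The sketch below builds the request body with that option included, assuming it is a top-level field alongside webhook; `build_diarize_payload` is a hypothetical helper used only for illustration.

```python
def build_diarize_payload(audio_url, webhook_url=None, status_only=False):
    """Build the JSON body for a diarize request, with optional webhook fields."""
    payload = {"url": audio_url}
    if webhook_url:
        payload["webhook"] = webhook_url
        if status_only:
            # Assumed placement: top-level field next to "webhook".
            payload["webhookStatusOnly"] = True
    return payload


data = build_diarize_payload(
    "https://files.pyannote.ai/marklex1min.wav",
    webhook_url="https://your-server.com/webhook",
    status_only=True,
)
print(data)
```

The resulting `data` dictionary can be passed as `json=data` in the same `requests.post` call used in diarize.py.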

2. Create server exposing webhook endpoint

The following minimal example runs a server that accepts webhook POST requests. You can use any web framework of your choice.

webhook.py

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def handle_webhook():
    data = request.json
    status = data.get('status')

    if status == 'succeeded':
        print("Diarization completed!")
        print("Job ID:", data['jobId'])
        if 'output' in data:
            print("Results:", data['output'])

    elif status == 'failed':
        print("Job failed.")

    elif status == 'canceled':
        print("Job canceled.")

    return jsonify({'status': 'received'}), 200

if __name__ == '__main__':
    app.run(port=5000)

You can also use a tool like ngrok to expose your local server to the internet for testing webhooks, or use webhook.site for quick testing.

Learn more about webhooks:

Receiving webhooks - Learn about webhook payloads, retries, and failure codes
Verifying webhooks - Learn how to verify webhook signatures to ensure requests are from pyannoteAI
