Audio

Audio APIs use JSON for generation. Only upload uses multipart/form-data.

Endpoints

Method  Path                              Body
POST    /v2/audio                         multipart/form-data: audio_file
POST    /v2/audio/estimate-credits        JSON: app_name, option, quantity, prompt, duration
POST    /v2/audio/text-2-speech           JSON: prompt, voice, option, language
POST    /v2/audio/text-2-sound-effect     JSON: prompt, duration, option
POST    /v2/audio/text-2-music            JSON: prompt, option

Workflow

  1. Optionally upload a file with POST /v2/audio to host it, or use any public HTTPS URL instead.
  2. Call a generation route with an application/json request body.
  3. Read the results field of the returned asset; it holds the output audio URLs as strings.
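
Steps 2 and 3 can be sketched with only the Python standard library. This is a minimal sketch, assuming the base URL used in the examples in this section; build_json_request and generate are hypothetical helper names, not part of any Vimmerse SDK.

```python
import json
import urllib.request

# Base URL taken from the request examples in this section.
BASE_URL = "https://api.vimmerse.net"


def build_json_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Step 2: build a JSON POST for a generation route such as /v2/audio/text-2-speech."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate(path: str, payload: dict, api_key: str, timeout: float = 300) -> list:
    """Send the request and return the asset's results list (step 3).

    Audio generation is synchronous, so results should already be filled in.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_json_request(path, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        asset = json.load(response)
    return asset.get("results", [])
```

For example, generate("/v2/audio/text-2-speech", {"prompt": "Hello from Vimmerse.", "voice": "AutoV", "option": "ElevenLabsTTS"}, "YOUR_API_KEY") would return the list of output URLs.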

Responses

  • All routes, including upload, return the asset object directly, without a data wrapper.
  • Output URLs are in results.

Audio generation routes run synchronously (async_mode is not used on audio), so results is populated in the response itself rather than by polling the asset id.
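
A client that calls both these v2 routes and the legacy routes (some of which wrap the asset in data) can normalize the body before reading results. A minimal sketch; unwrap_asset and audio_urls are hypothetical names, and treating any status other than "success" as a failure is an assumption, since this section only documents the "success" value.

```python
def unwrap_asset(body: dict) -> dict:
    """Return the asset object, tolerating the legacy {"data": {...}} wrapper."""
    if "id" not in body and isinstance(body.get("data"), dict):
        return body["data"]
    return body


def audio_urls(body: dict) -> list:
    """Return the output URL strings from a (possibly wrapped) asset body."""
    asset = unwrap_asset(body)
    # Assumption: only "success" is documented here; anything else is treated as an error.
    if asset.get("status") != "success":
        raise RuntimeError(f"asset {asset.get('id')} has status {asset.get('status')!r}")
    return [url for url in asset.get("results", []) if isinstance(url, str)]
```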

Authentication

Send the header X-Api-Key: YOUR_API_KEY on every request.
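
A small helper keeps the key out of source code. A minimal sketch; auth_headers is a hypothetical name, and VIMMERSE_API_KEY is a hypothetical environment variable name chosen for illustration, not one the API defines.

```python
import os


def auth_headers(api_key=None, json_body=False):
    """Headers for any v2 audio request.

    Falls back to the hypothetical VIMMERSE_API_KEY environment variable.
    """
    key = api_key or os.environ.get("VIMMERSE_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set VIMMERSE_API_KEY")
    headers = {"X-Api-Key": key}
    if json_body:
        # Generation and estimate routes take JSON; upload uses multipart instead.
        headers["Content-Type"] = "application/json"
    return headers
```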

Legacy form-data routes (with data wrapper on some responses) remain at /image/*, /video/*, /audio/*, and /text/* (tagged (Legacy) in the schema).

Upload Audio

Uploads an audio file sent as multipart/form-data in the field audio_file.

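import requests

# Upload a local file as multipart/form-data in the audio_file field.
with open("voiceover.mp3", "rb") as f:
    r = requests.post(
        "https://api.vimmerse.net/v2/audio",
        headers={"X-Api-Key": "YOUR_API_KEY"},
        files={"audio_file": ("voiceover.mp3", f, "audio/mpeg")},
        timeout=120,
    )
r.raise_for_status()  # raise on 400 / 402 / 422
# The response is the asset object itself (no data wrapper); URLs are in results.
print(r.json()["results"])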

Security: APIKeyHeader

Request

Query parameters
  • audio_url: string, default ""
  • option: any, default "UploadAudio"
  • audio_file_path: any
  • removing_asset_args: any
  • audio_duration: any
  • asset_id: any

Header parameters
  • Authorization: string or null
  • Username: string or null
  • X-Client-Type: string or null

Request body (multipart/form-data)
  • audio_file: string <binary>. Input audio file.
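
Without the requests dependency, the same upload can be built by hand. This is a sketch of standard multipart/form-data encoding (RFC 7578) for a single file part under audio_file; encode_audio_file and upload_audio are hypothetical names, and the MIME type is guessed from the file extension rather than hard-coded.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_audio_file(filename: str, content: bytes):
    """Encode one file as multipart/form-data under the audio_file field.

    Assumes the filename contains no double quotes.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_audio(path: str, api_key: str, timeout: float = 120) -> dict:
    """POST the file to /v2/audio and return the asset object."""
    body, content_type = encode_audio_file(Path(path).name, Path(path).read_bytes())
    request = urllib.request.Request(
        "https://api.vimmerse.net/v2/audio",
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```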

Responses
  • 200: New Audio URL
  • 400: Bad Request
  • 402: Insufficient Credit
  • 422: Validation Error
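
The upload and generation routes document 400, 402 and 422 alongside 200, so a client can map them to one exception type. A minimal sketch; AudioAPIError and parse_response are hypothetical names, and the reason strings are copied from the Responses lists in this reference.

```python
import json


class AudioAPIError(Exception):
    """Raised for any non-200 response from an audio route."""

    # Meanings as listed in the Responses sections of this reference.
    MEANINGS = {400: "Bad Request", 402: "Insufficient Credit", 422: "Validation Error"}

    def __init__(self, status: int, body: bytes):
        self.status = status
        self.reason = self.MEANINGS.get(status, "Unexpected status")
        self.body = body
        super().__init__(f"{status} {self.reason}: {body[:200]!r}")


def parse_response(status: int, body: bytes) -> dict:
    """Decode a 200 response as the asset object; raise AudioAPIError otherwise."""
    if status == 200:
        return json.loads(body)
    raise AudioAPIError(status, body)
```

With urllib.request, a 4xx response arrives as urllib.error.HTTPError, so parse_response(error.code, error.read()) turns it into an AudioAPIError with a readable reason.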

Response sample (application/json):
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "customer_id": "customer123",
  "primary_user_id": "user-abc-123",
  "args": {},
  "thumbnails": [],
  "status": "success",
  "mime_type": "audio/mp3",
  "app_name": "text-to-speech",
  "quantity": 1,
  "credits": 8,
  "created_at": "2025-05-06T16:35:59.840508+00:00",
  "updated_at": "2025-05-06T16:36:12.102933+00:00",
  "history": []
}
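
To work with the asset in typed form, a client might parse the fields shown in the sample into a dataclass. A minimal sketch; AudioAsset is a hypothetical name, and results is modeled as an optional list because the sample omits it even though the overview places the output URLs there.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class AudioAsset:
    """Subset of the asset fields shown in the response sample."""

    id: str
    status: str
    app_name: str
    mime_type: str
    credits: int
    created_at: datetime
    updated_at: datetime
    results: list = field(default_factory=list)  # output URLs; absent from the sample

    @classmethod
    def from_json(cls, data: dict) -> "AudioAsset":
        return cls(
            id=data["id"],
            status=data["status"],
            app_name=data["app_name"],
            mime_type=data["mime_type"],
            credits=data["credits"],
            # Timestamps in the sample are ISO 8601 with a UTC offset.
            created_at=datetime.fromisoformat(data["created_at"]),
            updated_at=datetime.fromisoformat(data["updated_at"]),
            results=list(data.get("results", [])),
        )

    def seconds_between_timestamps(self) -> float:
        """Time from created_at to updated_at, in seconds."""
        return (self.updated_at - self.created_at).total_seconds()
```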

Estimate Credits

Estimates credits without creating an asset.

Field     Required  Notes
app_name  Yes       text-to-speech, text-to-audio, text-to-music, upload-audio
option    Yes       e.g. AutoA, ElevenLabsTTS, ChirpTTS, ElevenLabsSE, MMAudio, Lyria
quantity  No        Default 1
prompt    No        Used for length-based pricing where applicable
duration  No        Seconds (sound effects)
voice     No        e.g. AutoV (text-to-speech)
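
The table above translates into a request body that only carries the optional fields when they are set. A minimal sketch; estimate_payload and estimate_credits are hypothetical names, and the client-side range checks mirror the schema below (quantity 1 to 4, option required) rather than any server guarantee.

```python
import json
import urllib.request

APP_NAMES = {"text-to-speech", "text-to-audio", "text-to-music", "upload-audio"}


def estimate_payload(app_name, option, quantity=1, prompt="", duration=None, voice=None):
    """Build the JSON body for POST /v2/audio/estimate-credits."""
    if app_name not in APP_NAMES:
        raise ValueError(f"unknown app_name: {app_name!r}")
    if not option:
        raise ValueError("option is required")
    if not 1 <= quantity <= 4:  # schema range [1 .. 4]
        raise ValueError("quantity must be between 1 and 4")
    payload = {"app_name": app_name, "option": option, "quantity": quantity}
    if prompt:
        payload["prompt"] = prompt  # enables length-based pricing where applicable
    if duration is not None:
        payload["duration"] = duration  # seconds, for sound effects
    if voice is not None:
        payload["voice"] = voice  # text-to-speech only
    return payload


def estimate_credits(payload, api_key, timeout=30):
    """POST the payload and return {"total_credits": ..., "breakdown": {...}}."""
    request = urllib.request.Request(
        "https://api.vimmerse.net/v2/audio/estimate-credits",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```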
Request

Request body (application/json, required)
  • app_name: string, default "text-to-speech". Internal route id recorded on the asset (same as the target POST /audio/* route). Available values: text-to-speech (default), text-to-audio, text-to-music, upload-audio (0 credits).
  • option (required): string. Model or tool option on the target audio endpoint, e.g. ElevenLabsTTS, ChirpTTS, ElevenLabsSE, MMAudio, Lyria.
  • quantity: integer, 1 to 4, default 1.
  • prompt: string, default "".
  • duration: integer or null. Duration in seconds; same as duration on audio routes such as text-2-sound-effect.
  • audio_duration: number or null. Explicit override in seconds (alias of duration for audio apps).

Responses
  • 200: Credit estimate and breakdown
  • 422: Validation Error

Response sample (application/json):
{
  "total_credits": 0,
  "breakdown": {}
}

Text 2 Speech

Converts the prompt text to speech.

Field     Default          Notes
voice     AutoV            Voice ID or name
option    ElevenLabsTTS    ElevenLabsTTS or ChirpTTS
language  ""               Language code, e.g. en-US; auto-detected if empty

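import requests

# Synchronous text-to-speech: the response already contains the output URLs.
r = requests.post(
    "https://api.vimmerse.net/v2/audio/text-2-speech",
    headers={"X-Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={"prompt": "Hello from Vimmerse.", "voice": "AutoV", "option": "ElevenLabsTTS"},
    timeout=300,
)
r.raise_for_status()  # raise on 400 / 402 / 422
print(r.json()["results"])  # list of generated audio URLs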

Security: APIKeyHeader

Request

Header parameters
  • Authorization: string or null
  • Username: string or null
  • X-Client-Type: string or null

Request body (application/json, required)
  • prompt: string, default "". Speech text.
  • voice: string (voice ID or name), default "AutoV".
  • option: string, default "ElevenLabsTTS". ElevenLabsTTS or ChirpTTS.
  • language: string, default "". Language code, e.g. en-US.
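
The body fields above can be assembled with the documented defaults and a light client-side check. A minimal sketch; tts_payload is a hypothetical name, and rejecting an empty prompt is an assumption of this sketch (the schema default is ""), so the server's own validation remains authoritative.

```python
TTS_OPTIONS = {"ElevenLabsTTS", "ChirpTTS"}


def tts_payload(prompt: str, voice: str = "AutoV", option: str = "ElevenLabsTTS",
                language: str = "") -> dict:
    """JSON body for POST /v2/audio/text-2-speech, using the documented defaults."""
    if not prompt.strip():
        # Assumption: an empty speech text is treated as a client error here.
        raise ValueError("prompt (the speech text) must not be empty")
    if option not in TTS_OPTIONS:
        raise ValueError(f"option must be one of {sorted(TTS_OPTIONS)}")
    payload = {"prompt": prompt, "voice": voice, "option": option}
    if language:
        payload["language"] = language  # e.g. "en-US"; left out so the language is auto-detected
    return payload
```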

Responses
  • 200: Generated speech
  • 400: Bad Request
  • 402: Insufficient Credit
  • 422: Validation Error

Response sample (application/json):
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "customer_id": "customer123",
  "primary_user_id": "user-abc-123",
  "args": {},
  "thumbnails": [],
  "status": "success",
  "mime_type": "audio/mp3",
  "app_name": "text-to-speech",
  "quantity": 1,
  "credits": 8,
  "created_at": "2025-05-06T16:35:59.840508+00:00",
  "updated_at": "2025-05-06T16:36:12.102933+00:00",
  "history": []
}

Text 2 Sound Effect

Generates a sound effect from the prompt. duration sets the length in seconds (default 5).

Security: APIKeyHeader

Request

Header parameters
  • Authorization: string or null
  • Username: string or null
  • X-Client-Type: string or null

Request body (application/json, required)
  • prompt: string, default "". Sound effect description.
  • duration: integer >= 1, default 5. Duration in seconds.
  • option: string, default "MMAudio". ElevenLabsSE or MMAudio.
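
Since pricing can depend on duration, it can help to build the estimate body and the generation body together so they stay in sync. A minimal sketch; sound_effect_requests is a hypothetical name, and mapping this route to the estimate app_name "text-to-audio" is an assumption inferred from the estimate table, not something this reference states.

```python
SFX_OPTIONS = {"MMAudio", "ElevenLabsSE"}


def sound_effect_requests(prompt: str, duration: int = 5, option: str = "MMAudio"):
    """Return (estimate_body, generation_body) for one sound effect.

    Assumption: the estimate route's app_name for this endpoint is "text-to-audio".
    """
    if not isinstance(duration, int) or duration < 1:
        raise ValueError("duration must be an integer >= 1 (seconds)")
    if option not in SFX_OPTIONS:
        raise ValueError(f"option must be one of {sorted(SFX_OPTIONS)}")
    generation = {"prompt": prompt, "duration": duration, "option": option}
    estimate = {"app_name": "text-to-audio", "option": option,
                "prompt": prompt, "duration": duration}
    return estimate, generation
```

The first dict goes to POST /v2/audio/estimate-credits and the second to POST /v2/audio/text-2-sound-effect.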

Responses
  • 200: Generated sound effect
  • 400: Bad Request
  • 402: Insufficient Credit
  • 422: Validation Error

Response sample (application/json):
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "customer_id": "customer123",
  "primary_user_id": "user-abc-123",
  "args": {},
  "thumbnails": [],
  "status": "success",
  "mime_type": "audio/mp3",
  "app_name": "text-to-speech",
  "quantity": 1,
  "credits": 8,
  "created_at": "2025-05-06T16:35:59.840508+00:00",
  "updated_at": "2025-05-06T16:36:12.102933+00:00",
  "history": []
}

Text 2 Music

Generates music from the prompt with Lyria. The URL returned in results points to a WAV file.

Security: APIKeyHeader

Request

Header parameters
  • Authorization: string or null
  • Username: string or null
  • X-Client-Type: string or null

Request body (application/json, required)
  • prompt: string, default "". Music description.
  • option: string, default "Lyria". Lyria is the only listed option.
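
Because this route returns WAV, a downloader can check the RIFF/WAVE header before saving. A minimal sketch; looks_like_wav and download_music are hypothetical names, and it assumes the URLs in results can be fetched without the X-Api-Key header, which this reference does not state.

```python
import urllib.request


def looks_like_wav(data: bytes) -> bool:
    """RIFF/WAVE files start with b"RIFF", a 4-byte size, then b"WAVE"."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def download_music(url: str, dest: str, timeout: float = 120) -> str:
    """Fetch one URL from results and save it, checking that it is WAV data.

    Assumption: the result URL is publicly downloadable (no auth header sent).
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_wav(data):
        raise ValueError(f"{url} did not return WAV data")
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest
```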

Responses
  • 200: Generated music
  • 400: Bad Request
  • 402: Insufficient Credit
  • 422: Validation Error

Response sample (application/json):
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "customer_id": "customer123",
  "primary_user_id": "user-abc-123",
  "args": {},
  "thumbnails": [],
  "status": "success",
  "mime_type": "audio/mp3",
  "app_name": "text-to-speech",
  "quantity": 1,
  "credits": 8,
  "created_at": "2025-05-06T16:35:59.840508+00:00",
  "updated_at": "2025-05-06T16:36:12.102933+00:00",
  "history": []
}