Audio

APIs for uploading and generating audio.

Upload Audio

Uploads audio files to the Vimmerse platform for use in API requests and video production.

Purpose

  • Store audio files securely in the cloud
  • Reference audio by URL in other API calls
  • Upload music, sound effects, or voice recordings
  • Manage audio assets efficiently

Usage Flow

  1. Upload an audio file using this endpoint
  2. Receive a URL for the uploaded audio
  3. Pass that URL as the audio_url parameter in other endpoints
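
The three steps above could look like the following in Python. This is a minimal sketch, not a confirmed client: the base URL and the X-Api-Key header are taken from the other examples in this reference, while the helper names `extract_url` and `upload_audio` are hypothetical. The upload response shape is not documented here, so the sketch assumes `data` is either one object or a list of objects with a `URL` key, as in the generation endpoints.

```python
from typing import Any, Optional

BASE_URL = "https://api.vimmerse.net"  # same base URL as the other examples


def extract_url(result: dict[str, Any]) -> Optional[str]:
    """Return the first URL found in a response body, or None.

    Assumption: "data" is one object or a list of objects with a "URL" key.
    """
    data = result.get("data")
    if isinstance(data, list):
        data = data[0] if data else None
    if isinstance(data, dict):
        return data.get("URL")
    return None


def upload_audio(path: str, api_key: str) -> Optional[str]:
    """Steps 1 and 2: upload a local file and return its URL."""
    # Third-party dependency, imported here so extract_url stays stdlib-only.
    import requests

    with open(path, "rb") as fh:
        response = requests.post(
            f"{BASE_URL}/audio",
            headers={"X-Api-Key": api_key},
            files={"audio_file": fh},  # sent as multipart/form-data
            timeout=60,
        )
    response.raise_for_status()
    return extract_url(response.json())
```

For step 3, the value returned by `upload_audio` would be passed as `audio_url` to another endpoint.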

Best Practices

  • Keep file sizes under 50 MB for optimal performance
  • Use MP3 format at a bitrate of 128 kbps or higher
  • Ensure audio is clean and well-recorded
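
Some of these recommendations can be checked locally before uploading. The sketch below is illustrative only: `preflight_warnings` is a hypothetical helper, "50 MB" is read as 50 MiB, and the MP3 check looks only at the file extension. Bitrate and recording quality are not inspected, because the standard library has no MP3 parser.

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # "under 50 MB", read here as 50 MiB (an assumption)


def preflight_warnings(path: str) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    if os.path.getsize(path) >= MAX_BYTES:
        warnings.append("file is 50 MB or larger")
    if not path.lower().endswith(".mp3"):
        warnings.append("file is not an .mp3")
    return warnings
```

A caller might print the warnings and still proceed, since these are recommendations rather than documented hard limits.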

Security: APIKeyHeader

Request

Request Body schema: multipart/form-data
audio_file
string <binary> (Audio input file)

Input audio file.

audio_url
string (Audio input URL)
Default: ""

Input audio URL.

Responses

  • 200 - New Audio URL
  • 400 - Bad Request
  • 402 - Insufficient Credit
  • 422 - Validation Error

POST /audio

Response sample (application/json):

{
  "data": {}
}

Text 2 Speech

Converts text to natural-sounding speech using AI voice synthesis.

How It Works

The API generates realistic, human-like speech from text input using neural voice synthesis. It is suited to voiceovers, narration, and other spoken content that would otherwise have to be recorded.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | - | Text to convert to speech. Maximum length: ~5,000 characters. Plain text or simple formatting. |
| voice | string | No | "Rachel" | Voice selection. See available voices below. |
| option | string | No | "" | Service model option: "ElevenLabsTTS" or "ChirpTTS". If empty, defaults to ElevenLabs. |
| language | string | No | "" | Language code (e.g., "en-US"). Used with the ChirpTTS option. Auto-detected if not provided. |

Available Voices

| Voice Name | Gender | Character |
| --- | --- | --- |
| Rachel | Female | Professional, warm |
| Aria | Female | Friendly, energetic |
| Sarah | Female | Calm, authoritative |
| Laura | Female | Cheerful, upbeat |
| Charlotte | Female | Elegant, refined |
| Alice | Female | Young, playful |
| Matilda | Female | Sophisticated, mature |
| Jessica | Female | Clear, professional |
| Lily | Female | Sweet, gentle |
| Roger | Male | Strong, confident |
| George | Male | Deep, authoritative |
| Callum | Male | Friendly, approachable |
| River | Male | Smooth, engaging |
| Liam | Male | Natural, conversational |
| Will | Male | Professional, clear |
| Eric | Male | Bold, dynamic |
| Chris | Male | Warm, trustworthy |
| Brian | Male | Mature, distinguished |
| Daniel | Male | Confident, articulate |
| Bill | Male | Energetic, engaging |
| Charlie | Male | Youthful, energetic |
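
The parameter and voice tables can be turned into a small client-side check that catches typos before a request is sent. This is a sketch under stated assumptions: `build_tts_payload` is a hypothetical helper, the 5,000-character cap is approximate (the limit is given as "~5,000"), and the server may accept or reject inputs differently.

```python
VOICES = {
    "Rachel", "Aria", "Sarah", "Laura", "Charlotte", "Alice", "Matilda",
    "Jessica", "Lily", "Roger", "George", "Callum", "River", "Liam",
    "Will", "Eric", "Chris", "Brian", "Daniel", "Bill", "Charlie",
}
OPTIONS = {"", "ElevenLabsTTS", "ChirpTTS"}
MAX_PROMPT_CHARS = 5000  # documented as "~5,000", so treat as approximate


def build_tts_payload(prompt, voice="Rachel", option="", language=""):
    """Validate inputs locally and return the form fields to send."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt is longer than about 5,000 characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if option not in OPTIONS:
        raise ValueError(f"unknown option: {option!r}")
    payload = {"prompt": prompt, "voice": voice, "option": option}
    if language:
        payload["language"] = language  # documented as used with ChirpTTS
    return payload
```

The returned dictionary can be passed as the `data` argument of `requests.post`, which sends it as application/x-www-form-urlencoded.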

Use Cases

  • Video voiceovers
  • Podcast intros
  • Audiobook narration
  • Accessibility content
  • Interactive applications
  • Automated announcements

Example Request

import requests

BASE_URL = "https://api.vimmerse.net"
API_KEY = "YOUR_API_KEY"

url = f"{BASE_URL}/audio/text-2-speech"
headers = {"X-Api-Key": API_KEY}

payload = {
    "prompt": "Welcome to our AI-powered creative platform. We're excited to show you what's possible.",
    "voice": "Aria",  # Choose from available voices
    "option": "ElevenLabsTTS"  # Optional: specify service model
}

try:
    response = requests.post(url, headers=headers, data=payload, timeout=60)
    response.raise_for_status()

    result = response.json()
    audio_data = result["data"]

    # The "data" field is an array of objects, each with a "URL" key
    if audio_data:
        audio_url = audio_data[0].get("URL")
        print("Audio generated successfully!")
        print(f"Audio URL: {audio_url}")

except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
    if e.response is not None:
        print(f"Response: {e.response.text}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Response

Returns an array of objects containing:

  • URL - Audio file URL that can be used in video production or downloaded directly
  • thumbnail_url - Thumbnail image URL (if available)
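
To make the response shape concrete, the sketch below converts the `data` array into typed objects. `AudioItem` and `parse_items` are hypothetical names, and the code assumes the array sits under a top-level `data` key, as in the example request above. Entries without a `URL` are skipped rather than treated as errors.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AudioItem:
    url: str
    thumbnail_url: Optional[str] = None  # only present "if available"


def parse_items(result: dict[str, Any]) -> list[AudioItem]:
    """Convert the "data" array into AudioItem objects."""
    data = result.get("data")
    if not isinstance(data, list):
        return []  # tolerate a missing or non-array "data" field
    items = []
    for entry in data:
        if isinstance(entry, dict) and entry.get("URL"):
            items.append(
                AudioItem(url=entry["URL"], thumbnail_url=entry.get("thumbnail_url"))
            )
    return items
```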

Error Handling

  • 200 - Success. Audio generation completed.
  • 400 - Bad Request. Invalid parameters or missing required fields.
  • 402 - Payment Required. Insufficient credits in your account.
  • 422 - Unprocessable Entity. Failed to generate audio (the API may fall back to an alternative service).
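
A caller that wants these meanings attached to its exceptions could wrap the status check as sketched below. `AudioApiError` and `check_status` are hypothetical names, not part of the API. The messages are copied from the list above.

```python
STATUS_MEANINGS = {
    200: "Success. Audio generation completed.",
    400: "Bad Request. Invalid parameters or missing required fields.",
    402: "Payment Required. Insufficient credits in your account.",
    422: "Unprocessable Entity. Failed to generate audio.",
}


class AudioApiError(Exception):
    """Hypothetical exception type that carries the HTTP status code."""

    def __init__(self, status_code: int, body: str = ""):
        self.status_code = status_code
        self.body = body
        meaning = STATUS_MEANINGS.get(status_code, "Undocumented status code.")
        super().__init__(f"{status_code}: {meaning}")


def check_status(status_code: int, body: str = "") -> None:
    """Raise AudioApiError for anything other than 200."""
    if status_code != 200:
        raise AudioApiError(status_code, body)
```

With `requests`, this could be called as `check_status(response.status_code, response.text)` in place of `response.raise_for_status()`.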

Security: APIKeyHeader

Request

Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Text to convert to speech.

voice
string (Voice)
Default: "Rachel"

The voice used for the narration. Available voices are "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily" and "Bill". The default narrator is Rachel.

option
string (Option)
Default: ""

Service model option: "ElevenLabsTTS" or "ChirpTTS".

language
string (Language)
Default: ""

Language code of the speech (e.g., "en-US").

Responses

  • 200 - Audio URL
  • 400 - Bad Request
  • 402 - Insufficient Credit
  • 422 - Validation Error

POST /audio/text-2-speech

Response sample (application/json):

{
  "data": {}
}

Text 2 Sound Effect

Generates sound effects from text descriptions using AI audio synthesis.

How It Works

Creates realistic sound effects from natural language descriptions, without sound libraries or manual sound design.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | - | Description of the desired sound effect. Be specific for better results. |
| duration | integer | No | 5 | Length of audio in seconds. Range: 1-30 seconds. |
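
The documented duration range can be enforced before the request is sent. The helper below is a sketch with a hypothetical name, `build_sfx_payload`. How the server treats out-of-range values is not documented here, so the helper rejects them rather than clamping.

```python
MIN_SECONDS, MAX_SECONDS = 1, 30  # documented range for "duration"


def build_sfx_payload(prompt: str, duration: int = 5) -> dict:
    """Validate inputs locally and return the form fields to send."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError("duration must be an integer number of seconds")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError("duration must be between 1 and 30 seconds")
    return {"prompt": prompt, "duration": duration}
```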

Sound Effect Categories

  • Nature: Rain, wind, ocean waves, birds, animals
  • Technology: Phone rings, keyboard typing, camera shutter
  • Human: Footsteps, clapping, breathing, laughter
  • Vehicles: Car engine, door closing, horn, tire screech
  • Ambiance: Crowd noise, restaurant, office sounds
  • Abstract: Sci-fi sounds, magical effects, electronic

Prompt Examples

  • "phone ringtone"
  • "thunder and rain"
  • "footsteps on wooden floor"
  • "car door closing"
  • "birds chirping in the morning"
  • "ambient cafe sounds with coffee brewing and distant chatter"

Best Practices

  • Be specific in descriptions for better results
  • Combine elements for complex sounds
  • Test with different durations
  • Use generated sounds in video projects

Example Request

import requests

BASE_URL = "https://api.vimmerse.net"
API_KEY = "YOUR_API_KEY"

url = f"{BASE_URL}/audio/text-2-sound-effect"
headers = {"X-Api-Key": API_KEY}

payload = {
    "prompt": "ambient cafe sounds with coffee brewing and distant chatter",
    "duration": 10  # 10-second sound effect
}

try:
    response = requests.post(url, headers=headers, data=payload, timeout=60)
    response.raise_for_status()

    result = response.json()
    audio_data = result["data"]

    if audio_data:
        audio_url = audio_data[0].get("URL")
        print("Sound effect generated successfully!")
        print(f"Audio URL: {audio_url}")

except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Use Cases

  • Video production
  • Podcast background audio
  • Game development
  • Virtual reality environments
  • Sound design projects

Error Handling

  • 200 - Success. Sound effect generation completed.
  • 400 - Bad Request. Invalid parameters or missing required fields.
  • 402 - Payment Required. Insufficient credits in your account.
  • 422 - Unprocessable Entity. Failed to generate sound effect.

Security: APIKeyHeader

Request

Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Prompt that describes the sound effect.

duration
integer (Duration)
Default: 5

Duration of the audio, in seconds.

Responses

  • 200 - Audio URL
  • 400 - Bad Request
  • 402 - Insufficient Credit
  • 422 - Validation Error

POST /audio/text-2-sound-effect

Response sample (application/json):

{
  "data": {}
}

Text 2 Music

Generates 30-second music tracks from text descriptions using Google's Lyria music AI.

How It Works

Creates original, royalty-free music compositions based on natural language descriptions. Each track is 30 seconds long and unique.

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | - | Description of the desired music. Include genre, mood, instruments, and tempo for best results. |

Music Styles Supported

  • Classical: Orchestral, piano solos, chamber music
  • Electronic: EDM, techno, ambient, synth-pop
  • Rock: Guitar-driven, drums, energetic
  • Jazz: Improvisational, smooth, complex harmonies
  • Ambient: Atmospheric, minimal, background
  • Cinematic: Film scores, dramatic, emotional
  • Lo-fi: Chill beats, relaxed, nostalgic

Prompt Examples

  • "Upbeat electronic dance music with synthesizers"
  • "Gentle acoustic guitar melody for relaxation"
  • "Epic orchestral score with drums and strings"
  • "Jazzy piano with walking bass line"
  • "A lush, ambient soundscape featuring flowing water sounds, distant bird chirps, and a gentle melancholic piano melody"

Best Practices

  • Be descriptive: Include genre, mood, and instruments
  • Combine styles for unique results
  • Specify tempo preferences (fast, slow, medium)
  • Mention emotional tone (happy, sad, energetic, calm)
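
One way to apply these practices consistently is to assemble the prompt from its parts. The helper below is purely illustrative: `build_music_prompt` is a hypothetical name, and the wording it produces is one possible phrasing, not a format the API requires.

```python
def build_music_prompt(genre, mood="", instruments=(), tempo=""):
    """Combine genre, mood, instruments, and tempo into one description."""
    if not genre:
        raise ValueError("genre is required")
    parts = []
    if mood:
        parts.append(mood)
    if tempo:
        parts.append(f"{tempo}-tempo")
    parts.append(genre)
    text = " ".join(parts)
    if instruments:
        text += " with " + ", ".join(instruments)
    # Capitalize only the first character; leave the rest untouched
    return text[0].upper() + text[1:]


prompt = build_music_prompt(
    "jazz", mood="calm", instruments=["piano", "walking bass"], tempo="slow"
)
print(prompt)  # Calm slow-tempo jazz with piano, walking bass
```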

Example Request

import requests

BASE_URL = "https://api.vimmerse.net"
API_KEY = "YOUR_API_KEY"

url = f"{BASE_URL}/audio/text-2-music"
headers = {"X-Api-Key": API_KEY}

payload = {
    "prompt": "A lush, ambient soundscape featuring flowing water sounds, distant bird chirps, and a gentle melancholic piano melody that slowly unfolds. Create a peaceful, meditative atmosphere suitable for relaxation or background music."
}

try:
    response = requests.post(url, headers=headers, data=payload, timeout=120)
    response.raise_for_status()

    result = response.json()
    audio_data = result["data"]

    if audio_data:
        audio_url = audio_data[0].get("URL")
        print("Music generated successfully!")
        print(f"Audio URL: {audio_url}")

except requests.exceptions.HTTPError as e:
    print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

Use Cases

  • Video background music
  • Podcast intros/outros
  • Meditation apps
  • YouTube content
  • Presentations
  • Marketing videos
  • Therapeutic applications

Response

Returns the URL of a 30-second WAV audio file. The generated track can be used commercially without royalty concerns.
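
After downloading the file, its length can be checked with Python's standard `wave` module, as sketched below. `wav_duration_seconds` is a hypothetical helper. It assumes the file is uncompressed PCM, which is what `wave` supports. Other WAV encodings would need a different library.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # duration = number of frames / frames per second
        return wav.getnframes() / wav.getframerate()
```

A generated track would be expected to report a duration close to 30 seconds.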

Error Handling

  • 200 - Success. Music generation completed.
  • 400 - Bad Request. Invalid parameters or missing required fields.
  • 402 - Payment Required. Insufficient credits in your account.
  • 422 - Unprocessable Entity. Failed to generate music.

Security: APIKeyHeader

Request

Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Prompt that describes the music.

Responses

  • 200 - Audio URL
  • 400 - Bad Request
  • 402 - Insufficient Credit
  • 422 - Validation Error

POST /audio/text-2-music

Response sample (application/json):

{
  "data": {}
}