Audio

APIs for uploading and generating audio.

Upload Audio

Uploads audio files to the Vimmerse platform for use in API requests and video production.

Purpose

  • Store audio files securely in the cloud
  • Reference audio by URL in other API calls
  • Upload music, sound effects, or voice recordings
  • Manage audio assets efficiently

Usage Flow

  1. Upload audio file using this endpoint
  2. Receive a URL for the uploaded audio
  3. Use the URL as the audio_url parameter in other endpoints

Best Practices

  • Keep file sizes under 50 MB for optimal performance
  • Use MP3 format with a bitrate of 128 kbps or higher
  • Ensure audio is clean and well-recorded
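
As a rough illustration of the guidelines above, the sketch below checks a file against the size and format recommendations before uploading it. The endpoint URL is an assumption (the POST /audio path joined to the base URL used in the examples on this page), and the helper names check_audio_file and upload_audio are hypothetical. The X-Api-Key header and the audio_file field come from this page.

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # 50 MB guideline from Best Practices


def check_audio_file(path):
    """Return a list of warnings based on the best-practice guidelines."""
    warnings = []
    if os.path.getsize(path) > MAX_BYTES:
        warnings.append("file is larger than 50 MB")
    if not path.lower().endswith(".mp3"):
        warnings.append("file is not an MP3")
    return warnings


def upload_audio(path, api_key, url="https://api.vimmerse.net/audio"):
    """Upload a file as multipart/form-data and return the parsed JSON body.

    The default URL is an assumption; adjust it if your base URL differs.
    """
    import requests  # third-party; imported here so check_audio_file has no dependencies

    with open(path, "rb") as f:
        response = requests.post(
            url,
            headers={"X-Api-Key": api_key},
            files={"audio_file": f},  # field name from the request schema
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

The warnings are advisory only: the guidelines are recommendations, so the sketch reports them rather than refusing to upload.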

Security: APIKeyHeader
Request
Request Body schema: multipart/form-data
audio_file
string <binary> (Audio input file)

Input audio file.

audio_url
string (Audio input URL)
Default: ""

Input audio URL.

Responses
200

New Audio URL

400

Bad Request

402

Insufficient Credit

422

Validation Error
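
One possible way to handle the status codes listed above is to map them to readable errors before using the response body. This is a minimal sketch: AudioApiError and check_status are hypothetical names, and the meanings are copied from the list above.

```python
# Status codes and meanings as listed under Responses
STATUS_MEANINGS = {
    400: "Bad Request",
    402: "Insufficient Credit",
    422: "Validation Error",
}


class AudioApiError(Exception):
    """Hypothetical exception type for non-200 responses."""


def check_status(status_code, body=""):
    """Return None on 200; raise AudioApiError for any other status."""
    if status_code == 200:
        return None
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    message = f"{status_code} {meaning}"
    if body:
        message += f": {body}"
    raise AudioApiError(message)
```

With the requests library, this could be called as check_status(response.status_code, response.text) right after the POST.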

POST /audio
Response samples
application/json
{
  "data": {
  }
}

Text 2 Speech

Converts text to natural-sounding speech using AI voice synthesis.

How It Works

The API generates realistic human-like speech from text input using advanced neural voice technology. This is perfect for creating voiceovers, narrations, and spoken content without needing to record audio.

Available Voices

Voice Name   Gender   Character
Rachel       Female   Professional, warm
Aria         Female   Friendly, energetic
Sarah        Female   Calm, authoritative
Laura        Female   Cheerful, upbeat
Charlotte    Female   Elegant, refined
Alice        Female   Young, playful
Matilda      Female   Sophisticated, mature
Jessica      Female   Clear, professional
Lily         Female   Sweet, gentle
Roger        Male     Strong, confident
George       Male     Deep, authoritative
Callum       Male     Friendly, approachable
River        Male     Smooth, engaging
Liam         Male     Natural, conversational
Will         Male     Professional, clear
Eric         Male     Bold, dynamic
Chris        Male     Warm, trustworthy
Brian        Male     Mature, distinguished
Daniel       Male     Confident, articulate
Bill         Male     Energetic, engaging
Charlie      Male     Youthful, energetic

Parameters

  • prompt: Text to convert to speech (required)
    • Maximum length: ~5,000 characters
    • Plain text or simple formatting
  • voice: Voice selection (optional)
    • Default: "Rachel"
    • Choose from available voices listed above
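
The parameter rules above could be checked client-side before a request is sent. The sketch below is one way to do that; build_tts_payload is a hypothetical helper, and treating the approximate 5,000-character figure as a hard limit is an assumption.

```python
# Voice names copied from the Available Voices table
AVAILABLE_VOICES = {
    "Rachel", "Aria", "Sarah", "Laura", "Charlotte", "Alice", "Matilda",
    "Jessica", "Lily", "Roger", "George", "Callum", "River", "Liam",
    "Will", "Eric", "Chris", "Brian", "Daniel", "Bill", "Charlie",
}
MAX_PROMPT_CHARS = 5000  # the page says "~5,000"; treated here as a hard limit


def build_tts_payload(prompt, voice="Rachel"):
    """Validate the inputs and return the form fields for text-2-speech."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if voice not in AVAILABLE_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"prompt": prompt, "voice": voice}
```

The returned dictionary can be passed as the form data of the POST request.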

Use Cases

  • Video voiceovers
  • Podcast intros
  • Audiobook narration
  • Accessibility content
  • Interactive applications
  • Automated announcements

Example

import requests

url = "https://api.vimmerse.net/audio/text-2-speech"

headers = {
    'X-Api-Key': 'YOUR_API_KEY'
}

payload = {
    "prompt": "Welcome to our AI-powered creative platform. We're excited to show you what's possible.",
    "voice": "Aria"  # Choose from available voices
}

# Send the fields as a form-encoded body (application/x-www-form-urlencoded)
response = requests.post(url, headers=headers, data=payload)
print(response.text)  # JSON response body

Response

Returns an audio file URL that can be used in video production or downloaded directly.
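
For the "downloaded directly" case, a minimal sketch using only the standard library is shown below. The name download_audio is hypothetical, and the caller is assumed to have already extracted the audio URL from the JSON response (the exact field layout is not shown on this page).

```python
import shutil
import urllib.request


def download_audio(audio_url, dest_path):
    """Stream the audio at audio_url to dest_path and return dest_path."""
    with urllib.request.urlopen(audio_url, timeout=60) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks, not all at once
    return dest_path
```

Streaming with shutil.copyfileobj avoids holding the whole file in memory, which matters more for longer recordings than for short clips.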

Security: APIKeyHeader
Request
Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Prompt of the speech.

voice
string (Voice)
Default: "Rachel"

The voice used for the narration. Available voices are "Rachel", "Aria", "Roger", "Sarah", "Laura", "Charlie", "George", "Callum", "River", "Liam", "Charlotte", "Alice", "Matilda", "Will", "Jessica", "Eric", "Chris", "Brian", "Daniel", "Lily" and "Bill". The default narrator is Rachel.

option
string (Option)
Default: ""

Option for service model.

language
string (Language)
Default: ""

Language of the speech.

Responses
200

Audio URL

400

Bad Request

402

Insufficient Credit

422

Validation Error

post/audio/text-2-speech
Request samples
Response samples
application/json
{
  • "data": {
    }
}

Text 2 Sound Effect

Generates sound effects from text descriptions using AI audio synthesis.

How It Works

Creates realistic sound effects based on natural language descriptions. No need for sound libraries or manual creation.

Parameters

  • prompt: Description of the desired sound effect (required)
    • Examples:
      • "phone ringtone"
      • "thunder and rain"
      • "footsteps on wooden floor"
      • "car door closing"
      • "birds chirping in the morning"
  • duration: Length of audio in seconds (optional)
    • Range: 1-30 seconds
    • Default: 5 seconds
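
To make the duration range concrete, the sketch below validates it and builds the form-encoded request without sending it. It uses only the standard library; build_sound_effect_request is a hypothetical helper, and rejecting out-of-range values client-side is an assumption about how you might want to handle them.

```python
import urllib.parse
import urllib.request

URL = "https://api.vimmerse.net/audio/text-2-sound-effect"


def build_sound_effect_request(prompt, duration=5, api_key="YOUR_API_KEY"):
    """Return an unsent POST request for the sound-effect endpoint."""
    if not 1 <= duration <= 30:
        raise ValueError("duration must be between 1 and 30 seconds")
    # urlencode produces an application/x-www-form-urlencoded body;
    # urllib adds that Content-Type automatically when the request is sent.
    body = urllib.parse.urlencode(
        {"prompt": prompt, "duration": duration}
    ).encode("ascii")
    return urllib.request.Request(
        URL, data=body, headers={"X-Api-Key": api_key}, method="POST"
    )
```

The request can then be sent with urllib.request.urlopen(request), or the same fields can be passed to the requests library as in the example on this page.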

Sound Effect Categories

  • Nature: Rain, wind, ocean waves, birds, animals
  • Technology: Phone rings, keyboard typing, camera shutter
  • Human: Footsteps, clapping, breathing, laughter
  • Vehicles: Car engine, door closing, horn, tire screech
  • Ambiance: Crowd noise, restaurant, office sounds
  • Abstract: Sci-fi sounds, magical effects, electronic

Best Practices

  • Be specific in descriptions for better results
  • Combine elements for complex sounds
  • Test with different durations
  • Use generated sounds in video projects

Example

import requests

url = "https://api.vimmerse.net/audio/text-2-sound-effect"

headers = {
    'X-Api-Key': 'YOUR_API_KEY'
}

payload = {
    "prompt": "ambient cafe sounds with coffee brewing and distant chatter",
    "duration": 10  # 10-second sound effect
}

# Send the fields as a form-encoded body (application/x-www-form-urlencoded)
response = requests.post(url, headers=headers, data=payload)
print(response.text)  # JSON response body

Use Cases

  • Video production
  • Podcast background audio
  • Game development
  • Virtual reality environments
  • Sound design projects

Security: APIKeyHeader
Request
Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Prompt that describes the sound effect.

duration
integer (Duration)
Default: 5

Duration of the audio, in seconds.

Responses
200

Audio URL

400

Bad Request

402

Insufficient Credit

422

Validation Error

POST /audio/text-2-sound-effect
Response samples
application/json
{
  "data": {
  }
}

Text 2 Music

Generates 30-second music tracks from text descriptions using Google's Lyria music AI.

How It Works

Creates original, royalty-free music compositions based on natural language descriptions. Each track is 30 seconds long and unique.

Parameters

  • prompt: Description of the desired music (required)
    • Include: Genre, mood, instruments, tempo
    • Examples:
      • "Upbeat electronic dance music with synthesizers"
      • "Gentle acoustic guitar melody for relaxation"
      • "Epic orchestral score with drums and strings"
      • "Jazzy piano with walking bass line"

Music Styles Supported

  • Classical: Orchestral, piano solos, chamber music
  • Electronic: EDM, techno, ambient, synth-pop
  • Rock: Guitar-driven, drums, energetic
  • Jazz: Improvisational, smooth, complex harmonies
  • Ambient: Atmospheric, minimal, background
  • Cinematic: Film scores, dramatic, emotional
  • Lo-fi: Chill beats, relaxed, nostalgic

Best Practices

  • Be descriptive: Include genre, mood, and instruments
  • Combine styles for unique results
  • Specify tempo preferences (fast, slow, medium)
  • Mention emotional tone (happy, sad, energetic, calm)
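
One way to apply these practices consistently is to assemble the prompt from its parts. The helper below is purely illustrative: build_music_prompt and its wording are assumptions, not a format the API requires.

```python
def build_music_prompt(genre, mood, instruments, tempo=None):
    """Compose a descriptive prompt from genre, mood, instruments and tempo."""
    prompt = f"{mood} {genre} featuring {', '.join(instruments)}"
    if tempo:
        prompt += f", {tempo} tempo"
    # Capitalize only the first character, leaving the rest untouched
    return prompt[0].upper() + prompt[1:]


print(build_music_prompt("lo-fi beat", "calm", ["piano", "soft drums"], tempo="slow"))
# prints: Calm lo-fi beat featuring piano, soft drums, slow tempo
```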

Example

import requests

url = "https://api.vimmerse.net/audio/text-2-music"

headers = {
    'X-Api-Key': 'YOUR_API_KEY'
}

payload = {
    "prompt": "A lush, ambient soundscape featuring flowing water sounds, distant bird chirps, and a gentle melancholic piano melody that slowly unfolds. Create a peaceful, meditative atmosphere suitable for relaxation or background music."
}

# Send the fields as a form-encoded body (application/x-www-form-urlencoded)
response = requests.post(url, headers=headers, data=payload)
print(response.text)  # JSON response body

Use Cases

  • Video background music
  • Podcast intros/outros
  • Meditation apps
  • YouTube content
  • Presentations
  • Marketing videos
  • Therapeutic applications

Response

Returns the URL of a 30-second WAV audio file that can be used commercially without royalty concerns.

Security: APIKeyHeader
Request
Request Body schema: application/x-www-form-urlencoded
prompt
string (Prompt)
Default: ""

Prompt that describes the music.

Responses
200

Audio URL

400

Bad Request

402

Insufficient Credit

422

Validation Error

POST /audio/text-2-music
Response samples
application/json
{
  "data": {
  }
}