Voice API

Scenarios (Non-Streaming)

Request data

Response data

Full voice request

Audio + text_data → Audio + text_data

Request data, mandatory:

  • Audio recording of the user's voice

  • Response-voice id/name

  • Personality/prompt

Request data, optional:

  • LLM-Model to use (GPT3.5, GPT4, GPT4-turbo, Mistral-7B,…)

  • Audio-type: .wav/.mp3

  • Language

  • LLM-Temperature

Response data, mandatory:

  • Response audio data

  • Response transcription (text)

Response data, optional:

  • Detected language
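
As a rough sketch of how a client might model the full voice request and its response, the Python dataclasses below mirror the mandatory and optional fields listed above. All field names (`audio`, `voice`, `prompt`, `model`, `audio_type`, `language`, `temperature`, `transcription`, `detected_language`) and the mapping-based response payload are assumptions made for illustration; this specification does not define a wire format.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FullVoiceRequest:
    # Mandatory fields
    audio: bytes   # audio recording of the user's voice
    voice: str     # response-voice id/name
    prompt: str    # personality/prompt
    # Optional fields; None means "not specified by the caller"
    model: Optional[str] = None
    audio_type: Optional[str] = None   # "wav" or "mp3"
    language: Optional[str] = None
    temperature: Optional[float] = None


@dataclass
class FullVoiceResponse:
    audio: bytes                             # mandatory: response audio data
    transcription: str                       # mandatory: response transcription
    detected_language: Optional[str] = None  # optional

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> "FullVoiceResponse":
        """Build a response object, rejecting payloads without mandatory fields."""
        missing = [k for k in ("audio", "transcription") if k not in payload]
        if missing:
            raise ValueError(f"response lacks mandatory field(s): {missing}")
        return cls(
            audio=payload["audio"],
            transcription=payload["transcription"],
            detected_language=payload.get("detected_language"),
        )
```

Keeping the optional fields as `None` by default lets the caller state only what it cares about and leaves the remaining choices to the service.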

Speech-to-text (STT)

Audio (+ text_data) → text_data

Request data, mandatory:

  • Audio recording of the user's voice

Request data, optional:

  • Audio-type of the source (.wav/.mp3)

  • Language

Response data, mandatory:

  • Response transcription (text)

Response data, optional:

  • Detected language
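
One possible way to assemble an STT request is sketched below. It assumes a JSON-style body in which the audio is base64-encoded and the audio type is derived from the file extension; the function name `build_stt_request` and the keys `audio`, `audio_type` and `language` are hypothetical and not defined by this specification.

```python
import base64
from pathlib import Path
from typing import Optional

SUPPORTED_AUDIO_TYPES = {"wav", "mp3"}


def build_stt_request(filename: str, audio: bytes,
                      language: Optional[str] = None) -> dict:
    """Build an STT request dict from raw audio bytes (hypothetical keys)."""
    # Derive the audio type from the file extension, e.g. "clip.WAV" -> "wav".
    audio_type = Path(filename).suffix.lower().lstrip(".")
    if audio_type not in SUPPORTED_AUDIO_TYPES:
        raise ValueError(f"unsupported audio type: {audio_type!r}")
    request = {
        # Assumption: base64 keeps the binary audio safe inside a JSON body.
        "audio": base64.b64encode(audio).decode("ascii"),
        "audio_type": audio_type,
    }
    # The language is optional, so it is only sent when the caller sets it.
    if language is not None:
        request["language"] = language
    return request
```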

LLM-Response

text_data → text_data

Request data, mandatory:

  • Personality/prompt

  • Request text

Request data, optional:

  • LLM-Model to use (GPT3.5, GPT4, GPT4-turbo, Mistral-7B,…)

  • Language

  • LLM-Temperature

Response data, mandatory:

  • Response text

Response data, optional:

  • Detected language
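
The mandatory/optional split for the LLM request could be checked on the receiving side roughly as follows. This is only a sketch: the key names (`prompt`, `text`, `model`, `language`, `temperature`) and the choice to reject unknown keys are assumptions, not part of this specification.

```python
from typing import Any, List, Mapping

MANDATORY = {"prompt", "text"}                    # personality/prompt, request text
OPTIONAL = {"model", "language", "temperature"}


def validate_llm_request(request: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    keys = set(request)
    errors = []
    for name in sorted(MANDATORY - keys):
        errors.append(f"missing mandatory field: {name}")
    for name in sorted(keys - MANDATORY - OPTIONAL):
        errors.append(f"unknown field: {name}")
    temperature = request.get("temperature")
    if temperature is not None:
        # The temperature is defined as a float from 0 to 1 inclusive.
        if not isinstance(temperature, (int, float)) or not 0 <= temperature <= 1:
            errors.append("temperature must be a number from 0 to 1 inclusive")
    return errors
```

Collecting all problems in a list, instead of failing on the first one, lets the service report every issue with a request in a single response.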

Text-to-speech (TTS)

text_data → Audio

Request data, mandatory:

  • Text to transform

  • Response-voice id/name

Response data, mandatory:

  • Response audio data
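
The TTS scenario lists no audio-type field for the response, so a client may want to check what it actually received. The helper below is a heuristic sketch that guesses the format from the leading bytes; the function name `sniff_audio_type` is hypothetical, and the check is deliberately coarse (the MPEG frame-sync test also matches other MPEG audio layers).

```python
def sniff_audio_type(data: bytes) -> str:
    """Guess the format of returned audio from its leading bytes."""
    # WAV: a RIFF container whose form type at bytes 8..11 is "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 tag at the start of the file.
    if data[:3] == b"ID3":
        return "mp3"
    # MP3 without a tag: an MPEG frame starts with 11 set sync bits.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```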

Data-Types:

  • Audio-type: .wav/.mp3

  • Language: Enumeration with a predefined list of available languages

  • Response-voice id/name: Enumeration with a predefined list of available voices

  • LLM-Model: Enumeration with a predefined list of available models

  • Temperature: float in the range 0 to 1, inclusive
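
The data types above map naturally onto enumerations plus a range check. The sketch below covers only the values this document names; the class and function names are illustrative, the real model list may be longer, and the language and voice enumerations are left out because their predefined lists are not given here.

```python
from enum import Enum


class AudioType(Enum):
    WAV = "wav"
    MP3 = "mp3"


class LLMModel(Enum):
    # Only the models named in this document; the actual list may be longer.
    GPT35 = "GPT3.5"
    GPT4 = "GPT4"
    GPT4_TURBO = "GPT4-turbo"
    MISTRAL_7B = "Mistral-7B"


def parse_temperature(value: float) -> float:
    """Accept a temperature only if it lies in the closed interval [0, 1]."""
    temperature = float(value)
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature out of range: {temperature}")
    return temperature
```

Looking a value up through the enumeration, for example `AudioType("mp3")`, raises `ValueError` for anything outside the predefined list, which matches the "enumeration with a predefined list" wording.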