Integration Options
Learn how to integrate the Beyond Presence real-time API into your tech stack.
There are several options for integrating Beyond Presence avatars into your tech stack. Which option is best for you depends on which components of the pipeline you want to manage yourself.
Integration via the end-to-end API with a custom LLM is still under development. In the meantime, if your use case requires a custom LLM, we recommend using the audio-to-video API. If your use case requires one of the other integration options still in development, please reach out to us at support@beyondpresence.ai!
Pipeline Components / Layers
A typical conversational video agent pipeline consists of the following layers:
Transport In
The input transport layer defines how the audio and video of the user reach the agent. Usually this is handled by a video conferencing solution or other WebRTC service. Beyond Presence agents currently use LiveKit to manage all data transport.
ASR (Automatic Speech Recognition)
The ASR component is the “ear” of the agent that is responsible for listening to user audio, transcribing it, and detecting interruptions.
LLM
The LLM layer is the “brain” of the agent that is responsible for generating agent responses. This might also include RAG, function calling, or other methods for handling knowledge and external tools.
TTS (Text-to-Speech)
The TTS component is the “voice” of the agent that is responsible for turning agent responses into audio. Beyond Presence provides the TTS audio using latency-optimized voice models powered by ElevenLabs and Cartesia.
Real-Time Avatar
The real-time avatar is the “face” of the agent. Beyond Presence provides lightning-fast high-quality avatars powered by our proprietary 3D AI models.
Transport Out
The output transport layer defines how the audio and video of the agent get back to the user. Usually this is handled by a video conferencing solution or other WebRTC service. All Beyond Presence APIs currently use LiveKit to manage the output transport.
Integration Options
Below is a closer look at the different options for integrating Beyond Presence real-time avatars into your existing tech stack:
End-to-End
- 🎯 Ideal for non-technical teams
- ✅ Easy to use, fully managed, optimized end-to-end latency
- ⚠️ No control over technical components
The end-to-end integration is the easiest option and also has the lowest conversation latency. However, since Beyond Presence manages every component of the pipeline, it is not ideal for developers who need control over parts of the agent pipeline. To use this option, simply create an agent through the Beyond Presence Dashboard and embed it as an iFrame in your frontend for users to interact with.
End-to-End with Custom LLM
- 🎯 Ideal for low-tech teams who want to bring their own LLM
- ✅ Custom LLM support, fully managed agent & conversation handling
- ⚠️ Requires a deployed LLM, no control over ASR/TTS, tricky to get latency right
If you need support for a custom LLM but don’t want to build your own agent pipeline in-house, the end-to-end API with custom LLM is the option for you. To use it, you either deploy your custom LLM as a streaming text-to-text WebSocket API that the Beyond Presence agent can access, or you connect to the Beyond Presence agent via WebSocket and adapt your LLM logic to continuously fetch conversation history changes from the agent and stream responses back. Overall, this solution is most straightforward if you already have a deployed LLM server, but somewhat difficult to set up otherwise; the overall call latency will also depend heavily on the efficiency of your LLM server implementation.
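To make the first variant concrete, here is a minimal sketch of a streaming text-to-text WebSocket server in Python. The message format (a JSON frame carrying the conversation history in, plain text chunks streamed out) and the handler names are illustrative assumptions, not the actual Beyond Presence protocol; consult the end-to-end API reference for the real contract.

```python
# Sketch: a streaming text-to-text WebSocket server that an agent could
# call. The JSON-in / text-chunks-out framing below is an ASSUMPTION for
# illustration, not the real Beyond Presence protocol.
import asyncio
import json


def generate_reply_chunks(messages):
    """Placeholder LLM: echoes the last user message word by word.
    Swap in your real model's streaming output here."""
    last_user = next(
        m["content"] for m in reversed(messages) if m["role"] == "user"
    )
    for word in f"You said: {last_user}".split():
        yield word + " "


async def handle_conversation(websocket):
    # Each incoming frame carries the full conversation history; the reply
    # is streamed back chunk by chunk so TTS can start speaking early.
    async for frame in websocket:
        request = json.loads(frame)
        for chunk in generate_reply_chunks(request["messages"]):
            await websocket.send(chunk)
        await websocket.send(json.dumps({"type": "done"}))


if __name__ == "__main__":
    import websockets  # pip install websockets

    async def main():
        async with websockets.serve(handle_conversation, "0.0.0.0", 8765):
            await asyncio.Future()  # serve until cancelled

    asyncio.run(main())
```

Streaming chunk by chunk, rather than sending the full reply at once, is what keeps latency low: downstream TTS can begin synthesizing audio as soon as the first words arrive.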
Audio-to-Video
- 🎯 Ideal for Python developers and technical teams with existing audio agents
- ✅ Modular, full control over all pipeline components, most extensible
- ⚠️ Requires building your own audio agent
The audio-to-video API is our recommended option for deep-tech teams: it is the most modular solution and gives you maximum control over your audio agent stack. It is a great fit if you already have an audio agent pipeline, want full control over the interaction with the end user, or plan to integrate custom components (for example, visual perception of the user). To use it, you stream the audio output of your TTS / audio agent to the Beyond Presence API, and the resulting avatar video is displayed directly to your user.
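A common preprocessing step in this flow is framing your TTS output into fixed-duration PCM chunks before streaming them. The sketch below assumes 16 kHz, 16-bit mono audio and 20 ms frames; these values and the helper name are illustrative assumptions, and the actual transport goes through LiveKit as described in the Audio-to-Video API docs.

```python
# Sketch: split raw TTS audio into fixed-duration PCM frames suitable for
# real-time streaming. Sample rate, bit depth, and frame length are
# ASSUMPTIONS for illustration -- check the Audio-to-Video API docs for
# the formats the avatar session actually accepts.
SAMPLE_RATE = 16_000   # 16 kHz mono PCM (assumed)
FRAME_MS = 20          # 20 ms frames, a typical real-time audio chunk size
BYTES_PER_SAMPLE = 2   # 16-bit samples


def frame_pcm(audio: bytes) -> list[bytes]:
    """Split raw PCM audio into equal frames, zero-padding the last one."""
    frame_bytes = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE
    frames = []
    for start in range(0, len(audio), frame_bytes):
        frame = audio[start:start + frame_bytes]
        frames.append(frame.ljust(frame_bytes, b"\x00"))
    return frames
```

Small, uniform frames let the avatar start rendering lip-synced video as soon as audio begins arriving, instead of waiting for a full utterance.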
Learn More
Audio-to-Video API
Use the Beyond Presence Audio-to-Video API to add a face to your audio agents.
End-to-End API
Integrate Beyond Presence end-to-end agents into your own apps.
LiveKit Agents Integration
How to use Beyond Presence with custom LiveKit agents.
n8n Integration
Integrate Beyond Presence webhooks with n8n workflows.