The most modular option for technical teams. Send audio from your voice agent and receive an avatar video stream, with a 250 ms response time from audio input to HD avatar video output.

Pipeline Overview

1. Your Voice Agent Pipeline
   You manage the media transport, turn detection, STT, LLM, and TTS components.

2. Beyond Presence Speech-to-Video API
   Receives audio input from your pipeline.

3. Avatar Video Output
   Beyond Presence manages avatar generation and video streaming.
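
To make the division of responsibility concrete, here is a minimal sketch of one conversational turn in Python. It is an illustration under stated assumptions, not the Beyond Presence API: `AudioFrame`, `split_into_frames`, `run_turn`, and the `send_frame` callback are hypothetical names, and the 16-bit mono PCM format, 16 kHz sample rate, and 20 ms frame size are assumed values. Everything before `send_frame` is the part you own (step 1). `send_frame` stands in for whatever transport hands audio to the speech-to-video API (step 2).

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class AudioFrame:
    """A slice of 16-bit mono PCM audio (hypothetical type, not an SDK class)."""
    pcm: bytes
    sample_rate: int

    @property
    def duration_ms(self) -> float:
        # 16-bit mono audio uses 2 bytes per sample.
        return len(self.pcm) / 2 / self.sample_rate * 1000


def split_into_frames(
    pcm: bytes, sample_rate: int = 16000, frame_ms: int = 20
) -> Iterator[AudioFrame]:
    """Cut TTS output into fixed-size frames; the last frame may be shorter."""
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2
    for start in range(0, len(pcm), bytes_per_frame):
        yield AudioFrame(pcm[start:start + bytes_per_frame], sample_rate)


def run_turn(
    user_audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
    send_frame: Callable[[AudioFrame], None],
) -> int:
    """Run one turn of your pipeline and forward the speech audio.

    stt, llm, and tts are your own components (step 1). send_frame is a
    placeholder for the hand-off to the speech-to-video API (step 2).
    Returns the number of frames forwarded.
    """
    text = stt(user_audio)
    reply = llm(text)
    speech = tts(reply)
    sent = 0
    for frame in split_into_frames(speech):
        send_frame(frame)
        sent += 1
    return sent


# Usage with stub components: the "avatar side" here is just a list.
received: List[AudioFrame] = []
count = run_turn(
    b"\x00" * 100,
    stt=lambda audio: "hello",
    llm=lambda text: text.upper(),
    tts=lambda text: b"\x00\x00" * 800,  # 800 samples of silence
    send_frame=received.append,
)
```

Passing the components in as plain callables keeps the sketch independent of any framework. In a real integration, the framework plugin (for example LiveKit or Pipecat) would take the place of `send_frame`.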

Supported Frameworks

We support integration with popular voice agent frameworks including LiveKit and Pipecat, allowing you to add avatar video to your existing voice pipelines.

When to Use This

Choose speech-to-video when you need:
  • Full control: Complete management of turn detection, STT, LLM, and TTS components
  • Existing pipelines: Integration with current voice agent infrastructure
For zero-infrastructure deployment, use managed agents instead.

Next Steps