AI Telephony Agent
Make INBOUND and OUTBOUND calls with AI agents using VideoSDK. Supports multiple SIP providers and AI agents with a clean, extensible architecture for VoIP telephony solutions.
Installation
Prerequisites
Python 3.11+
VideoSDK account
Twilio account (SIP trunking provider)
Google API key (for Gemini AI)
Setup
Clone the repository
git clone https://github.com/yourusername/ai-agent-telephony.git
cd ai-agent-telephony
Install dependencies
pip install -r requirements.txt
Configure environment variables
Create a .env file:
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_SIP_USERNAME=your_sip_username
VIDEOSDK_SIP_PASSWORD=your_sip_password

# AI Configuration
GOOGLE_API_KEY=your_google_api_key

# Twilio SIP Trunking Configuration
TWILIO_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_NUMBER=your_twilio_number
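As a rough illustration of how a file in this format can be read, the sketch below parses KEY=value lines with the standard library only. The helper name parse_env_text is hypothetical, and this README does not say how the project actually loads the file, so treat it as one possible approach and not the project's implementation.

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that have no "=" separator
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
TWILIO_NUMBER="+1234567890"
"""
print(parse_env_text(sample))
```

Phone numbers are kept as strings so the leading + is preserved.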
Run the server
python server.py
The server will start on http://localhost:8000
API Endpoints
Handle Inbound Calls (SIP User Agent Server)
POST /inbound-call
Handles incoming calls from your SIP provider and expects Twilio webhook parameters. The provider must be able to reach this endpoint, so either host the server publicly or expose it through ngrok:
POST <server-url>/inbound-call
CallSid: Unique call identifier
From: Caller's phone number (CLI - Calling Line Identification)
To: Recipient's phone number (DID - Direct Inward Dialing)
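To make the three parameters concrete, here is a minimal sketch that extracts them from a form-encoded webhook body, which is the encoding Twilio uses for voice webhooks. The function name parse_inbound_webhook and the sample values are hypothetical; the server's real handler is not shown in this README and may work differently.

```python
from typing import Dict
from urllib.parse import parse_qs


def parse_inbound_webhook(body: str) -> Dict[str, str]:
    """Extract CallSid, From and To from a form-encoded webhook body."""
    form = parse_qs(body)
    fields: Dict[str, str] = {}
    for name in ("CallSid", "From", "To"):
        values = form.get(name)
        if not values:
            raise ValueError(f"missing webhook parameter: {name}")
        fields[name] = values[0]
    return fields


# "%2B" is the URL-encoded "+" that prefixes E.164 phone numbers.
body = "CallSid=CA123&From=%2B15550001111&To=%2B15550002222"
print(parse_inbound_webhook(body))
```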
Initiate Outbound Calls (SIP User Agent Client)
POST /outbound-call
Content-Type: application/json

{
  "to_number": "+1234567890",
  "initial_greeting": "Hello from AI Agent!"
}
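The same request can be built from Python. The sketch below only constructs the request object with the standard library; build_outbound_call_request is a hypothetical helper, and treating initial_greeting as optional is an assumption, since the README does not say which fields are required.

```python
import json
from typing import Optional
from urllib.request import Request


def build_outbound_call_request(
    base_url: str, to_number: str, initial_greeting: Optional[str] = None
) -> Request:
    """Build (but do not send) a POST request for the /outbound-call endpoint."""
    payload = {"to_number": to_number}
    if initial_greeting is not None:
        # Assumption: the greeting may be omitted from the payload.
        payload["initial_greeting"] = initial_greeting
    return Request(
        url=f"{base_url.rstrip('/')}/outbound-call",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_outbound_call_request(
    "http://localhost:8000", "+1234567890", "Hello from AI Agent!"
)
print(request.get_method(), request.full_url)
```

Once the server is running, the request can be sent with urllib.request.urlopen(request).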
Configure SIP Provider
POST /configure-provider?provider_name=twilio
Switch SIP providers at runtime (currently supports: twilio).
Adding New SIP Providers
The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:
1. Create Provider Implementation
Create providers/your_provider.py:
from typing import Any, Dict

from .base import SIPProvider
from config import Config


class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        # YourProviderClient is a placeholder for your provider's SDK client.
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        # Return the call instructions that route the call to the SIP endpoint.
        return f"{sip_endpoint}"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml,
        )
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider",
        }

    def get_provider_name(self) -> str:
        return "your_provider"
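The provider above subclasses SIPProvider from providers/base.py, which this README does not show. The sketch below is a guess at that interface, inferred only from the four methods the example implements, together with a hypothetical in-memory FakeProvider that can stand in for a real provider in tests. The actual base class may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SIPProvider(ABC):
    """Assumed shape of the base class; the real providers/base.py may differ."""

    @abstractmethod
    def create_client(self) -> Any:
        """Build the SDK client used to talk to the provider."""

    @abstractmethod
    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        """Return call instructions that route the call to the SIP endpoint."""

    @abstractmethod
    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        """Place a call and return its id, status and provider name."""

    @abstractmethod
    def get_provider_name(self) -> str:
        """Return the key used to select this provider."""


class FakeProvider(SIPProvider):
    """Hypothetical stand-in that records calls instead of placing them."""

    def __init__(self) -> None:
        self.client = self.create_client()
        self.calls: List[Dict[str, str]] = []

    def create_client(self) -> Any:
        return None  # no real SDK client is needed for the fake

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        return f"{sip_endpoint}"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        self.calls.append({"to": to_number, "twiml": twiml})
        return {
            "call_sid": f"fake-{len(self.calls)}",
            "status": "queued",
            "provider": self.get_provider_name(),
        }

    def get_provider_name(self) -> str:
        return "fake"


provider = FakeProvider()
result = provider.initiate_outbound_call(
    "+1234567890", provider.generate_twiml("sip:agent@example.com")
)
print(result)
```

Declaring the methods abstract means a provider that forgets one of them fails at instantiation time, not in the middle of a call.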
2. Update Provider Factory
Add to providers/__init__.py:
from .your_provider import YourProvider


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function
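The remainder of get_provider is elided in this README. One common way to finish a factory like this is a registry lookup that fails loudly on unknown names, sketched below. The stub classes, the PROVIDERS name, and the error message are all assumptions made so the example runs on its own; they are not taken from the project.

```python
from typing import Callable, Dict


class TwilioProvider:
    """Stub standing in for the project's real Twilio provider."""

    def get_provider_name(self) -> str:
        return "twilio"


class YourProvider:
    """Stub standing in for the new provider."""

    def get_provider_name(self) -> str:
        return "your_provider"


# Registry mapping the provider name to the class that implements it.
PROVIDERS: Dict[str, Callable[[], object]] = {
    "twilio": TwilioProvider,
    "your_provider": YourProvider,
}


def get_provider(provider_name: str = "twilio"):
    """Instantiate the provider registered under provider_name."""
    try:
        provider_cls = PROVIDERS[provider_name.lower()]
    except KeyError:
        supported = ", ".join(sorted(PROVIDERS))
        raise ValueError(
            f"Unsupported SIP provider '{provider_name}'. Supported: {supported}"
        ) from None
    return provider_cls()


print(get_provider("your_provider").get_provider_name())
```

Listing the supported names in the error makes a mistyped provider_name easy to diagnose from the API response or the logs.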
3. Add Configuration
Update config.py:
import os


class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation
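The validation logic itself is elided above. A plausible version, sketched here under the assumption that an unset or empty variable should stop startup, collects every missing name before raising so that all problems are reported at once. The function names find_missing and validate_required are hypothetical.

```python
from typing import Dict, List, Optional


def find_missing(required_vars: Dict[str, Optional[str]]) -> List[str]:
    """Return the names whose values are unset (None) or empty."""
    return [name for name, value in required_vars.items() if not value]


def validate_required(required_vars: Dict[str, Optional[str]]) -> None:
    """Raise a single error that lists every missing variable."""
    missing = find_missing(required_vars)
    if missing:
        raise ValueError(
            "Missing required environment variables: " + ", ".join(missing)
        )


# Passes silently because both values are set.
validate_required({"YOUR_API_KEY": "abc123", "YOUR_NUMBER": "+1234567890"})
print(find_missing({"YOUR_API_KEY": "abc123", "YOUR_NUMBER": None}))
```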
Adding New AI Agents
Similarly, you can add new AI agents for intelligent call handling:
1. Create AI Agent Implementation
Create ai/your_ai_agent.py:
from typing import Any, Dict

from videosdk.agents import AgentSession, RealTimePipeline

from .base_agent import AIAgent
from voice_agent import VoiceAgent
from config import Config


class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        # YourAIModel is a placeholder for your model's realtime integration.
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name",
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context,
        }
        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context,
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"
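One detail of create_session worth spelling out is the order of keys in agent_context: because the caller's context is unpacked last, any key it shares with the defaults replaces the default. The standalone sketch below isolates that merge; build_agent_context and the sample values are hypothetical and exist only to demonstrate the behavior.

```python
from typing import Any, Dict


def build_agent_context(
    room_id: str, auth_token: str, context: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge per-call context over the agent defaults."""
    return {
        "name": "Your AI Agent",
        "meetingId": room_id,
        "videosdk_auth": auth_token,
        **context,  # unpacked last, so caller-supplied keys win
    }


merged = build_agent_context(
    "room-1", "token", {"name": "Support Bot", "initial_greeting": "Hi"}
)
print(merged["name"])  # Support Bot
```

This also means a caller can accidentally override meetingId or videosdk_auth, so it may be worth filtering the incoming context if it comes from an untrusted request.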
2. Update AI Agent Factory
Add to ai/__init__.py:
from .your_ai_agent import YourAIAgent


def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function
Testing
Health Check
curl "http://localhost:8000/health"
Outbound Call Test (SIP UAC)
curl -X POST "http://localhost:8000/outbound-call" \
  -H "Content-Type: application/json" \
  -d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'
Switch SIP Provider
curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"
🔧 Configuration
Environment Variables
| Variable | Description | Required |
| --- | --- | --- |
| VIDEOSDK_AUTH_TOKEN | VideoSDK authentication token | ✅ |
| VIDEOSDK_SIP_USERNAME | VideoSDK SIP username | ✅ |
| VIDEOSDK_SIP_PASSWORD | VideoSDK SIP password | ✅ |
| GOOGLE_API_KEY | Google API key for Gemini | ✅ |
| TWILIO_SID | Twilio account SID | ✅ |
| TWILIO_AUTH_TOKEN | Twilio auth token | ✅ |
| TWILIO_NUMBER | Twilio phone number | ✅ |
Provider-Specific Variables
For additional SIP providers, add their specific environment variables to config.py.
Features
SIP/VoIP Integration: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
AI-Powered Voice Agents: Pluggable AI agents (Gemini, and more) for intelligent call handling
Real-time Voice Communication: AI agents with real-time transport protocol (RTP) capabilities
Modular Architecture: Clean separation of concerns for scalable telephony solutions
Runtime Configuration: Switch SIP providers and AI agents without restart
VideoSDK Integration: Seamless room creation and session management
Call Control: Advanced call routing, forwarding, and transfer capabilities
Codec Support: Multiple audio codecs for optimal voice quality
Use Cases
Customer Service (SIP-based)
AI agents handle customer inquiries via VoIP
24/7 availability with SIP trunking
Consistent service quality across PSTN and IP networks
Appointment Scheduling
Automated appointment booking via SIP calls
Reminder calls using SIP user agent client
Rescheduling assistance with DTMF support
Surveys and Feedback
Automated survey calls over SIP
Customer feedback collection via VoIP
Data collection with real-time transport protocol
Emergency Notifications
Automated emergency alerts via SIP trunking
Mass notification systems using PSTN integration
Status updates through IP multimedia subsystem (IMS)
Architecture Benefits
Separation of Concerns: Each component has a single responsibility
Extensibility: Easy to add new SIP providers and AI agents
Testability: Components can be tested in isolation
Maintainability: Clear structure makes code easier to understand
Reusability: Components can be reused across different projects
Configuration Management: Centralized configuration with validation
SIP Compliance: Full session initiation protocol support
VoIP Integration: Seamless integration with voice over internet protocol
Roadmap
Add support for multiple AI agents per session
Implement SIP-specific features (SBC, registrar, proxy server)
Add monitoring and metrics for SIP sessions
Create provider-specific webhook handlers
Add support for different voice codecs per AI agent
Implement call recording and transcription
Add sentiment analysis for call quality
Create web dashboard for call management
Support for H.323 protocol integration
Advanced call control features (forwarding, transfer, queue)
🤝 Contributing
1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request
Guidelines
Follow the existing code patterns
Add proper error handling
Include logging
Update documentation
Add tests if possible
License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ for the developer community