Agent LLM Selection Guide
TriLuna offers multiple Large Language Model (LLM) options for your agents. Choosing the right LLM impacts your agent's intelligence, response quality, and conversation capabilities.
Available LLM Options
Your TriLuna dashboard includes an LLM dropdown menu where you can select from several powerful language models, each optimized for different use cases:
Current LLM Options
GPT-4 Series (Recommended for Most Use Cases)
- Best for: Complex reasoning, nuanced conversations, professional interactions
- Strengths: Excellent comprehension, context retention, creative problem-solving
- Use Cases: Customer service, sales, technical support, appointment setting
- Response Quality: Highest quality, most human-like responses
GPT-3.5 Turbo (Fast and Efficient)
- Best for: Quick interactions, high-volume calling, cost-conscious deployments
- Strengths: Fast response times, efficient processing, good general knowledge
- Use Cases: Lead qualification, appointment reminders, basic information gathering
- Response Quality: Good quality with faster response times
Claude Models (Alternative Option)
- Best for: Analytical tasks, detailed explanations, thoughtful responses
- Strengths: Careful reasoning, thorough responses, safety-focused
- Use Cases: Consultative selling, complex information gathering, detailed support
- Response Quality: High quality with emphasis on accuracy and helpfulness
How to Change Your Agent’s LLM
Using the Interactive LLM Selector
- Log into your TriLuna dashboard
- Navigate to My Agents
- Click on the agent you want to modify
- Find the AI Model section with an interactive dropdown
- Click the dropdown to see all available models organized by provider:
- Google: Gemini models (fast, efficient)
- OpenAI: GPT models (versatile, powerful)
- Anthropic: Claude models (thoughtful, safe)
- Select your preferred model - the change applies immediately
- Your agent will use the new LLM for all future conversations
Understanding the Numbers
Each model in the dropdown shows two important specifications:
Max Tokens (e.g., “8,192 tokens”)
- What it means: The maximum length of a single reply the model can generate
- Practical impact: Higher numbers = longer, more detailed responses possible
- Rough conversion: 4,096 tokens ≈ 3,000 words; 8,192 tokens ≈ 6,000 words
- Choose higher for: Detailed explanations, complex scenarios, thorough customer service
- Choose lower for: Quick interactions, brief responses, fast-paced conversations
Context Tokens (e.g., “128,000 tokens”)
- What it means: How much conversation history the model can remember and reference
- Practical impact: Higher numbers = better memory of earlier conversation parts
- Example: a 128,000-token window retains roughly the last 96,000 words of conversation
- Choose higher for: Long consultations, complex multi-topic discussions, detailed support calls
- Choose lower for: Simple, short interactions where conversation history isn’t critical
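The conversions above can be sketched numerically. A common rule of thumb for English text is roughly 0.75 words per token (an approximation, not a TriLuna-specific figure):

```python
# Rough token-to-word conversion, assuming ~0.75 words per English token.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate how many words fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(4_096))    # 3072 -- max tokens: ~3,000 words per reply
print(tokens_to_words(8_192))    # 6144 -- ~6,000 words per reply
print(tokens_to_words(128_000))  # 96000 -- context: ~96,000 words of history
```

Actual word counts vary with vocabulary and language, so treat these as planning estimates rather than guarantees.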
Choosing the Right LLM
Consider Your Use Case
High-Touch Customer Service
Recommended: GPT-4 or Claude
- Handles complex customer issues with nuance
- Better at understanding emotional context
- More sophisticated problem-solving capabilities
- Superior at de-escalating difficult situations
Lead Qualification and Sales
Recommended: GPT-4
- Excellent at reading between the lines
- Strong persuasion and rapport-building abilities
- Good at asking qualifying questions naturally
- Effective at handling objections
High-Volume Appointment Setting
Recommended: GPT-3.5 Turbo
- Fast response times keep conversations flowing
- Efficient for straightforward scheduling tasks
- Cost-effective for large call volumes
- Adequate intelligence for routine interactions
Technical Support
Recommended: GPT-4 or Claude
- Better at understanding technical concepts
- More accurate troubleshooting guidance
- Superior at breaking down complex solutions
- Better context retention for multi-step processes
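The use-case recommendations above collapse into a simple lookup. This is a sketch only; the use-case keys are illustrative labels, not TriLuna configuration values:

```python
# Map each use case to the models this guide recommends (illustrative keys).
RECOMMENDED_MODELS = {
    "customer_service": ["GPT-4", "Claude"],    # nuance, de-escalation
    "lead_qualification": ["GPT-4"],            # rapport, objection handling
    "appointment_setting": ["GPT-3.5 Turbo"],   # speed, cost at volume
    "technical_support": ["GPT-4", "Claude"],   # accuracy, multi-step context
}

def recommend(use_case: str) -> list[str]:
    """Return the recommended models, defaulting to GPT-4 for unknown cases."""
    return RECOMMENDED_MODELS.get(use_case, ["GPT-4"])

print(recommend("appointment_setting"))  # ['GPT-3.5 Turbo']
```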
Performance vs. Cost Considerations
Premium Models (GPT-4, Claude)
- Higher Cost: More expensive per conversation
- Superior Quality: Better understanding and responses
- Best For: High-value interactions, complex use cases
- ROI Consideration: Higher conversion rates often justify increased costs
Efficient Models (GPT-3.5 Turbo)
- Lower Cost: More conversations per dollar
- Good Quality: Adequate for most routine interactions
- Best For: High-volume, straightforward use cases
- ROI Consideration: Cost savings enable higher call volumes
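One way to weigh this trade-off is a break-even check: a pricier model pays for itself when its extra conversions cover its extra per-conversation cost. The prices and rates below are illustrative placeholders, not TriLuna pricing:

```python
def value_per_call(cost_per_call: float, conversion_rate: float,
                   revenue_per_conversion: float) -> float:
    """Expected revenue minus cost for a single conversation."""
    return conversion_rate * revenue_per_conversion - cost_per_call

# Illustrative numbers only -- substitute your own costs and conversion data.
cheap = value_per_call(cost_per_call=0.05, conversion_rate=0.08,
                       revenue_per_conversion=50.0)    # GPT-3.5-style economics
premium = value_per_call(cost_per_call=0.25, conversion_rate=0.12,
                         revenue_per_conversion=50.0)  # GPT-4-style economics

print(cheap, premium)  # the higher expected value wins for this scenario
```

With these placeholder numbers the premium model comes out ahead despite costing five times more per call, which is the "higher conversion rates often justify increased costs" point in miniature.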
LLM Performance Characteristics
Response Time Comparison
- GPT-3.5 Turbo: ~1-2 seconds (fastest)
- GPT-4: ~2-4 seconds (moderate)
- Claude: ~2-5 seconds (varies by complexity)
Context Window Sizes
Different LLMs can remember different amounts of conversation history:
- GPT-4: Large context window - remembers entire conversations
- GPT-3.5 Turbo: Moderate context - good for most interactions
- Claude: Large context window - excellent memory for long conversations
Specialized Capabilities
GPT-4 Strengths
- Complex reasoning and analysis
- Creative problem-solving
- Understanding subtle context and implications
- Excellent at adapting communication style
GPT-3.5 Turbo Strengths
- Fast, efficient responses
- Good general knowledge
- Reliable performance for routine tasks
- Cost-effective scaling
Claude Strengths
- Careful, thoughtful responses
- Strong analytical capabilities
- Excellent safety and appropriateness
- Good at detailed explanations
Testing Different LLMs
A/B Testing Your LLM Choice
- Set Baseline: Use your current LLM for a week and track metrics
- Switch Models: Change to a different LLM for comparison
- Monitor Performance: Track conversation success rates, customer satisfaction
- Compare Results: Analyze which LLM performs better for your specific use case
Key Metrics to Track
- Conversation Success Rate: How often does the agent achieve the desired outcome?
- Customer Satisfaction: How do customers respond to different LLMs?
- Response Appropriateness: How well does the LLM understand context and respond appropriately?
- Efficiency: How quickly does the agent reach conversation goals?
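A minimal way to compare the two test periods is to compute each model's success rate from logged outcomes. This is a sketch; the outcome lists below are hypothetical, standing in for your own call records:

```python
def success_rate(outcomes: list[bool]) -> float:
    """Fraction of conversations that reached the desired outcome."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Hypothetical logs: True = goal achieved, False = not achieved.
baseline_week = [True, True, False, True, False, True, True, False]  # current LLM
test_week = [True, True, True, False, True, True, True, True]        # candidate LLM

print(f"baseline:  {success_rate(baseline_week):.1%}")
print(f"candidate: {success_rate(test_week):.1%}")
```

With real call volumes, make sure each period is long enough that the difference isn't noise before committing to a switch.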
LLM-Specific Configuration Tips
Optimizing System Prompts by LLM
For GPT-4:
GPT-4 handles complex, nuanced prompts very well. You can include detailed instructions about tone, personality, and specific behaviors. It excels with context-rich prompts that provide examples and edge case handling.
For GPT-3.5 Turbo:
Keep prompts clear and concise. Focus on specific, actionable instructions rather than nuanced personality descriptions. Works best with straightforward, goal-oriented prompts.
For Claude:
Claude responds well to structured, thoughtful prompts that emphasize helpfulness and accuracy. Include clear guidelines about when to be thorough vs. concise.
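To make the contrast concrete, here are sketch system prompts in the three styles described above. The agent name, business, and wording are illustrative examples, not TriLuna-supplied copy:

```python
# Illustrative prompt styles per model family -- adapt to your own agent.
PROMPT_STYLES = {
    # GPT-4: detailed, context-rich, with personality and edge cases.
    "gpt-4": (
        "You are Ava, a warm, patient support agent for Acme Dental. "
        "Mirror the caller's tone, acknowledge frustration before solving, "
        "and if billing comes up, summarize the dispute before transferring."
    ),
    # GPT-3.5 Turbo: short, specific, goal-oriented.
    "gpt-3.5-turbo": (
        "You book dental appointments. Ask for name, preferred date, and "
        "reason for visit. Confirm details, then end the call."
    ),
    # Claude: structured, accuracy-focused, thorough-vs-concise guidance.
    "claude": (
        "You are a support agent. Be accurate and helpful. Give thorough "
        "answers to technical questions; keep scheduling exchanges brief. "
        "If unsure, say so rather than guessing."
    ),
}
```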
Troubleshooting LLM Issues
Common Issues and Solutions
Agent Responses Too Slow
- Solution: Switch to GPT-3.5 Turbo for faster responses
- Check: Ensure system prompts aren’t overly complex
- Consider: Whether response quality vs. speed trade-off is acceptable
Agent Doesn’t Understand Context
- Solution: Upgrade to GPT-4 or Claude for better comprehension
- Check: System prompt clarity and examples
- Adjust: Behavior settings to be more explicit
Responses Too Expensive
- Solution: Switch to GPT-3.5 Turbo for cost efficiency
- Optimize: System prompts to be more concise
- Review: Whether premium model ROI justifies cost
Agent Responses Inappropriate
- Solution: Switch to Claude for more conservative responses
- Adjust: System prompt to include safety guidelines
- Review: Behavior settings for appropriateness rules
Best Practices Summary
- Start with GPT-4 for most use cases, then optimize based on performance
- Use GPT-3.5 Turbo for high-volume, routine interactions
- Choose Claude for conservative, analytical, or safety-critical applications
- Test thoroughly before making permanent changes
- Monitor performance metrics after LLM changes
- Adjust prompts and behaviors to match your chosen LLM’s strengths
- Consider cost vs. performance trade-offs for your specific use case
Need Help Choosing?
LLM selection can significantly impact your agent’s performance. Get expert guidance:
- Email our AI optimization team
- Schedule an LLM consultation through your dashboard
- Use the chat widget for quick LLM questions
- Request performance analysis of your current LLM choice