Optimizing AI voice agent performance with Preswald

Amrutha Gujjar · 3 min read

Category: Use Case


AI-powered voice assistants have been taking the world by storm across a huge range of use cases. But these systems are complex: multiple components work together, and each one introduces a potential point of failure (a sketch of the per-stage log records follows the list):

  • Capturing spoken language and transcribing it into text.

  • Deciphering user intent and extracting key details.

  • Maintaining conversation context for fluid interactions.

  • Generating responses with rule-based, retrieval-based, or generative models.

  • Converting AI-generated text into lifelike speech.
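
To make these failure points concrete, here is a minimal sketch of the per-stage log records such a pipeline might emit. The field names mirror the columns used in the debugging queries later in this post, but the exact schema depends on your stack.

from dataclasses import dataclass

# One record per pipeline stage, all keyed by the same call_id
@dataclass
class ASRLog:
    call_id: str
    transcribed_text: str
    asr_confidence: float   # how sure the speech-to-text model is

@dataclass
class IntentLog:
    call_id: str
    user_said: str
    predicted_intent: str
    confidence: float       # intent classifier confidence

@dataclass
class ResponseLog:
    call_id: str
    ai_response: str
    response_time: float    # seconds from request to generated reply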


Real-world deployment brings very real challenges:

  • ASR errors: Background noise, accents, and poor microphone quality produce incorrect transcriptions, which lead to wrong intent detection and cascade through everything downstream.

  • Misclassified intents: The model misinterprets user input, assigns the wrong label, and takes the wrong action.

  • Slow response times: Latency at any stage (ASR, intent classification, or response generation) makes interactions sluggish.

  • Confidence misalignment: The AI responds even when it’s uncertain, leading to confusing or incorrect answers (a simple confidence gate is sketched below).
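
One cheap guardrail against the last point is to gate responses on the pipeline's own confidence scores. A rough sketch; the threshold values are illustrative and should be tuned against your own logs:

ASR_THRESHOLD = 0.80      # below this, the transcription is too unreliable to act on
INTENT_THRESHOLD = 0.70   # below this, the classifier is effectively guessing

def should_ask_for_clarification(asr_confidence: float, intent_confidence: float) -> bool:
    """Return True when the agent should ask the user to repeat or rephrase
    instead of acting on a low-confidence transcription or intent."""
    return asr_confidence < ASR_THRESHOLD or intent_confidence < INTENT_THRESHOLD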

Debugging AI Voice Agents is a Pain


  • Logs are scattered across different systems, making it difficult to trace errors end-to-end.

    • Your logs are spread across every layer of the stack: ASR logs (Whisper, Deepgram) for transcriptions and confidence scores; NLU logs (OpenAI function calling, Rasa, Haystack) for intent classification and entity extraction; response logs (LangChain, custom LLM pipelines) capturing model outputs, reasoning traces, and latency; orchestration logs (Vocode, Voiceflow, LangServe) tracking API calls, tool usage, and error handling; and TTS logs (ElevenLabs, PlayHT, AWS Polly) monitoring speech synthesis. Session and user interaction logs often live in vector databases (Weaviate, Pinecone) or structured storage (PostgreSQL, ClickHouse) for analyzing conversation flow, fallbacks, and user corrections.
  • Cloud databases charge per query and data scan, making debugging expensive—especially at scale.

  • Issues surface too late, by which point they’ve already impacted the user experience.

  • AI failures often involve multiple components, but cloud logs keep them separate, making it harder to connect the dots (a simple way to keep traces joinable is sketched after this list).

  • If you’re working on an edge device or in a secure environment, relying on cloud logs isn’t always an option.
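
One way to keep traces joinable end-to-end, wherever the logs ultimately land, is to tag every component's log entry with a shared call identifier. A minimal sketch; the file name and payload fields are illustrative:

import json
import time
import uuid

def log_event(component: str, call_id: str, payload: dict, path: str = "voice_agent_events.jsonl") -> None:
    """Append one JSON line per event, tagged with call_id so ASR, intent,
    response, and TTS entries can be stitched together later."""
    record = {"ts": time.time(), "component": component, "call_id": call_id, **payload}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# The same call_id threads through every component of one interaction
call_id = str(uuid.uuid4())
log_event("asr", call_id, {"transcribed_text": "book a table", "asr_confidence": 0.91})
log_event("intent", call_id, {"predicted_intent": "make_reservation", "confidence": 0.87})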

The Solution: Debug AI Locally with DuckDB & Preswald

Debugging AI failures can be slow and expensive. But with DuckDB and Preswald, you can:

  • Run SQL queries on local logs instantly, even on large datasets.

  • Replace ad-hoc analysis in spreadsheets or pandas scripts with interactive, shareable dashboards.

  • Build real-time, persistent dashboards so teams can explore logs dynamically.

Setting Up Your Local AI Debugging Environment

Create voice_dashboard.py:

1. Install Dependencies

pip install duckdb pandas preswald

2. Load and Analyze AI Logs with DuckDB

import duckdb  
import pandas as pd  
import json  

# Load ASR logs
with open("asr_logs.json", "r") as f:
    asr_data = pd.DataFrame(json.load(f))

# Load NLP intent logs
intent_data = pd.read_csv("intent_logs.csv")

# Load AI response logs
response_data = pd.read_csv("response_logs.csv")

# Create DuckDB connection
con = duckdb.connect()
con.register("asr_logs", asr_data)
con.register("intent_logs", intent_data)
con.register("response_logs", response_data)

3. Join Logs and Identify Issues

# Create a view joining the three log sources so later steps can query it
con.execute("""
    CREATE OR REPLACE VIEW joined_logs AS
    SELECT
        a.call_id,
        a.transcribed_text,
        a.asr_confidence,
        i.user_said,
        i.predicted_intent,
        i.confidence AS intent_confidence,
        r.ai_response,
        r.response_time
    FROM asr_logs a
    JOIN intent_logs i ON a.call_id = i.call_id
    JOIN response_logs r ON a.call_id = r.call_id
""")
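
With the joined view in place, you can slice it directly for likely failure cases. For example (the thresholds are illustrative):

# Flag calls that were probably wrong or slow
suspect_calls = con.execute("""
    SELECT call_id, transcribed_text, asr_confidence, predicted_intent, response_time
    FROM joined_logs
    WHERE asr_confidence < 0.8       -- shaky transcription
       OR intent_confidence < 0.7    -- uncertain intent
       OR response_time > 2.0        -- sluggish reply (seconds)
    ORDER BY response_time DESC
""").fetchdf()

print(suspect_calls.head())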

4. Save Processed Data for Preswald

processed_logs = con.execute("""
    SELECT * FROM joined_logs
""").fetchdf()

processed_logs.to_csv("voice_agent_logs.csv", index=False)

Building an AI Debugging Dashboard in Preswald

Within voice_dashboard.py, add:

1. Create an Interactive Log Viewer

from preswald import text, table  

text("# 🗣️ AI Voice Debugging Dashboard")

table(asr_data)

Run:

preswald run voice_dashboard.py

2. Add Intent Filtering & Heatmap

text("# 🔍 AI Intent Debugging")

misclassifications = data_conn.query("""
    SELECT actual_intent, predicted_intent, COUNT(*) as error_count 
    FROM logs WHERE actual_intent != predicted_intent 
    GROUP BY actual_intent, predicted_intent
""")

heatmap(misclassifications, x="actual_intent", y="predicted_intent", value="error_count", title="Misclassification Patterns")
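
The same pattern works for latency. For example, a query that surfaces the slowest intents (column names follow the joined logs built earlier), so you know where to optimize first:

# Average response time per predicted intent, slowest first
slow_intents = con.execute("""
    SELECT predicted_intent,
           AVG(response_time) AS avg_response_time,
           COUNT(*)           AS calls
    FROM joined_logs
    GROUP BY predicted_intent
    ORDER BY avg_response_time DESC
""").fetchdf()

table(slow_intents)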

What actions can you take based on the data?

Once you have AI logs available locally, the real value comes from using them to drive action. Debugging is more than finding failures. It’s about fixing them before they impact users.

  • If certain words or accents cause frequent transcription mistakes, retrain the model with more diverse data or adjust confidence thresholds (a quick way to replay threshold changes against local logs is sketched after this list).

  • If misclassifications are common, refine training data, adjust entity extraction, or fine-tune confidence thresholds.

  • If response time is an issue, optimize database queries, simplify models, or introduce caching.

  • If users repeat themselves or correct the AI often, treat it as an early sign of frustration: identify the unclear responses and improve the model.
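
For example, before changing a confidence threshold you can replay candidate values against the local logs to see how many calls each one would have flagged. A rough sketch using the joined logs from earlier; the candidate thresholds are illustrative:

# Replay candidate ASR confidence thresholds against historical logs
for threshold in (0.70, 0.80, 0.90):
    flagged = con.execute(
        "SELECT COUNT(*) FROM joined_logs WHERE asr_confidence < ?",
        [threshold],
    ).fetchone()[0]
    total = con.execute("SELECT COUNT(*) FROM joined_logs").fetchone()[0]
    print(f"threshold {threshold:.2f}: {flagged}/{total} calls would ask for clarification")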

Takeaways


Quickly identify the most common issues at a glance.

With DuckDB, you can run instant queries on large log files without waiting for cloud ingestion. Preswald turns messy CSVs and ad-hoc scripts into interactive dashboards.

🔗 Check out the full code sample in the examples/ folder of the Preswald GitHub repository.