Publications | Deepak Babu Piskala

Google Scholar Publications

Synced with the current Google Scholar profile, with public paper and patent links where available.

Why Clicking Buttons Is Harder Than Calling APIs: A Comparative Study of Agent Control Surfaces (IJERT, 2026)
Compares UI-grounded and API-grounded agent control surfaces across reliability, robustness, security, latency, auditability, coverage, cost, and human-in-the-loop design.
The Unification Imperative: The Shift to Unified End-to-End AI Systems Insights from Speech Recognition, Voice Assistants, and Self-Driving (TechRxiv, 2026)
Frames modern AI design around a task-system unification axis, showing why many production systems combine task-level end-to-end learning with system-level modular control.
Agent, Sub-Agent, Skill, or Tool? A Practitioner's Guide to Extending Agentic AI Systems (TechRxiv, 2026)
Defines tools, skills, sub-agents, and agents by control characteristics, with practical orchestration guidance for production agentic systems.
MAPLE: A Sub-Agent Architecture for Memory, Learning, and Personalization in Agentic AI Systems (arXiv:2602.13258, 2026)
Separates memory, learning, and personalization into explicit sub-agent responsibilities to improve adaptation and controllability in long-horizon agent systems.
Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants (arXiv:2602.00180, 2026)
Argues for treating specifications and contracts as the primary artifact while implementation is generated, checked, and refined against formal intent.
Enhanced automatic speech recognition to avoid misrecognition of voice utterances (US Patent 12,525,220, 2026)
Describes a post-ASR correction model that flags likely misrecognitions, selects a corrected interpretation, and routes it to response generation.
Where Should Intelligence Live? Agent-Centric vs. Environment-Centric Design in Agentic Systems (TechRxiv, 2026)
Provides a design framework for choosing between agent-centric and environment-centric intelligence, with emphasis on reliability and structured task settings.
Beyond Words: Toward Audio-First Foundation Models for Effortless Human-Computer Interaction (SSRN, 2026)
Makes the case for speech-native architectures and audio-first data distributions as voice interaction becomes habitual rather than exceptional.
From “Everything is a File” to “Files Are All You Need”: How Unix Philosophy Informs the Design of Agentic AI Systems (Engineering Archive, 2026)
Connects Unix's file abstraction to agentic AI, proposing file/code-centric interfaces for composability, auditability, and robust tool use.
The AI Roles Continuum: Blurring the Boundary Between Research and Engineering (arXiv:2601.06087, 2025)
Synthesizes hiring and organizational signals to argue that AI research and engineering roles increasingly overlap along a practical capability continuum.
PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech (arXiv:2512.23686, 2025)
Introduces a professional-speech ASR benchmark focused on entity-critical errors across finance, medicine, legal, and technical domains.
Large language model (LLM)-based correction based on a multi-tool prompt (US Patent 12,444,412, 2025)
Covers an LLM-guided correction workflow where a prompt encodes a multi-step tool plan and uses tool outputs to update user-provided data.
Mind the Goal: Data-Efficient Goal-Oriented Evaluation of Conversational Agents and Chatbots using Teacher Models (arXiv:2510.03696, 2025)
Evaluates multi-turn agents by whether the user's goal is achieved, with teacher models and a taxonomy of failure causes.
Dynamic LLM Routing and Selection Based on User Preferences: Balancing Performance, Cost, and Ethics (arXiv:2502.16696, 2025)
Introduces a routing engine that selects models per request using user-defined tradeoffs across accuracy, latency, cost, and ethics.
Selfplay: A Python Framework for Role-Based Dialogue Simulation (Zenodo, 2024)
Provides a Python framework for simulating role-based dialogues and generating structured conversational data.
Automated ASR Transcriptions LLM Agent Learning to Imitate Humans (2023)
Explores LLM-agent behavior for improving automated ASR transcription workflows by imitating human correction patterns.
Utterance generation and evaluation (US Patent 11,600,260, 2023)
Covers techniques for generating candidate utterances from intent and target data, then evaluating them for downstream voice/NLU use.
WEB2VOICE - Bridging the Gap Between Voice Shopping and Web Shopping to Improve ASR (AMLC, 2022)
Amazon Machine Learning Conference work that uses web-shopping context to improve voice-shopping ASR.
Retrieve-Rank: ASR Error Correction for Long-tail Domains (AMLC, 2022)
Amazon Machine Learning Conference work on retrieval-and-ranking-based ASR correction for long-tail domains.
Identifying Product Affinity in Webpages for Contextual Advertising (AMLC, 2018)
Amazon Machine Learning Conference work on identifying product affinity signals in webpages for contextual advertising.