Multilingual Voice AI roundtable at EkStep Foundation with 24 Voice AI builders and experts
Voice AI · Multilingual · Speech-to-Text · Text-to-Speech · Telephony · Public Services

Multilingual Voice AI for National Helplines — EkStep Foundation Roundtable

9 March 2026
EkStep Foundation, Bangalore, Karnataka, India
Technology

Kenpath's Chief Data Scientist Aditya Chhabra and Anjali Rao, Data Scientist specialising in Voice AI, joined a focused roundtable convened by EkStep Foundation — 12 participants in the room at EkStep, 12 joining virtually — bringing together model builders, platform teams, telephony players, and solution integrators.

The question on the table was not whether to build multilingual Voice AI for national helplines, but how — where the components live, how the stack holds up at population scale, and who owns what.

Key conclusions from the session:

1. Language detection needs human input, not just technology
Telecom circle metadata is imprecise. Automated LID struggles with utterances under 1.5 seconds — common on phone calls. The simplest and most reliable solution: ask the caller. A well-designed greeting where the caller picks their language beats any smart detection system.
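The "ask the caller" approach can be sketched as a simple DTMF menu. This is an illustrative fragment, not any participant's actual system; `LANG_MENU`, `pick_language`, and the language codes are assumptions chosen for the example.

```python
# Minimal sketch of a greeting menu that maps a caller's DTMF keypress
# to a locked-in language, with a safe default when input is missing
# or unrecognised. Names and codes here are illustrative only.

LANG_MENU = {
    "1": "hi-IN",  # "For Hindi, press 1"
    "2": "kn-IN",  # "For Kannada, press 2"
    "3": "en-IN",  # "For English, press 3"
}

def pick_language(dtmf_digit: str, default: str = "hi-IN") -> str:
    """Return the caller's chosen language code, falling back to a default."""
    return LANG_MENU.get(dtmf_digit, default)
```

The fallback matters: callers who press nothing, or an unlisted key, still get routed rather than dropped.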

2. Start the call in two or three languages at once
Greet in the most likely languages simultaneously, run LID on the first response, and lock in the language from there. Mid-conversation full language switches are far less common than feared once the opening is handled well.

3. Keep language detection close to speech recognition
LID should run inside or alongside the ASR model — not as a separate upstream microservice. The orchestration layer owns the language context state across turns and the final language-switching decision; it is the system's central nervous system.

4. Avoid real-time translation if possible
Adding a translation step on every exchange adds latency and degrades naturalness. The better path: train AI models directly in the target language. For low-resource and tribal languages, the direction is to fine-tune LLMs with well-trained NMT models as a bridge.
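The latency cost of a translation hop can be made concrete with simple per-turn arithmetic. The numbers below are illustrative placeholders, not measurements from the session: a pipeline that translates on both the inbound (ASR output to pivot language) and outbound (response back to the caller's language) path pays for two extra model calls every exchange.

```python
# Rough per-turn latency budget (milliseconds; all figures illustrative).
# A model trained directly in the target language needs ASR + LLM + TTS;
# a translation-based pipeline adds an NMT hop in each direction.

def turn_latency(asr_ms: int, llm_ms: int, tts_ms: int,
                 nmt_ms: int = 0, translate: bool = False) -> int:
    hops = [asr_ms, llm_ms, tts_ms]
    if translate:
        hops += [nmt_ms, nmt_ms]  # inbound and outbound translation
    return sum(hops)
```

With placeholder figures of 300/800/200 ms and a 250 ms NMT hop, translation turns a 1300 ms turn into 1800 ms, a gap the caller feels on every single exchange.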

5. The problem is solvable — the real work is beginning
The architecture is coming into shape, models are maturing, and the field experience in the room is real. The harder work ahead is proof of concepts, fine-tuning, and ecosystem coordination to make it production-ready across all Indian languages — including ones that have never been in a model.