Nepali and English Multilingual Audio to Audio
- Designed and implemented a fully modular bilingual (English + Nepali) conversational AI pipeline, decoupling ASR, language modeling, RAG-based retrieval, translation, and text-to-speech into separately trainable and swappable components.
- Fine-tuned Conformer-based ASR models for both languages, reaching 6.14% WER on LibriSpeech (English) and 31.40% WER / 11.51% CER on the Nepali FLEURS test set, outperforming all Whisper variants with fewer parameters.
- Built a custom decoder-only transformer language model (GPT-2 architecture) with a FAISS-backed Retrieval-Augmented Generation system trained on recent Nepali news, reducing hallucination and improving factual grounding (BERTScore-F1: 0.9303, ROUGE-L: 0.5862).
- Applied ONNX export and INT8 dynamic quantization to compress the Nepali ASR model from 493 MB to 134 MB (71.5% reduction) with negligible performance change, enabling deployment on resource-constrained hardware.
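
For readers unfamiliar with the ASR metrics quoted above, the following is a minimal sketch of how WER and CER are commonly computed: edit distance divided by reference length, over words and characters respectively. It is not the project's evaluation code. The function names are hypothetical, text normalization is omitted, and CER here counts spaces and splits on Unicode code points, which is an assumption that affects Devanagari scores.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces count as characters and text is split by code point."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution plus one deletion against a six-word reference.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abcd", "abxd"))
```

Corpus-level figures are usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.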











