Software Engineer · Qualcomm Edge AI Hackathon Winner · Agentic AI & Distributed Infra at Scale
Prev @ NYU IT High-Speed Research Network, Shell, Wipro
I build AI systems that work under real production pressure. At NYU, I cut RAG query latency by 78% on a multi-agent research engine serving 3K+ RPS, and pushed LLM inference to 15ms per token on Snapdragon NPUs via QLoRA fine-tuning and AWQ quantization. Before that, I kept Shell's maritime telemetry alive: 115GB/day across 200+ offshore stations with zero data loss.
Previously: Energy · Bioinformatics · Scientific Research
Excited by: Healthcare · Finance · Consumer Space (Entertainment and Retail) · & more
Currently, I'm obsessed with recommendation systems and search technology at scale: technology that shapes how people think and behave, and that should be built with intention and responsibility.
A production AI portfolio featuring Avocado, a streaming RAG chatbot backed by a 4-stage hybrid retrieval pipeline that runs before Gemini 2.5 Flash sees the question: query expansion (up to 4 variants), batched dense search via ChromaDB (all-MiniLM-L6-v2 ONNX), BM25 lexical search (rank_bm25), and Reciprocal Rank Fusion (k=60). The knowledge base of ~80 atomic documents auto-syncs incrementally: new or edited documents are upserted, deleted documents are purged from ChromaDB, and unchanged ones are skipped entirely. The backend runs as a FastAPI Docker container on AWS Lightsail with zero-downtime blue-green deployments, Nginx + Let's Encrypt HTTPS, and daily S3 backups of the SQLite analytics database; the Next.js 16 static frontend is served from GitHub Pages. Total cost: $10/month.
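Reciprocal Rank Fusion merges the dense and BM25 result lists using only ranks: each document scores the sum of 1/(k + rank) over every list it appears in. A minimal sketch, assuming each retriever returns an ordered list of document IDs; the function name `reciprocal_rank_fusion` and the sample IDs are hypothetical, not taken from Avocado's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score means more relevant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["doc_rag", "doc_npu", "doc_shell"]   # hypothetical dense-search results
    lexical = ["doc_npu", "doc_crdt", "doc_rag"]  # hypothetical BM25 results
    for doc_id, score in reciprocal_rank_fusion([dense, lexical]):
        print(f"{doc_id}: {score:.5f}")
```

In this example `doc_npu` comes out first even though dense search ranked `doc_rag` first, because it sits near the top of both lists. A large k such as 60 flattens the gap between rank 1 and rank 2, so agreement across retrievers matters more than any single top hit.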
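The incremental sync comes down to diffing the documents on disk against what the vector store already holds. A minimal sketch, assuming each document has a stable ID and is fingerprinted with a SHA-256 hash of its content; `plan_sync` and `SyncPlan` are hypothetical names, and the actual ChromaDB calls are left out.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    """Fingerprint a document's content so edits can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class SyncPlan:
    upsert: list[str] = field(default_factory=list)     # new or edited document IDs
    delete: list[str] = field(default_factory=list)     # IDs indexed but no longer on disk
    unchanged: list[str] = field(default_factory=list)  # IDs skipped entirely


def plan_sync(local_docs: dict[str, str], indexed_hashes: dict[str, str]) -> SyncPlan:
    """Compare local documents (id -> text) with stored hashes (id -> sha256 hex)."""
    plan = SyncPlan()
    for doc_id in sorted(local_docs):
        if indexed_hashes.get(doc_id) == content_hash(local_docs[doc_id]):
            plan.unchanged.append(doc_id)
        else:
            plan.upsert.append(doc_id)  # either never indexed or content changed
    # Anything indexed but missing locally was deleted from the knowledge base.
    plan.delete = sorted(set(indexed_hashes) - set(local_docs))
    return plan


if __name__ == "__main__":
    indexed = {
        "bio": content_hash("old bio"),
        "npu": content_hash("15ms on NPU"),
        "gone": content_hash("retired doc"),
    }
    local = {"bio": "new bio", "npu": "15ms on NPU", "rag": "hybrid retrieval"}
    print(plan_sync(local, indexed))
```

Here the plan upserts `bio` (edited) and `rag` (new), deletes `gone`, and skips `npu`. The resulting ID lists could then be handed to ChromaDB's `collection.upsert(...)` and `collection.delete(ids=...)`, so unchanged documents are never re-embedded.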
Achieved 15ms per-token latency on Snapdragon NPUs, a 10× improvement over cloud inference, by fine-tuning Llama 3.2 3B on security logs with QLoRA and deploying it on-device with 4-bit AWQ quantization through ONNX Runtime. Guaranteed zero data loss during network partitions with an offline-first SQLite buffer and background sync workers.
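An offline-first buffer commits every event to local SQLite before any network I/O, and a background worker drains it when the network is reachable, deleting rows only after a confirmed upload. A minimal sketch using Python's standard `sqlite3` and `threading` modules; the `outbox` schema, the `send` callback, and the names `OfflineBuffer` and `start_sync_worker` are hypothetical, and the original project may implement this differently.

```python
import sqlite3
import threading
from typing import Callable


class OfflineBuffer:
    """Durable outbox: events are persisted locally before any network I/O."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._lock, self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS outbox ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
            )

    def enqueue(self, payload: str) -> None:
        # Commit locally first, so a network partition cannot lose the event.
        with self._lock, self._conn:
            self._conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def pending(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[str], bool], batch_size: int = 100) -> int:
        """Upload queued events in insertion order; delete only confirmed ones."""
        with self._lock:
            rows = self._conn.execute(
                "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
            ).fetchall()
        sent = 0
        for row_id, payload in rows:
            try:
                ok = send(payload)
            except Exception:
                ok = False  # treat any transport error as "still offline"
            if not ok:
                break  # keep this row and everything after it for the next cycle
            with self._lock, self._conn:
                self._conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent


def start_sync_worker(buffer: OfflineBuffer, send: Callable[[str], bool],
                      stop: threading.Event, interval_s: float = 5.0) -> threading.Thread:
    """Background daemon thread that periodically drains the buffer until `stop` is set."""
    def loop() -> None:
        while not stop.is_set():
            buffer.flush(send)
            stop.wait(interval_s)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    buf = OfflineBuffer()
    buf.enqueue("alert-1")
    buf.enqueue("alert-2")
    print(buf.flush(lambda p: False), buf.pending())  # partition: nothing leaves the device
    print(buf.flush(lambda p: True), buf.pending())   # reconnected: both events drain
```

Deleting each row only after `send` succeeds gives at-least-once delivery: a crash between upload and delete can resend an event, so the receiving side should tolerate duplicates.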
A production LangGraph + Llama 3.1 70B system that semantically maps global researcher collaboration networks by indexing millions of papers from Elsevier's ScienceDirect and Scopus. Cut P99 RAG latency by 78% with write-through Redis caching and sustained 99.9% uptime at 3,000+ RPS on AWS ECS.
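With write-through caching, every write updates the backing store and the cache in the same operation, so later reads can be served from the cache without going stale. A minimal sketch, assuming plain dicts as stand-ins for Redis and the primary database (in production the cache side would be Redis `GET`/`SET`); `WriteThroughCache` and its methods are hypothetical.

```python
from typing import Optional


class WriteThroughCache:
    """Write-through caching: writes hit the store and the cache together."""

    def __init__(self, store: dict[str, str]) -> None:
        self.store = store                # stand-in for the system of record
        self.cache: dict[str, str] = {}   # stand-in for Redis
        self.hits = 0
        self.misses = 0

    def write(self, key: str, value: str) -> None:
        # Persist to the backing store first, then the cache, so the cache
        # never holds a value the store does not.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value  # warm the cache for the next read
        return value


if __name__ == "__main__":
    cache = WriteThroughCache(store={})
    cache.write("query:crispr-collaborators", "ids:42,77")  # hypothetical key/value
    print(cache.read("query:crispr-collaborators"), cache.hits, cache.misses)
```

The trade-off is slightly slower writes (two updates per write) in exchange for reads that never see stale data, which suits read-heavy RAG workloads.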
Conflict-free simultaneous multi-user editing using Yjs (CRDTs) and WebSockets, scaled horizontally via Nginx load balancing across containerized instances. Increased AI auto-complete context quality by 65% with a Context-Aware Coding Agent using AST-based chunking and Voyage-Code-2 embeddings.
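AST-based chunking splits source files along syntactic boundaries (whole functions and classes) rather than fixed character windows, so each embedded chunk is a complete unit of code. The project's target languages and exact chunk rules aren't stated; a minimal sketch for Python source using the standard `ast` module, where `chunk_python_source` and `CodeChunk` are hypothetical names and the Voyage-Code-2 embedding step is omitted.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    name: str
    kind: str        # "function" or "class"
    start_line: int  # 1-based line of the def/class keyword
    end_line: int    # 1-based last line of the body
    source: str


def chunk_python_source(source: str) -> list[CodeChunk]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks: list[CodeChunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            segment = ast.get_source_segment(source, node) or ""
            chunks.append(CodeChunk(node.name, kind, node.lineno, node.end_lineno, segment))
    return chunks


if __name__ == "__main__":
    sample = (
        "import math\n"
        "\n"
        "def area(r):\n"
        "    return math.pi * r * r\n"
        "\n"
        "class Circle:\n"
        "    def __init__(self, r):\n"
        "        self.r = r\n"
    )
    for chunk in chunk_python_source(sample):
        print(chunk.kind, chunk.name, chunk.start_line, chunk.end_line)
```

This prints `function area 3 4` and `class Circle 6 8`. Module-level statements such as the import fall outside every chunk here; a fuller chunker might attach them to each chunk as shared context.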
Languages
AI & Agents
Systems & Cloud
Frameworks & Databases
Certifications
Sabarish is an intuitive and intelligent person who has offered me solutions in the most difficult times and helped me on multiple occasions. He is prompt and hardworking, with a knack for technology, and an absolute delight to work with!
Dec 2022
Open to software engineering roles in AI/ML infrastructure, distributed systems, and full-stack development. Healthcare, frontier research, and energy tech are my focus, but I'm excited by any hard problem.