Available 2026 · AI/ML Engineer · Mumbai, India

I build AI that earns human trust.

I get genuinely excited about AI that closes real gaps — not demos, not dashboards. Real systems. Real humans. Real stakes. Six shipped projects. One thread: building things people actually trust.


30-Second Brief

K Sai Sovit

AI / ML Engineer · B.Tech AI

Location: Mumbai, India · Remote OK

Current/Last: DSJ Keep Learning · AI/ML Eng. (May 2025–Sept 2025)

Core Stack: LangGraph · LangChain · RAG · PyTorch · DenseNet · MTCNN · Transformers · WhisperX · Redis

Available: Open for AI/ML, RAG & Agent consulting

Contact: ksaisovit@gmail.com

// Three things I'm proud of

96.7% · Pneumonia detection sensitivity (clinical-grade, trusted by radiologists)

3× · RAG systems in production: WhatsApp/Email/SMS lead agent, financial analyzer, query bot

+63% · Radiologist acceptance increased through Grad-CAM explainability

Say hello → ksaisovit@gmail.com

96.7% Sensitivity · 97.1% AUC Score · +63% Radiologist Trust · LangGraph Multi-Agent in Prod. · 3× RAG Deployed · 8 Algorithms Benchmarked · 92% Attendance Accuracy · −18% False Positives

Deep Dive · Case Study 01

Pneumonia Detection for 4.3B People

The Problem

4.3 billion people have no access to a radiologist. Pneumonia kills 2 million annually — most preventably. Existing AI models hit 94% accuracy but radiologists wouldn't use them. The missing piece wasn't accuracy. It was trust.

The key insight was that clinicians don't need a more accurate model — they need a model they can explain to a patient. That led to Grad-CAM visualizations and Bayesian uncertainty quantification. Neither is novel; the architectural decision to combine them for clinical acceptance was.

Sensitivity: 96.7% · AUC Score: 97.1% · Radiologist Acceptance: +63%

DenseNet-121 · PyTorch · Grad-CAM · Bayesian Dropout · Albumentations ×12

// The full story

Situation

4.3B people lack radiology access. Radiologist error rates reach up to 30% under fatigue. Existing AI models were rejected by clinicians, not because of low accuracy but because "black box" decisions undermine clinical confidence and raise liability concerns. A 94% accurate model that doctors won't use is worth 0%.

Approach

DenseNet-121 backbone for X-ray feature extraction — chosen over ResNet for dense skip connections that preserve spatial gradients critical for localization.

Bayesian Gaussian Dropout instead of standard dropout — enables uncertainty quantification at inference time. The model returns not just a prediction but a confidence interval, essential for clinical safety.

Grad-CAM overlays generate heatmaps on the lung region, making the model's attention visible. Clinicians could point to the same region the model flagged — building trust through shared visual language.

12 Albumentations augmentations — including elastic distortions, CLAHE, and random gamma — to simulate the full range of X-ray quality and patient positioning variance.
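The uncertainty idea behind the Bayesian Gaussian Dropout step can be sketched without the full network. This is a toy illustration in plain Python, not the project's DenseNet-121 code: a hypothetical three-feature logistic model keeps multiplicative Gaussian noise switched on at inference, runs many stochastic passes, and reports the mean prediction together with its spread. The feature values, weights, dropout rate, and number of passes are all assumptions made for the example.

```python
import math
import random
import statistics


def gaussian_dropout(values, rate, rng):
    """Multiply each activation by noise drawn from N(1, rate / (1 - rate))."""
    sigma = math.sqrt(rate / (1.0 - rate))
    return [v * rng.gauss(1.0, sigma) for v in values]


def predict_once(features, weights, bias, rate, rng):
    """One stochastic forward pass of a toy logistic model (noise stays on)."""
    noisy = gaussian_dropout(features, rate, rng)
    logit = sum(w * x for w, x in zip(weights, noisy)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def predict_with_uncertainty(features, weights, bias, rate=0.2, passes=200, seed=0):
    """Average many stochastic passes; the spread is the uncertainty estimate."""
    rng = random.Random(seed)
    samples = [predict_once(features, weights, bias, rate, rng) for _ in range(passes)]
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical inputs, chosen only to make the example runnable.
features = [0.8, 1.5, -0.3]
weights = [1.2, 0.7, -0.5]
bias = -0.4

mean, std = predict_with_uncertainty(features, weights, bias)
print(f"probability ~ {mean:.3f}, spread ~ {std:.3f}")
```

A wide spread is the signal to route a case to a human reader instead of trusting the point estimate.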

Result

96.7% sensitivity · 92.3% specificity · 97.1% AUC · −18% false positives · +63% radiologist acceptance in evaluation trials.

// What I carry forward

"Uncertainty quantification beats raw accuracy for clinical AI adoption. A model that knows what it doesn't know is more trustworthy than one that's always confident."

Deep Dive

Internship Projects · Architectural Decisions

Lead Scoring Pipeline

Asynchronous · Exponential Decay · Redis Queues · CRM Integration

Situation

Manual lead scoring was slow, inconsistent, and didn't account for engagement decay over time. Sales teams couldn't prioritize effectively.

Approach

Built an asynchronous lead scoring system using Python and Redis queues. The model incorporates communication patterns, engagement metrics, and behavioral analysis. Exponential decay algorithm weights recent interactions more heavily, ensuring scores reflect current interest. Integrated into the existing CRM for real‑time updates.

Result

Automated lead prioritization, reduced manual effort by 70%, and improved sales focus on high‑intent leads. The exponential decay captured recency better than static scoring.

Key Lesson

Timeliness matters more than total engagement. A lead that interacted yesterday is far more valuable than one from three months ago, even with the same total activity.
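The recency effect can be shown with a few lines of arithmetic. This is a minimal sketch, not the production scorer: each interaction contributes its weight multiplied by exp(-λ · age), with λ derived from a half-life. The 14-day half-life, the event weights, and the function name `decayed_score` are assumptions for illustration.

```python
import math
from datetime import datetime, timedelta


def decayed_score(events, now, half_life_days=14.0):
    """Sum event weights, each halved for every `half_life_days` of age."""
    decay_rate = math.log(2) / half_life_days
    total = 0.0
    for timestamp, weight in events:
        age_days = (now - timestamp).total_seconds() / 86400.0
        total += weight * math.exp(-decay_rate * max(age_days, 0.0))
    return total


now = datetime(2025, 9, 1)
recent_lead = [(now - timedelta(days=1), 5.0)]   # same activity, yesterday
stale_lead = [(now - timedelta(days=90), 5.0)]   # same activity, three months ago

print(decayed_score(recent_lead, now), decayed_score(stale_lead, now))
```

With identical total activity, the lead that engaged yesterday keeps almost all of its score while the 90-day-old one decays to a small fraction.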

Call Scoring Pipeline

WhisperX · pyannote.audio · Redis Job Queue · Rubric Scoring

Situation

Call recordings were reviewed manually for quality assurance — time‑consuming and subjective. No standardized way to measure agent performance or compliance.

Approach

Built a real‑time audio pipeline: WhisperX for high‑accuracy transcription, pyannote.audio for speaker diarization (separating agent from customer). A distributed Redis job queue handles asynchronous processing, scaling across multiple calls. A custom rubric‑based scoring engine evaluates call transcripts against predefined criteria (e.g., greeting, problem understanding, closing).

Result

100% automated call scoring, consistent across all agents. Feedback turnaround dropped from days to minutes. Redis queue allowed horizontal scaling during peak call volumes.

Key Lesson

Speaker diarization is the hardest but most critical part. Without knowing who said what, scoring is meaningless. WhisperX + pyannote together gave the best accuracy vs. latency trade‑off.

Query Chatbot

RAG · LangChain · OpenAI GPT‑4 · PostgreSQL · SQL Injection Prevention

Situation

Ed‑Tech CRM had unstructured data across documents, support tickets, and internal knowledge bases. Support teams spent hours answering repetitive queries. Direct SQL access posed security risks.

Approach

Developed a RAG‑powered chatbot using LangChain and OpenAI GPT‑4 API. Retrieval from vectorized internal docs and a PostgreSQL database. Built a secure SQL query validator that parses and sanitizes any generated SQL before execution, preventing injection attacks. The system can answer both document‑based questions and pull structured data from the CRM.

Result

Reduced support load by 40%, response times from hours to seconds. The SQL validator blocked all injection attempts in testing, making the system safe for production.

Key Lesson

Generative SQL is dangerous without a strict validator. We implemented a whitelist‑based query parser that only allows SELECT statements on specific tables — no ALTER, DROP, or JOINs beyond approved ones.
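An allow-list check of this kind might look like the sketch below. It is a simplified, regex-based illustration and not the validator described above: the table names are hypothetical, and it deliberately rejects anything it cannot reason about (comments, quoted identifiers, comma joins, subqueries in FROM). A production validator would more likely use a real SQL parser and a read-only database role as well.

```python
import re

ALLOWED_TABLES = {"students", "courses"}  # hypothetical table names

# Keywords that must never appear, even inside an otherwise valid SELECT.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|exec|copy|into)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_.]*)", re.IGNORECASE)
# "FROM a, b" style joins hide the second table from TABLE_REF, so reject them.
COMMA_JOIN = re.compile(r"\b(?:from|join)\s+[\w.]+(?:\s+(?:as\s+)?\w+)?\s*,", re.IGNORECASE)
# Parenthesised sources (subqueries, nested joins) are refused in this sketch.
PAREN_SOURCE = re.compile(r"\b(?:from|join)\s*\(", re.IGNORECASE)


def is_query_allowed(sql):
    """Return True only for a single SELECT that touches allow-listed tables."""
    statement = sql.strip().rstrip(";").strip()
    if any(token in statement for token in (";", "--", "/*", '"', "`")):
        return False  # stacked statements, comments, or quoted identifiers
    if not re.match(r"select\b", statement, re.IGNORECASE):
        return False
    if FORBIDDEN.search(statement):
        return False
    if COMMA_JOIN.search(statement) or PAREN_SOURCE.search(statement):
        return False
    tables = [name.lower() for name in TABLE_REF.findall(statement)]
    return bool(tables) and all(name in ALLOWED_TABLES for name in tables)


print(is_query_allowed("SELECT name FROM students WHERE id = 7"))
print(is_query_allowed("SELECT * FROM students; DROP TABLE students"))
```

The design choice is to fail closed: a query the check cannot classify is refused, which costs some legitimate queries but never lets an unknown table through.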

Selected Work

02 · Six Problems I Got Obsessed With

End-to-End · Most In-Demand '26 · Multi-Agent · RAG

Lead Nurturing Agent

LangGraph · Production · DSJ Keep Learning

3 specialist agents orchestrated via LangGraph state graph. The architecture enables non-linear conversation branching across WhatsApp, Email, and SMS — impossible with a standard chain. RAG via SentenceTransformers + Qdrant for context retrieval.

Situation: Manual lead qualification across 3 channels was inconsistent, slow, and couldn't scale

Approach: LangGraph 3-agent (Qualification / Technical / Pricing) + RAG (SentenceTransformers + Qdrant)

Result: Fully automated qualification; real-time personalized responses across all 3 channels

Lesson: Graph state enables non-linear conversation flows that chains fundamentally cannot handle
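The difference between a chain and a graph is easiest to see in a toy router. The sketch below is plain Python and does not use the LangGraph API; it only mimics the idea of shared state plus conditional edges. The keyword rules, node functions, and state fields are hypothetical stand-ins for the Qualification, Technical, and Pricing agents.

```python
from typing import Callable, Dict, List, TypedDict


class ConversationState(TypedDict):
    channel: str        # "whatsapp", "email", or "sms"
    message: str        # latest inbound message
    visited: List[str]  # agents that handled this turn, in order


def qualification(state: ConversationState) -> str:
    """Entry node: record the visit, then pick the next node from the message."""
    state["visited"].append("qualification")
    text = state["message"].lower()
    if "api" in text or "integrat" in text:
        return "technical"
    if "price" in text or "cost" in text:
        return "pricing"
    return "END"


def technical(state: ConversationState) -> str:
    """A technical answer may still need a price quote afterwards."""
    state["visited"].append("technical")
    return "pricing" if "price" in state["message"].lower() else "END"


def pricing(state: ConversationState) -> str:
    state["visited"].append("pricing")
    return "END"


NODES: Dict[str, Callable[[ConversationState], str]] = {
    "qualification": qualification,
    "technical": technical,
    "pricing": pricing,
}


def run(state: ConversationState, entry: str = "qualification", max_steps: int = 10) -> ConversationState:
    """Follow conditional edges until a node returns END (or the step cap hits)."""
    node = entry
    for _ in range(max_steps):
        if node == "END":
            break
        node = NODES[node](state)
    return state


result = run({"channel": "whatsapp", "message": "Does the API support SSO, and what is the price?", "visited": []})
print(result["visited"])
```

The path through the agents depends on the state at each step, which a fixed linear chain cannot express.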

LangGraph · Qdrant · SentenceTransformers · Python

Clinical Grade · Computer Vision

Pneumonia Classification

96.7% Sensitivity · 97.1% AUC · +63% Trust

Full case study above ↑ — DenseNet-121 + Bayesian Dropout + Grad-CAM. The explainability layer drove clinical adoption more than raw accuracy.

Key insight: Clinicians need to explain decisions to patients, and a black-box 94% model fails this. Explainability + uncertainty quantification were the real product.

DenseNet-121 · Grad-CAM · Bayesian Dropout · PyTorch

Prod. RAG

LLM Financial Analyzer

Hours → Minutes

RAG on PDF annual reports. Domain-specific chunking strategy for financial structure. LangChain + Pinecone + OpenAI.

Situation: Analysts spending hours per PDF report manually extracting structured insights

Lesson: RAG + proper chunking beats fine-tuning for doc QA because it is faster to deploy and easier to update

LangChain · Pinecone · OpenAI

Optimization

VRP — 8 Algo Benchmark

Hybrid IP+GA Optimal

GA, SA, ACO, Dijkstra + 4 hybrids. Systematic empirical benchmarking vs theoretical preference.

Lesson: No single algorithm dominates all metrics. Empirical benchmarking over theoretical preference; the methodology is the contribution.

Python · DEAP · OR-Tools

NLP Research

Sanskrit Transformer

Built From Scratch

Transformer from primitives + custom word2vec on scraped corpus. Low-resource NLP is a data strategy problem.

Lesson: The challenge wasn't the model, it was corpus strategy. Low-resource NLP is fundamentally a data acquisition problem.

PyTorch · Word2Vec · BeautifulSoup

End-to-End · CV · Deployed

Automated Attendance System

92% Accuracy · Live

MTCNN + FaceNet + SVM + geolocation + Streamlit. From camera input to real-time attendance log with fraud prevention. The philosophy: a deployed 92% creates more value than a 99% notebook.

MTCNN · FaceNet · SVM · Streamlit · OpenCV

Situation: Proxy fraud and manual attendance with no geographic verification

Approach: MTCNN face detection + FaceNet embeddings + SVM classifier + geolocation gate + Streamlit UI

Result: 92% accuracy on real dataset. Full pipeline camera → attendance log, fraud-resistant

Lesson: Shipping is the final feature. A deployed 92% system beats a 99% notebook that never runs in production.
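One plausible shape for the geolocation gate is a distance check combined with the face-match confidence. The sketch below is an assumption about how such a gate could work, not the deployed code: it uses the haversine formula for great-circle distance, and the 150 m radius, 0.8 confidence threshold, coordinates, and function names are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def mark_attendance(face_confidence, device_location, campus_location,
                    min_confidence=0.8, radius_m=150.0):
    """Accept only when the face match is confident AND the device is on site."""
    distance = haversine_m(*device_location, *campus_location)
    return face_confidence >= min_confidence and distance <= radius_m


campus = (19.0760, 72.8777)  # illustrative coordinates only
print(mark_attendance(0.93, (19.0761, 72.8778), campus))  # on site, confident match
print(mark_attendance(0.93, (20.0760, 72.8777), campus))  # about 111 km away
```

Requiring both conditions means a correct face shown from the wrong place is rejected, which is the proxy-fraud case the gate exists for.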

Recent Internship Projects

Shipped at DSJ Keep Learning · Production Systems

Lead Scoring Pipeline

Asynchronous · Exponential Decay

Developed an asynchronous lead scoring model that incorporates communication patterns, engagement metrics, and behavioral analysis with exponential decay algorithms for sales optimization. Integrated into the existing CRM for real‑time scoring.

Python · Redis · FastAPI · Scikit‑learn

Call Scoring Pipeline

Real‑time Audio · Speaker Diarization

Implemented real‑time audio transcription using WhisperX with pyannote.audio for speaker diarization. Built a distributed Redis job queue for asynchronous processing and integrated a custom rubric‑based scoring system for agent evaluation.

WhisperX · pyannote.audio · Redis · Python

Query Chatbot

RAG · PostgreSQL · SQL Injection Prevention

Developed a RAG‑powered chatbot for an Ed‑Tech CRM using LangChain and OpenAI GPT‑4 API. Designed and implemented a secure PostgreSQL query validator to prevent SQL injection and ensure data integrity.

LangChain · OpenAI · PostgreSQL · RAG

Who I Am

03 · Hi, I'm Sai. Let me explain.

Somewhere between a clinical AI project and a room full of sixth-graders learning maths, I figured out what actually drives me: closing the gap between people who have access to good tools and people who don't. That's not a mission statement — it's the thread connecting everything I've built.

"A deployed 92% system creates more value than a 99% notebook. Shipping is the final feature."

My pneumonia project started as a classification task and ended as a lesson in human trust. The +63% radiologist acceptance didn't come from better accuracy — it came from making the model's reasoning visible through Grad-CAM. Clinicians could point to the exact lung region the model flagged. Shared visual language. Suddenly it wasn't "the AI says so" — it was a conversation between doctor and tool.

I volunteer-teach 6th and 7th graders at the We Can We Will Foundation: English, maths, the basics. The access problem I try to solve with code, I also try to solve in person. I'm in the B.Tech AI program at VijayBhoomi University, I lead the Alterlights Project, and I spend way too much time debating architecture decisions with people who care about them as much as I do.

I'm looking for a team that ships things, debates the hard tradeoffs honestly, and still asks “but does it actually help someone?” after the model hits production.

Sai

K Sai Sovit

AI / ML Engineer · VijayBhoomi University

Email: ksaisovit@gmail.com

Phone: +91 93371 99404

GitHub: ksaisovit

LinkedIn: ksaisovit

Status: Open to hire · AI/ML and Agent roles

// How I think about problems

Trust over accuracy — a model people actually use beats a better one they won't.

Architecture before tools — LangGraph vs LangChain is a state management decision, not a preference.

The question matters as much as the answer — eight VRP benchmarks taught more than one optimal result.

Shipping is the final feature — but only if it helps someone when it lands.

Timeline

04 · Where I've Learned Most

May 2025 — Present

DSJ Keep Learning

AI / ML Engineer

📜 View Certificate

Lead Nurturing Agent — LangGraph 3-agent + RAG. Replaced manual triage across WhatsApp, Email, SMS.
Lead Scoring Pipeline — Asynchronous scoring with exponential decay and behavioral analysis, integrated into CRM.
Call Scoring Pipeline — WhisperX transcription + pyannote diarization, Redis queue, rubric‑based scoring for agent evaluation.
Query Chatbot — RAG + PostgreSQL query validator for Ed‑Tech CRM, preventing SQL injection.

2021 — Present

VijayBhoomi University

B.Tech AI · Research

Pneumonia Classification — DenseNet-121 + Bayesian Dropout + Grad-CAM. 96.7% sensitivity, +63% clinical acceptance.
LLM Financial Analyzer — LangChain + Pinecone + OpenAI. Reduced research time from hours to minutes.
VRP 8-Algorithm Benchmark — Hybrid IP+GA identified as optimal.
Sanskrit Transformer — Built from scratch. Custom word2vec vs StanfordNLP + IndicNLP.
Alterlights Project Leader · Leetcode Group Member.

Ongoing

We Can We Will

Volunteer Teacher

English & Mathematics for 6th/7th grade students. Food drives and community outreach. The access problem at scale.

Capabilities

Skills with Evidence

// Agentic AI Systems

LangGraph · Production

Built 3-agent lead nurturing system with shared state graph — qualification, technical, and pricing specialists. Architectural choice: LangGraph over LangChain because conversation state across 3 channels requires a graph, not a chain.

RAG Pipelines · 3× Built

Three production RAG systems: Qdrant + SentenceTransformers for conversational context, Pinecone + OpenAI for financial docs, and LangChain + PostgreSQL for the Ed‑Tech CRM query chatbot. Chunking strategy tuned per domain.

LangChain · Proficient

Used for linear document QA where LangGraph overhead wasn't warranted. Trade-off reasoning, not default.

// Computer Vision

DenseNet-121 + Bayesian · Validated

96.7% sensitivity, 97.1% AUC. Bayesian Gaussian Dropout adds uncertainty quantification — essential for clinical trust beyond raw accuracy.

Grad-CAM Explainability · Clinical

Drove +63% radiologist acceptance. Explainability was the product — accuracy alone couldn't drive adoption.

MTCNN + FaceNet · Deployed

Face detection + embedding pipeline for Attendance. SVM on embeddings + geolocation. 92% accuracy, live.

// NLP & LLM

Transformer Architecture · Research

Built from scratch for Sanskrit NLP — not fine-tuning. Architecture from primitives. Custom word2vec on web-scraped corpus.

OpenAI API · Production

Production integration for Financial Analyzer with domain-specific system prompts and hallucination thresholds.

WhisperX + pyannote · Real‑time Audio

Built call scoring pipeline with speaker diarization and custom rubric‑based scoring. Redis for async job queue.

// ML Systems

Optimization Algorithms · Benchmarked

8 algorithms for VRP with systematic methodology across runtime, quality, scalability. Hybrid IP+GA identified as optimal trade-off.

PyTorch / TensorFlow · Core

Custom training loops, loss functions, gradient manipulation. Not just high-level APIs — comfortable at the primitive layer.

// Deployment & MLOps

Streamlit / Gradio · Deployed

Deployed Attendance and Pneumonia demos. Philosophy: deployed 92% beats 99% notebook.

Vector Databases · Proficient

Qdrant for conversational retrieval, Pinecone for document search. Selected per use-case — not one-size-fits-all.

Redis · Async Queues

Used in call scoring pipeline for job distribution and caching. Reliable and fast.

Talk to the work

05 · Two AI Tools Built for You

Both tools run on Claude and are trained on my full project data: specific metrics, architectural decisions, and SARL (Situation, Approach, Result, Lesson) case studies. The chat agent cites specifics, not summaries. The Fit Tool gives an honest match score with direct evidence from my work.

Ask Sai's Agent

Trained on all 6 projects + internship systems · cites specific metrics and decisions

I'm an AI trained on Sai's project data. Ask about specific engineering decisions, trade-offs, or why he chose one approach over another. I'll cite real numbers.


Role Fit Assessment

Paste a JD → get honest match % with specific evidence


06 · Education