Open to new opportunities

Real-time pipelines.
Distributed systems.
ML at scale.

Sai Thanmai — AI Engineer · Software Engineer · ML Engineer

I process millions of records, build streaming infrastructure, and design AI-driven analytics systems that turn raw events into production intelligence. Currently doing that at Pycube Inc.


Projects Built

22M+ Records Processed

$10M+ Revenue Impact

15+ Technologies

About

I'm an engineer who's happiest when data is moving fast and systems need to be reliable. My work sits at the intersection of AI Engineering, distributed systems, and machine learning — building the infrastructure that turns raw events into real-time decisions.

What drives me is the end-to-end problem: not just training a model, but getting it into production, keeping it honest, and making it useful at scale. I care about clean pipelines, observable systems, and code that other engineers actually want to maintain.

What I'm focused on

Streaming architectures, real-time anomaly detection, container orchestration, and building ML systems that hold up under production load.

What I value

Systems thinking over siloed models. Measurable impact over impressive demos. Clean abstractions and honest engineering tradeoffs.

What I'm looking for

Roles where I can own end-to-end systems — from data ingestion through ML inference to the dashboard someone makes a decision from.

Professional Experience

Where I've worked.

From enterprise healthcare AI to telecom-scale ML — building systems that drive measurable business outcomes.

Senior Data Scientist

Pycube, Inc · Virginia, USA

Jan 2023 — Present

Engineered enterprise RAG pipeline with LangChain, FAISS, Pinecone, and AWS SageMaker for document Q&A across 22M+ medical documents, achieving 92% retrieval accuracy.
Architected Generative AI evaluation framework assessing hallucination, relevance, and coherence across GPT, Mistral, and Claude — reducing model failure rates by 30% across 8+ AI projects.
Deployed AutoML annotation infrastructure on AWS Lambda with active learning loops — reducing inter-rater error by 25% and accelerating model release cycles from 14 to 4 days.

Data Scientist

ECIL · Hyderabad, India

June 2020 — Jan 2022

Spearheaded ensemble ML models (XGBoost, Random Forest, LSTM) on 4M+ telecom subscriber sequences achieving 0.91 AUC, improving churn prediction precision by 18% and enabling $10M+ retention revenue.
Deployed Deep Learning architectures (CNNs, LSTMs) for real-time customer intent classification from 500K+ monthly interactions, reducing misrouting by 22%.
Engineered NLP feedback pipeline with BERT and Hugging Face to extract insights from 500K+ unstructured responses, reducing manual analysis time by 60%.

Tech Stack

What I work with.

Deep expertise across ML/AI, data streaming, and cloud infrastructure.

Languages

Python · SQL · Go · Bash · Java · C/C++

Data & Streaming

Apache Kafka · TimescaleDB · PostgreSQL · Redis · MongoDB · Snowflake

ML / AI

PyTorch · TensorFlow · scikit-learn · LLMs · RAG · LangChain · FAISS · Hugging Face · BERT

Infrastructure

Docker · Kubernetes · AWS SageMaker · Lambda · EC2 · FastAPI · WebSocket · Git

Featured Work

Engineering at scale.

End-to-end systems built from scratch — production-grade architecture, real-time pipelines, and ML-powered intelligence.

Project 01

Real-Time Infrastructure Monitoring & Analytics Platform

Live

A full observability platform that simulates infrastructure events, streams them through Kafka, detects anomalies with ML, and displays everything on a live dashboard — updated every 2 seconds via WebSocket.

Streaming pipeline: 12 services × 1 event/sec through Apache Kafka
TimescaleDB hypertables with continuous 1-min aggregates
4-strategy anomaly detection: threshold, z-score, rate-of-change, Isolation Forest (ML)
FastAPI backend with WebSocket real-time push to live dashboard
Dockerized — entire platform boots with one command

System Flow

Architecture

Event Generators

12 simulated services producing CPU, memory, disk, and network metrics at 1 event/sec each.

Kafka Pipeline

Apache Kafka ingestion layer handling high-throughput event streaming with topic partitioning.

ML Anomaly Engine

4-strategy detection (threshold, z-score, rate-of-change, Isolation Forest) with configurable alerting.
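
The three statistical strategies can be sketched in a few lines of plain Python. This is a minimal illustration, not the platform's actual code: the function names, the limits (90.0, 3.0, 25.0), and the `detect` helper are all assumptions made for the example. The fourth strategy, Isolation Forest, is left out here to keep the sketch dependency-free; it would typically be handled by a trained model such as scikit-learn's `IsolationForest`.

```python
from statistics import mean, pstdev

# All limits below are illustrative assumptions, not values from the platform.

def threshold_anomaly(value, limit=90.0):
    """Flag a reading that exceeds a fixed limit."""
    return value > limit

def zscore_anomaly(history, value, z_limit=3.0):
    """Flag a reading far from the mean of earlier readings, in standard deviations."""
    if len(history) < 2:
        return False
    sigma = pstdev(history)
    if sigma == 0:
        return False  # flat history: z-score is undefined, so do not flag
    return abs(value - mean(history)) / sigma > z_limit

def rate_of_change_anomaly(previous, value, max_delta=25.0):
    """Flag a jump between consecutive readings that is larger than max_delta."""
    return abs(value - previous) > max_delta

def detect(history, value):
    """Return the names of the strategies that flag `value` given earlier readings."""
    flagged = []
    if threshold_anomaly(value):
        flagged.append("threshold")
    if zscore_anomaly(history, value):
        flagged.append("z-score")
    if history and rate_of_change_anomaly(history[-1], value):
        flagged.append("rate-of-change")
    return flagged
```

Running every strategy on each reading and reporting which ones fired keeps the alerting configurable: a rule could require one strategy for a warning and two or more for a page.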

Live Dashboard

WebSocket-powered real-time UI with Chart.js visualizations updating every 2 seconds.

Tech Stack

Python · Apache Kafka · TimescaleDB · FastAPI · WebSocket · scikit-learn · Docker Compose · Chart.js


Project 02

Mini Container Orchestration Simulator

Built

A ground-up simulator of Kubernetes-style container scheduling and orchestration. Implements core concepts: pod scheduling, node affinity, resource allocation, and health-based eviction — without requiring a full K8s cluster.

Custom scheduler with bin-packing and resource-aware placement
Simulated node pool with CPU/memory capacity tracking
Pod lifecycle management: pending → running → terminated
Health checks, restart policies, and eviction logic

Core Components

Scheduler

Bin-packing and resource-aware placement algorithms assigning pods to nodes by capacity.
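
One common way to do bin-packing placement is best-fit: among the nodes that can hold the pod, pick the one that would be left with the least spare capacity. The sketch below assumes that approach; the `Node` and `Pod` classes, the `schedule` function, and the simple "sum of leftover CPU and memory" score are hypothetical and chosen for illustration, not taken from the simulator.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU cores (assumed unit)
    mem_free: float  # free memory in GiB (assumed unit)

@dataclass
class Pod:
    name: str
    cpu: float
    mem: float

def schedule(pod, nodes):
    """Best-fit placement: return the chosen node name, or None if nothing fits."""
    feasible = [n for n in nodes if n.cpu_free >= pod.cpu and n.mem_free >= pod.mem]
    if not feasible:
        return None  # the pod would stay pending
    # Crude score for illustration: total capacity left over after placement.
    best = min(feasible, key=lambda n: (n.cpu_free - pod.cpu) + (n.mem_free - pod.mem))
    best.cpu_free -= pod.cpu
    best.mem_free -= pod.mem
    return best.name
```

Packing pods tightly onto already-used nodes leaves larger nodes free for pods with bigger requests, which is the usual motivation for best-fit over spreading.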

Node Pool

Simulated cluster with per-node CPU/memory tracking, availability states, and drain support.

Health Monitor

Periodic health checks with automatic restart and rescheduling of failed containers.

Lifecycle Engine

Full pod lifecycle from pending through running to terminated with eviction policies.
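
The lifecycle can be modeled as a small state machine. The sketch below is an assumption about how that might look: the `Phase` enum, the `ALLOWED` table, and the `transition` function are hypothetical names, and the pending-to-terminated edge (a pod removed before it ever runs) is an illustrative choice rather than something the project states.

```python
from enum import Enum

class Phase(Enum):
    PENDING = "pending"
    RUNNING = "running"
    TERMINATED = "terminated"

# Allowed transitions; pending -> terminated is an assumption made for this sketch.
ALLOWED = {
    Phase.PENDING: {Phase.RUNNING, Phase.TERMINATED},
    Phase.RUNNING: {Phase.TERMINATED},
    Phase.TERMINATED: set(),
}

def transition(current, target):
    """Return the new phase, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the allowed moves in one table makes eviction and restart logic easier to reason about, since every state change goes through the same check.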

Tech Stack

Python · Docker · Kubernetes Concepts · Scheduling Algorithms · Orchestration


Project 03

Hospital Asset AI Email Agent

Live

An intelligent agentic system that reads hospital staff emails, queries asset databases via MCP servers, searches SOPs and user manuals through a RAG pipeline, and replies with accurate, human-like answers — fully automated, end-to-end.

LangGraph stateful async workflow with conditional branching and parallel tool execution
3 dedicated MCP servers: MySQL (natural language to SQL), ChromaDB (semantic search), Outlook (Graph API)
RAG pipeline: pdfplumber + GPT-4o Vision OCR for scanned manuals, chunked into ChromaDB
AI-generated SOPs — Claude writes 9-section device SOPs from DB inventory, auto-indexed into vector store
Validation gate before every reply: quality check, no raw SQL/JSON leakage, human fallback on failure

System Flow

Architecture

LangGraph Workflow

Stateful async graph with conditional routing, parallel MCP tool calls, and a validation gate before every reply.

MCP Servers

3 independent servers (MySQL, ChromaDB, Outlook) — agent declares what it needs, never touches infra directly.

RAG Pipeline

PDFs ingested via pdfplumber + GPT-4o OCR, chunked and stored in ChromaDB. AI-generated SOPs auto-indexed per device type.

Safety Layer

SQL read-only enforcement, whitelist-only email processing, quality validation, and human-safe fallback on every failure path.
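
Two of these checks, read-only SQL enforcement and sender whitelisting, can be sketched as plain functions. This is a conservative heuristic written for illustration only: `is_read_only`, `sender_allowed`, and the keyword list are assumptions, not the agent's actual code. A check like this would normally sit on top of a database account that has read-only permissions, not replace it.

```python
import re

# Keywords that indicate a write or schema change (illustrative, not exhaustive).
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|replace|grant|revoke)\b",
    re.IGNORECASE,
)

def is_read_only(sql):
    """Accept only a single SELECT statement that contains no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement
    if not statement.lower().startswith("select"):
        return False
    return WRITE_KEYWORDS.search(statement) is None

def sender_allowed(sender, whitelist):
    """Process an email only if the normalized sender address is whitelisted."""
    return sender.strip().lower() in whitelist
```

The filter errs on the side of rejecting: a harmless query that happens to contain a keyword such as `update` as a standalone word is refused and would fall through to the human fallback path.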

Tech Stack

Python · LangGraph · Claude (Haiku) · MCP · ChromaDB · MySQL / AWS RDS · Microsoft Graph API · OpenAI Embeddings · Flask


More Projects

RAG Assistant

Retrieval-Augmented Generation system with vector indexing, semantic search, and LLM-powered Q&A over custom document corpora.

Python · LLMs · FAISS · Embeddings · RAG

Distributed Rate Limiter

High-throughput distributed rate limiter using token bucket and sliding window algorithms for multi-node API gateway deployments.

Python · Redis · Token Bucket · Distributed Systems
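
For readers unfamiliar with the token bucket algorithm, here is a minimal single-process sketch. The `TokenBucket` class, its parameters, and the injectable `clock` are hypothetical names chosen for the example; in a multi-node deployment the bucket state would presumably live in a shared store such as Redis instead of in process memory.

```python
import time

class TokenBucket:
    """Single-process token bucket; state is kept in memory for illustration."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum number of tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so the logic can be tested
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity bounds the size of a burst, while the refill rate sets the sustained request rate, so the two can be tuned independently per client or endpoint.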

Predictive Analytics Pipeline

End-to-end pipeline ingesting live infrastructure metrics, applying forecasting models, and surfacing insights for proactive capacity planning.

Python · Kafka · ML Forecasting · FastAPI

Education

Academic background.

Master of Science in Data Science

University of Maryland, Baltimore County

Aug 2022 — Dec 2023

B.Tech in Electronics & Communication Engineering

SCSVMV University, Kanchipuram, India

July 2018 — June 2022

Get In Touch

Let's connect.

Open to AI Engineering, Software Engineering, and ML Engineering roles. Feel free to reach out — always happy to chat about systems, data, or interesting problems.

thanmai.sp@gmail.com · LinkedIn · GitHub

Virginia, USA
