Open to new opportunities

Real-time pipelines.
Distributed systems.
ML at scale.

Sai Thanmai — AI Engineer · Software Engineer · ML Engineer

I process millions of records, build streaming infrastructure, and design AI-driven analytics systems that turn raw events into production intelligence. Currently doing that at Pycube, Inc.

10 Projects Built
22M+ Records Processed
$10M+ Revenue Impact
15+ Technologies
About
I'm an engineer who's happiest when data is moving fast and systems need to be reliable. My work sits at the intersection of AI Engineering, distributed systems, and machine learning — building the infrastructure that turns raw events into real-time decisions.

What drives me is the end-to-end problem: not just training a model, but getting it into production, keeping it honest, and making it useful at scale. I care about clean pipelines, observable systems, and code that other engineers actually want to maintain.

What I'm focused on

Streaming architectures, real-time anomaly detection, container orchestration, and building ML systems that hold up under production load.

What I value

Systems thinking over siloed models. Measurable impact over impressive demos. Clean abstractions and honest engineering tradeoffs.

What I'm looking for

Roles where I can own end-to-end systems — from data ingestion through ML inference to the dashboard someone makes a decision from.

Professional Experience

Where I've worked.

From enterprise healthcare AI to telecom-scale ML — building systems that drive measurable business outcomes.

Senior Data Scientist
Pycube, Inc · Virginia, USA
Jan 2023 — Present
  • Engineered enterprise RAG pipeline with LangChain, FAISS, Pinecone, and AWS SageMaker for document Q&A across 22M+ medical documents, achieving 92% retrieval accuracy.
  • Architected a generative AI evaluation framework assessing hallucination, relevance, and coherence across GPT, Mistral, and Claude models, reducing model failure rates by 30% across 8+ AI projects.
  • Deployed AutoML annotation infrastructure on AWS Lambda with active learning loops — reducing inter-rater error by 25% and accelerating model release cycles from 14 to 4 days.
Data Scientist
ECIL · Hyderabad, India
June 2020 — Jan 2022
  • Spearheaded ensemble ML models (XGBoost, Random Forest, LSTM) on 4M+ telecom subscriber sequences achieving 0.91 AUC, improving churn prediction precision by 18% and enabling $10M+ retention revenue.
  • Deployed deep learning architectures (CNNs, LSTMs) for real-time customer intent classification across 500K+ monthly interactions, reducing misrouting by 22%.
  • Engineered NLP feedback pipeline with BERT and Hugging Face to extract insights from 500K+ unstructured responses, reducing manual analysis time by 60%.
Tech Stack

What I work with.

Deep expertise across data streaming, ML/AI, and cloud infrastructure.

Languages

Python · SQL · Go · Bash · Java · C/C++

Data & Streaming

Apache Kafka · TimescaleDB · PostgreSQL · Redis · MongoDB · Snowflake

ML / AI

PyTorch · TensorFlow · scikit-learn · LLMs · RAG · LangChain · FAISS · Hugging Face · BERT

Infrastructure

Docker · Kubernetes · AWS SageMaker · Lambda · EC2 · FastAPI · WebSocket · Git
Featured Work

Engineering at scale.

End-to-end systems built from scratch — production-grade architecture, real-time pipelines, and ML-powered intelligence.

Project 01

Real-Time Infrastructure Monitoring & Analytics Platform

Live

A full observability platform that simulates infrastructure events, streams them through Kafka, detects anomalies with ML, and displays everything on a live dashboard — updated every 2 seconds via WebSocket.

  • Streaming pipeline: 12 services × 1 event/sec through Apache Kafka
  • TimescaleDB hypertables with continuous 1-min aggregates
  • 4-strategy anomaly detection: threshold, z-score, rate-of-change, Isolation Forest (ML)
  • FastAPI backend with WebSocket real-time push to live dashboard
  • Dockerized — entire platform boots with one command
System Flow
EVENT GEN: 12 services, 1 evt/sec each, CPU · MEM · DISK · NETWORK
→ KAFKA: stream ingestion, topic partitioning, high throughput, event buffering
→ ANALYTICS: TimescaleDB, 1-min aggregates, Isolation Forest, 4 detection modes
→ FASTAPI: REST + WebSocket, real-time push, alert routing, API endpoints
→ LIVE DASH: Chart.js, 2s refresh
Architecture

Event Generators

12 simulated services producing CPU, memory, disk, and network metrics at 1 event/sec each.
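A minimal sketch of what one tick of such a generator could look like, using only the standard library. The service names, field names, and value ranges are illustrative assumptions, not the platform's actual schema.

```python
import random
import time

# Hypothetical service names; the real platform's names are not shown here.
SERVICES = [f"service-{i:02d}" for i in range(1, 13)]

def make_event(service, rng, now=None):
    """Build one synthetic metrics event for a service."""
    return {
        "service": service,
        "ts": time.time() if now is None else now,
        "cpu_pct": round(rng.uniform(5, 95), 1),
        "mem_pct": round(rng.uniform(20, 90), 1),
        "disk_pct": round(rng.uniform(10, 80), 1),
        "net_kbps": round(rng.uniform(50, 5000), 1),
    }

def tick(rng, now=None):
    """One second of traffic: a single event per service."""
    return [make_event(service, rng, now) for service in SERVICES]

# Seeded RNG keeps the simulation reproducible between runs.
batch = tick(random.Random(42), now=0.0)
print(len(batch), batch[0]["service"])
```

Calling `tick` once per second and publishing each event to a Kafka topic would give the 12 events/sec load described above.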

Kafka Pipeline

Apache Kafka ingestion layer handling high-throughput event streaming with topic partitioning.

ML Anomaly Engine

4-strategy detection (threshold, z-score, rate-of-change, Isolation Forest) with configurable alerting.
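The three statistical strategies can be sketched with the standard library alone. This is an illustration under assumed limits (90% threshold, 3 standard deviations, 25-point jump), not the platform's code, and it leaves out the Isolation Forest stage, which would typically come from scikit-learn.

```python
from statistics import mean, pstdev

def threshold_flag(value, limit=90.0):
    """Flag a reading above a fixed limit (assumed: 90 percent)."""
    return value > limit

def zscore_flag(history, value, z_limit=3.0):
    """Flag a reading far from the recent mean, measured in standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = pstdev(history)
    if sigma == 0:
        return value != history[0]
    return abs(value - mean(history)) / sigma > z_limit

def rate_of_change_flag(previous, value, max_delta=25.0):
    """Flag a jump larger than max_delta between consecutive readings."""
    return abs(value - previous) > max_delta

def detect(history, value):
    """Return the names of the strategies that fired for the newest reading."""
    fired = []
    if threshold_flag(value):
        fired.append("threshold")
    if zscore_flag(history, value):
        fired.append("z-score")
    if history and rate_of_change_flag(history[-1], value):
        fired.append("rate-of-change")
    return fired

cpu_history = [41.0, 43.5, 40.2, 42.8, 44.1, 41.7]
print(detect(cpu_history, 43.0))  # a normal reading
print(detect(cpu_history, 97.0))  # a spike
```

Returning the list of strategies that fired, rather than a single boolean, makes it easy to route alerts differently depending on which detectors agree.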

Live Dashboard

WebSocket-powered real-time UI with Chart.js visualizations updating every 2 seconds.

Tech Stack
Python · Apache Kafka · TimescaleDB · FastAPI · WebSocket · scikit-learn · Docker Compose · Chart.js
Project 02

Mini Container Orchestration Simulator

Built

A ground-up simulator of Kubernetes-style container scheduling and orchestration. Implements core concepts: pod scheduling, node affinity, resource allocation, and health-based eviction — without requiring a full K8s cluster.

  • Custom scheduler with bin-packing and resource-aware placement
  • Simulated node pool with CPU/memory capacity tracking
  • Pod lifecycle management: pending → running → terminated
  • Health checks, restart policies, and eviction logic
Core Components

Scheduler

Bin-packing and resource-aware placement algorithms assigning pods to nodes by capacity.
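One common way to do bin-packing placement is best-fit: among the nodes that can hold the pod, pick the one left with the least spare capacity. The sketch below assumes that variant; the `Node` and `Pod` classes and their fields are hypothetical, and the simulator's actual algorithm may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_free: float          # cores
    mem_free: int            # MiB
    pods: list = field(default_factory=list)

@dataclass
class Pod:
    name: str
    cpu: float
    mem: int

def schedule(pod, nodes):
    """Best-fit placement: choose the feasible node with the least CPU left over."""
    feasible = [n for n in nodes if n.cpu_free >= pod.cpu and n.mem_free >= pod.mem]
    if not feasible:
        return None  # no capacity anywhere, so the pod stays pending
    target = min(feasible, key=lambda n: (n.cpu_free - pod.cpu, n.mem_free - pod.mem))
    target.cpu_free -= pod.cpu
    target.mem_free -= pod.mem
    target.pods.append(pod.name)
    return target.name

nodes = [Node("node-a", 4.0, 8192), Node("node-b", 2.0, 4096)]
print(schedule(Pod("api", 1.5, 1024), nodes))     # fits both; node-b is the tighter fit
print(schedule(Pod("worker", 1.0, 2048), nodes))  # node-b is now too small
print(schedule(Pod("batch", 8.0, 1024), nodes))   # fits nowhere
```

Best-fit keeps large nodes free for large pods, at the cost of concentrating load on already busy nodes.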

Node Pool

Simulated cluster with per-node CPU/memory tracking, availability states, and drain support.

Health Monitor

Periodic health checks with automatic restart and rescheduling of failed containers.

Lifecycle Engine

Full pod lifecycle from pending through running to terminated with eviction policies.
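The lifecycle can be modeled as a small state machine that rejects illegal transitions. This sketch is an assumption about the shape of such an engine; in particular, allowing pending → terminated (a pod evicted before it ever starts) is a guess rather than documented behavior.

```python
from enum import Enum

class Phase(Enum):
    PENDING = "pending"
    RUNNING = "running"
    TERMINATED = "terminated"

# Allowed moves; pending -> terminated is an assumed eviction path.
ALLOWED = {
    Phase.PENDING: {Phase.RUNNING, Phase.TERMINATED},
    Phase.RUNNING: {Phase.TERMINATED},
    Phase.TERMINATED: set(),
}

class PodLifecycle:
    def __init__(self, name):
        self.name = name
        self.phase = Phase.PENDING
        self.history = [Phase.PENDING]

    def transition(self, new_phase):
        """Move to new_phase, or raise if the move is not allowed."""
        if new_phase not in ALLOWED[self.phase]:
            raise ValueError(
                f"{self.name}: {self.phase.value} -> {new_phase.value} is not allowed"
            )
        self.phase = new_phase
        self.history.append(new_phase)

pod = PodLifecycle("web-1")
pod.transition(Phase.RUNNING)
pod.transition(Phase.TERMINATED)
print(" → ".join(p.value for p in pod.history))
```

Keeping the transition table in one dictionary means eviction or restart policies can be added by editing data rather than control flow.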

Tech Stack
Python · Docker · Kubernetes Concepts · Scheduling Algorithms · Orchestration
Project 03

Hospital Asset AI Email Agent

Live

An intelligent agentic system that reads hospital staff emails, queries asset databases via MCP servers, searches SOPs and user manuals through a RAG pipeline, and replies with accurate, human-like answers — fully automated, end-to-end.

  • LangGraph stateful async workflow with conditional branching and parallel tool execution
  • 3 dedicated MCP servers: MySQL (natural language to SQL), ChromaDB (semantic search), Outlook (Graph API)
  • RAG pipeline: pdfplumber + GPT-4o Vision OCR for scanned manuals, chunked into ChromaDB
  • AI-generated SOPs — Claude writes 9-section device SOPs from DB inventory, auto-indexed into vector store
  • Validation gate before every reply: quality check, no raw SQL/JSON leakage, human fallback on failure
System Flow
INBOX: Outlook poll, 60s interval, whitelist filter, Graph API
→ LANGGRAPH: intent classify, route + branch, Claude Haiku, parallel fetch
→ MCP SERVERS: MySQL (AWS RDS), ChromaDB RAG, NL to SQL, vector search
→ VALIDATE: quality gate, merge sources, safety check, human fallback
→ REPLY: HTML email, CSV attach, Word report, Outlook MCP
Architecture

LangGraph Workflow

Stateful async graph with conditional routing, parallel MCP tool calls, and a validation gate before every reply.
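The routing and parallel-fetch idea can be illustrated with plain asyncio instead of LangGraph. Everything here is a stand-in: the two fetch functions fake the MCP server calls, and the keyword router replaces the LLM intent classifier.

```python
import asyncio

async def query_assets(question):
    """Hypothetical stand-in for the MySQL MCP server call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"asset rows for: {question}"

async def search_docs(question):
    """Hypothetical stand-in for the ChromaDB MCP server call."""
    await asyncio.sleep(0)
    return f"manual passages for: {question}"

FETCHERS = {"assets": query_assets, "docs": search_docs}

def classify(email_text):
    """Toy keyword router standing in for the LLM intent classifier."""
    text = email_text.lower()
    needs = []
    if any(k in text for k in ("how many", "where is", "status")):
        needs.append("assets")
    if any(k in text for k in ("how do i", "manual", "procedure")):
        needs.append("docs")
    return needs

async def handle(email_text):
    """Route the email, then run every required lookup concurrently."""
    needs = classify(email_text)
    if not needs:
        return {"route": "human", "results": []}  # nothing recognized: hand off
    results = await asyncio.gather(*(FETCHERS[n](email_text) for n in needs))
    return {"route": "reply", "results": list(results)}

print(asyncio.run(handle("How many ventilators are free, and how do I reset one?")))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the merged sources stay aligned with the list of intents.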

MCP Servers

3 independent servers (MySQL, ChromaDB, Outlook) — agent declares what it needs, never touches infra directly.

RAG Pipeline

PDFs ingested via pdfplumber + GPT-4o OCR, chunked and stored in ChromaDB. AI-generated SOPs auto-indexed per device type.

Safety Layer

SQL read-only enforcement, whitelist-only email processing, quality validation, and human-safe fallback on every failure path.
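Two of these checks lend themselves to a short sketch: rejecting anything that is not a single SELECT, and blocking replies that look like raw SQL or JSON. The regular expressions and the sample table name are illustrative assumptions. A keyword filter like this is deliberately conservative and would normally sit alongside a read-only database user, not replace one.

```python
import re

STARTS_WITH_SELECT = re.compile(r"^\s*select\b", re.IGNORECASE)
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.IGNORECASE
)

def is_read_only(sql):
    """Accept one SELECT statement that contains no write or DDL keywords."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        return False  # reject stacked statements
    return bool(STARTS_WITH_SELECT.match(statement)) and not WRITE_KEYWORDS.search(statement)

LEAK_PATTERNS = [
    re.compile(r"\bselect\b.+\bfrom\b", re.IGNORECASE | re.DOTALL),  # raw SQL
    re.compile(r"^\s*[\[{]", re.MULTILINE),                          # raw JSON
]

def reply_is_safe(reply):
    """Allow a reply only if it is non-empty and shows no SQL or JSON leakage."""
    return bool(reply.strip()) and not any(p.search(reply) for p in LEAK_PATTERNS)

print(is_read_only("SELECT id FROM assets WHERE floor = 2;"))
print(is_read_only("SELECT 1; DROP TABLE assets"))
print(reply_is_safe("The infusion pump is on Floor 2."))
print(reply_is_safe('{"asset_id": 7}'))
```

A reply that fails `reply_is_safe` would be the trigger for the human fallback path; expect some false positives, since ordinary prose containing "select ... from" is also blocked.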

Tech Stack
Python · LangGraph · Claude (Haiku) · MCP · ChromaDB · MySQL / AWS RDS · Microsoft Graph API · OpenAI Embeddings · Flask
More Projects

RAG Assistant

Retrieval-Augmented Generation system with vector indexing, semantic search, and LLM-powered Q&A over custom document corpora.

Python · LLMs · FAISS · Embeddings · RAG

Distributed Rate Limiter

High-throughput distributed rate limiter using token bucket and sliding window algorithms for multi-node API gateway deployments.

Python · Redis · Token Bucket · Distributed Systems
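The token bucket half of this can be sketched in a few lines. This version is in-memory and single-process, which is a simplification: a multi-node deployment would keep the bucket state in Redis and update it atomically. The `clock` parameter is an added convenience for testing, not part of the project.

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so bursts are allowed immediately
        self.clock = clock
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# A fake clock makes the behavior deterministic for the demo.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # burst of 3 passes, the 4th is limited
fake_now[0] = 1.0                          # one second later: 2 tokens refilled
print([bucket.allow() for _ in range(3)])
```

Refilling lazily on each call avoids a background timer, which is also what makes the same logic easy to port to a single atomic script on a shared store.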

Predictive Analytics Pipeline

End-to-end pipeline ingesting live infrastructure metrics, applying forecasting models, and surfacing insights for proactive capacity planning.

Python · Kafka · ML Forecasting · FastAPI
Education

Academic background.

Master of Science in Data Science

University of Maryland, Baltimore County
Aug 2022 — Dec 2023

B.Tech in Electronics & Communication Engineering

SCSVMV University, Kanchipuram, India
July 2018 — June 2022
Get In Touch

Let's connect.

Open to AI Engineering, Software Engineering, and ML Engineering roles. Feel free to reach out — always happy to chat about systems, data, or interesting problems.

Virginia, USA