About Me

I’m a Lead Research Scientist with ~10 years of applied ML and NLP experience, currently building production GenAI systems at scale on the Foundation Models team at 84.51° (Kroger’s data science arm). I own the full GenAI stack: LLM fine-tuning and alignment, semantic search and embedding infrastructure, multimodal AI, and synthetic data generation. I take systems end-to-end, from research artifact to shipped product. Recent work includes domain-specific LLM fine-tuning, multi-agent orchestration with tool use, and production semantic search.

10+ Years production ML

+5% Conversion lift over keyword-search baseline

<200ms P99 latency at scale

E2E Research-to-production ownership

What I Do

Synthetic Data Generation

Generating domain-specific training data from scratch: structured hierarchical examples, safety-aligned samples, and task-specific corpora designed to minimize class imbalance and label bias. This data is used to fine-tune small language models (SLMs) on targeted tasks without relying on frontier APIs.

Fine-tuning & Alignment

PEFT/LoRA fine-tuning on domain-specific corpora, quantization (GPTQ, AWQ, GGUF), safety evaluation, red-teaming, and multi-intent SLM development. Hands-on from dataset curation through deployment.

Semantic Search & Embeddings

Bi-encoder and cross-encoder architectures, ANN indexing, hybrid sparse-dense retrieval pipelines, and embedding evaluation frameworks. Experience taking dense retrieval from prototype to tens of millions of queries per month.

Production Serving

vLLM, Triton Inference Server, OpenAI-compatible API deployment, latency optimization, model compression, and controlled A/B rollouts. Optimized GPU inference to hit P99 latency under 200 ms at production scale.

Foundation Models

Pre-training transformer-based models from scratch: custom tokenization, training objective design, distributed training with DeepSpeed/PyTorch FSDP, and data curriculum scheduling. Applied to behavioral sequence modeling and representation learning.

Vision LLMs

Deploying open-source vision-language models on a vLLM serving stack, and fine-tuning vision models as custom domain classifiers. Covers the full arc from multimodal prototype to production-grade inference with latency and throughput constraints.

Tech Stack

LLMs & GenAI

Fine-tuning, PEFT / LoRA, Quantization, RAG, Semantic Search, Embedding Models, vLLM, Triton Inference Server, Vector Databases, Transformers

ML Systems

PyTorch, TensorFlow, DeepSpeed, Distributed Training, Model Serving at Scale, A/B Testing, Pipeline Automation

Infrastructure

Python, SQL, Docker, AWS, Azure, REST / gRPC, Elasticsearch, Redis, Git, CI/CD

Snehal Patel