I’m a Lead Research Scientist with ~10 years of applied ML and NLP experience, currently building production GenAI systems at scale on the Foundation Models team at 84.51° (Kroger’s data science arm). I own the full GenAI stack: LLM fine-tuning and alignment, semantic search and embedding infrastructure, multimodal AI, and synthetic data generation. I take systems end-to-end, from research artifact to shipped product. Recent work includes domain-specific LLM fine-tuning, multi-agent orchestration with tool use, and production semantic search.
What I Do
Synthetic Data Generation
Generating domain-specific training data from scratch: structured hierarchical examples, safety-aligned samples, and task-specific corpora designed to minimize class imbalance and label bias. This data is used to fine-tune small language models (SLMs) on targeted tasks without relying on frontier APIs.
Fine-tuning & Alignment
PEFT/LoRA fine-tuning on domain-specific corpora, quantization (GPTQ, AWQ, GGUF-format models), safety evaluation, red-teaming, and multi-intent small language model (SLM) development. Hands-on from dataset curation through deployment.
Semantic Search & Embeddings
Bi-encoder and cross-encoder architectures, approximate nearest neighbor (ANN) indexing, hybrid sparse-dense retrieval pipelines, and embedding evaluation frameworks. Experience taking dense retrieval from prototype to tens of millions of queries per month.
Production Serving
vLLM, Triton Inference Server, OpenAI-compatible API deployment, latency optimization, model compression, and controlled A/B rollouts. Optimized GPU inference to hit P99 latency under 200 ms at production scale.
Foundation Models
Pre-training transformer-based models from scratch: custom tokenization, training objective design, distributed training with DeepSpeed and PyTorch FSDP, and data curriculum scheduling. Applied to behavioral sequence modeling and representation learning.
Vision LLMs
Deploying open-source vision-language models on a vLLM serving stack and fine-tuning vision models as custom domain classifiers. Covers the full arc from multimodal prototype to production-grade inference under latency and throughput constraints.
