Blog – Snehal.AI ✨

14 min read

Jun 14, 2026

Running a Headless Claude Code Trading Agent on GCP with the Robinhood MCP Server

10 min read

Jun 6, 2026

Kokoro-82M: Running a Local Text-to-Speech Model on One GPU

13 min read

Jun 3, 2026

Running Ideogram 4 Locally: Quantized Inference and Structured JSON Captions

6 min read

Jun 1, 2026

Simple Things Scaled Up

10 min read

May 31, 2026

X-Token: Distilling Knowledge Across Tokenizers That Don't Speak the Same Language

10 min read

May 4, 2026

Building an Agentic Movie Recommender on Cloudflare Pages

4 min read

May 1, 2026

It Started as a Chatbot

22 min read

Apr 30, 2026

LLM Glossary (In Progress)

11 min read

Apr 25, 2026

DFlash: How Block Diffusion Breaks the Speculative Decoding Ceiling

1 min read

Apr 21, 2026

Laws of LLMs and Agents

5 min read

Apr 19, 2026

The Fragmented Researcher: Engineering Focus Around a 10-Month-Old

5 min read

Apr 18, 2026

Large Language Models are beautiful.

16 min read

Apr 14, 2026

27,000 Tokens Before Hello: The Agent Harness Tax

20 min read

Apr 6, 2026

Gemma 4: Everything You Need to Know About Google's Most Capable Open Model

4 min read

Mar 26, 2026

Embeddings are beautiful.

15 min read

Mar 25, 2026

TurboQuant: The cheat sheet that ate your GPU (and how Google fixed it)

5 min read

Mar 1, 2026

Doc-to-LoRA & Text-to-LoRA: How Sakana is teaching LLMs to learn instantly

8 min read

Feb 26, 2026

BlinkThink: Self-Hosted Camera Snapshots with FastAPI and Gemini

7 min read

Feb 15, 2026

Building a Browser Agent with Gemini and Playwright

5 min read

Feb 8, 2026

The 10-Million Token Paradox: Decoding the Logic of Recursive Language Models

3 min read

Feb 8, 2026

OpenClaw: 98% Plumbing, 2% Revolution

9 min read

Feb 15, 2025

VerbalVista: Talking to Your Own Data with RAG, FAISS, and a Bit of Stubbornness

14 min read

May 1, 2024

Snehal Patel

Exploring Model Quantization for LLMs