All Posts

14 min read
Jun 14, 2026

Running a Headless Claude Code Trading Agent on GCP with the Robinhood MCP Server

10 min read
Jun 6, 2026

Kokoro-82M: Running a Local Text-to-Speech Model on One GPU

13 min read
Jun 3, 2026

Running Ideogram 4 Locally: Quantized Inference and Structured JSON Captions

6 min read
Jun 1, 2026

Simple Things Scaled Up

10 min read
May 31, 2026

X-Token: Distilling Knowledge Across Tokenizers That Don't Speak the Same Language

10 min read
May 4, 2026

Building an Agentic Movie Recommender on Cloudflare Pages

4 min read
May 1, 2026

It Started as a Chatbot

22 min read
Apr 30, 2026

LLM Glossary (In Progress)

11 min read
Apr 25, 2026

DFlash: How Block Diffusion Breaks the Speculative Decoding Ceiling

1 min read
Apr 21, 2026

Laws of LLMs and Agents

5 min read
Apr 19, 2026

The Fragmented Researcher: Engineering Focus Around a 10-Month-Old

5 min read
Apr 18, 2026

Large Language Models are beautiful.

16 min read
Apr 14, 2026

27,000 Tokens Before Hello: The Agent Harness Tax

20 min read
Apr 6, 2026

Gemma 4: Everything You Need to Know About Google's Most Capable Open Model

4 min read
Mar 26, 2026

Embeddings are beautiful.

15 min read
Mar 25, 2026

TurboQuant: The cheat sheet that ate your GPU (and how Google fixed it)

5 min read
Mar 1, 2026

Doc-to-LoRA & Text-to-LoRA: How Sakana is teaching LLMs to learn instantly

8 min read
Feb 26, 2026

BlinkThink: Self-Hosted Camera Snapshots with FastAPI and Gemini

7 min read
Feb 15, 2026

Building a Browser Agent with Gemini and Playwright

5 min read
Feb 8, 2026

The 10-Million Token Paradox: Decoding the Logic of Recursive Language Models

3 min read
Feb 8, 2026

OpenClaw: 98% Plumbing, 2% Revolution

9 min read
Feb 15, 2025

VerbalVista: Talking to Your Own Data with RAG, FAISS, and a Bit of Stubbornness

14 min read
May 1, 2024

Exploring Model Quantization for LLMs