In-depth explorations of machine learning optimization, systems programming, and production-grade implementations.
Each article combines theory with practical code examples and real-world deployment strategies.
Published
Q8K128 for Phi-3 Mini — Better Reconstruction, No PPL Win
📅 May 3, 2026 · ⏱️ 12 min read · 🔬 Featured experiment
A focused Rust/Candle experiment halving the Q8K block size from 256 to 128 values.
Q8K128 improved qkv-projection reconstruction RMSE by 8.9%, but WikiText-2 perplexity
showed no improvement over the Q8K/Q4K baseline. Covers strict binary format validation,
histogram screening, and a reproducible benchmark pipeline.
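To make the block-size idea concrete, here is a minimal Rust sketch of block-wise 8-bit quantization with one scale per block, plus an RMSE helper for comparing block sizes. It is an illustration under stated assumptions, not the article's implementation or Candle's Q8K format: the symmetric absmax scaling, the function names, and the synthetic weights are all hypothetical. The synthetic data places one outlier every 256 values, which deliberately favors smaller blocks, so the gap it shows says nothing about the 8.9% figure above.

```rust
/// Quantize `values` in blocks of `block_size`, one f32 scale per block
/// (symmetric absmax scaling into the i8 range), then dequantize again.
/// Simplified stand-in for a Q8K-style format; the real layout differs.
fn quantize_dequantize(values: &[f32], block_size: usize) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be positive");
    let mut out = Vec::with_capacity(values.len());
    for block in values.chunks(block_size) {
        // Largest magnitude in the block determines the scale.
        let absmax = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        let scale = if absmax > 0.0 { absmax / 127.0 } else { 1.0 };
        for &v in block {
            let q = (v / scale).round().clamp(-127.0, 127.0) as i8;
            out.push(q as f32 * scale);
        }
    }
    out
}

/// Root-mean-square error between the original and reconstructed values.
fn rmse(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have equal length");
    if a.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = a
        .iter()
        .zip(b)
        .map(|(x, y)| {
            let d = (*x - *y) as f64;
            d * d
        })
        .sum();
    (sum_sq / a.len() as f64).sqrt() as f32
}

/// Hypothetical test data: small weights with one large outlier every
/// 256 values, so half of the 128-value blocks contain no outlier.
fn synthetic_weights(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| {
            if i % 256 == 0 {
                1.0
            } else {
                (i as f32 * 0.37).sin() * 0.02
            }
        })
        .collect()
}
```

Calling `rmse(&w, &quantize_dequantize(&w, 256))` and the same with `128` on `synthetic_weights(1024)` gives a lower error for the smaller block size, because outlier-free blocks get a much finer scale. As the experiment reports, a lower reconstruction error of this kind does not by itself guarantee better perplexity.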
TinyLlama Q8K Quantization Engine - CPU-Optimized LLM with Rust/Candle
📅 December 15, 2024 · ⏱️ 15 min read · 🔥 Featured
Q8K quantization implementation for the TinyLlama-1.1B-Chat model using Rust and the Candle framework.
Features permutation strategies (SVD-Importance, QR-Pivot), a 3-tier validation pipeline,
and a production Docker deployment with an interactive Angular chat interface. Reduces model size by roughly 4x
(from ~5 GB to ~1.3 GB) while maintaining <0.1% mean relative error.
Phi-3 Mini Mixed Q8K/Q4K Quantization — CPU-Optimized 3.8B Inference
📅 May 2026 · ⏱️ 18 min read · 🦀 Rust
Layer-aware mixed-precision pipeline that compresses Phi-3 Mini 3.8B from 7.6 GB to 4.1 GB with
near-lossless quality. Q8K for all attention projections, Q4K for MLP down-proj layers, F32 for
norms and embeddings. Includes a 3-stage Rust/Candle pipeline, optional block-wise column
permutation, and full on-the-fly dequantization for CPU inference.
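The layer-aware assignment described above can be sketched as a small name-based policy in Rust. This is a hypothetical illustration, not the pipeline's code: the `Precision` enum, the `precision_for` function, and the tensor-name patterns (which assume Hugging Face style names such as `self_attn.qkv_proj.weight`) are assumptions. Tensors that the summary does not mention, such as other MLP projections or the output head, are returned as `Unspecified` rather than guessed.

```rust
/// Storage precision chosen for a tensor (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision {
    Q8K,
    Q4K,
    F32,
    /// The summary does not say how these tensors are stored.
    Unspecified,
}

/// Pick a precision from the tensor name, following the rule in the text:
/// Q8K for attention projections, Q4K for MLP down-proj, F32 for norms
/// and embeddings. The name patterns are assumed, not taken from the article.
fn precision_for(tensor_name: &str) -> Precision {
    if tensor_name.contains("norm") || tensor_name.contains("embed_tokens") {
        Precision::F32
    } else if tensor_name.contains("self_attn") && tensor_name.ends_with("_proj.weight") {
        Precision::Q8K
    } else if tensor_name.contains("mlp.down_proj") {
        Precision::Q4K
    } else {
        Precision::Unspecified
    }
}
```

For example, `precision_for("model.layers.0.mlp.down_proj.weight")` returns `Precision::Q4K`. Keeping the rule in one function makes the policy easy to audit and to change per layer family.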
Node.js Backend Architecture - Production Patterns
📅 Q1 2026 · ⏱️ ~20 min read
Deep dive into building scalable Node.js backends with Express, Redis, MongoDB, and WebSocket.
Covers authentication strategies, rate limiting, spam prevention, and multi-tier validation.
Node.js · Express · Redis · MongoDB
Article in Progress
Coming Soon
Rust for Systems Programming - Memory Safety Without Garbage Collection
📅 Q2 2026 · ⏱️ ~18 min read
Exploration of Rust's ownership model, borrowing rules, and zero-cost abstractions.
Practical examples of building high-performance systems without runtime overhead.
Rust · Memory Management · Performance
Article in Progress
Coming Soon
Multi-Instance Docker Orchestration with Node.js
📅 Q2 2026 · ⏱️ ~12 min read
Building a production-grade Docker container pool manager with Node.js. Load balancing,
health checks, graceful degradation, and automated cleanup strategies.