Technical Articles

Published

Q8K128 for Phi-3 Mini — Better Reconstruction, No PPL Win

📅 May 3, 2026 ⏱️ 12 min read 🔬 Featured experiment

A focused Rust/Candle experiment halving the Q8K block size from 256 to 128 values. Q8K128 improved qkv-projection reconstruction RMSE by 8.9%, but WikiText-2 perplexity showed no improvement over the Q8K/Q4K baseline. Covers strict binary format validation, histogram screening, and a reproducible benchmark pipeline.

Rust Candle Framework Q8K128 Phi-3 Mini Perplexity Benchmark SafeTensors

Published

TinyLlama Q8K Quantization Engine - CPU-Optimized LLM with Rust/Candle

📅 December 15, 2024 ⏱️ 15 min read 🔥 Featured

Q8K quantization implementation for the TinyLlama-1.1B-Chat model using Rust and the Candle framework. Features permutation strategies (SVD-Importance, QR-Pivot), a 3-tier validation pipeline, and a production Docker deployment with an interactive Angular chat interface. Reduces model size roughly 4x (from ~5 GB to ~1.3 GB) while keeping mean relative error below 0.1%.

Rust Candle Framework Q8K Quantization Docker Angular 19 LLM Optimization

Published

Phi-3 Mini Mixed Q8K/Q4K Quantization — CPU-Optimized 3.8B Inference

📅 May 2026 ⏱️ 18 min read 🦀 Rust

Layer-aware mixed-precision pipeline that compresses Phi-3 Mini 3.8B from 7.6 GB to 4.1 GB with near-lossless quality. Q8K for all attention projections, Q4K for MLP down-proj layers, F32 for norms and embeddings. Includes a 3-stage Rust/Candle pipeline, optional block-wise column permutation, and full on-the-fly dequantization for CPU inference.

Rust Candle Framework Q8K / Q4K Phi-3 Mini Mixed Precision LLM Optimization

Coming Soon

Node.js Backend Architecture - Production Patterns

📅 Q1 2026 ⏱️ ~20 min read

Deep dive into building scalable Node.js backends with Express, Redis, MongoDB, and WebSocket. Covers authentication strategies, rate limiting, spam prevention, and multi-tier validation.

Node.js Express Redis MongoDB

Coming Soon

Rust for Systems Programming - Memory Safety Without Garbage Collection

📅 Q2 2026 ⏱️ ~18 min read

Exploration of Rust's ownership model, borrowing rules, and zero-cost abstractions, with practical examples of building high-performance systems without a garbage collector.

Rust Memory Management Performance

Coming Soon

Multi-Instance Docker Orchestration with Node.js

📅 Q2 2026 ⏱️ ~12 min read

Building a production-grade Docker container pool manager with Node.js. Covers load balancing, health checks, graceful degradation, and automated cleanup strategies.

Docker Node.js Load Balancing