
Technical Articles

In-depth explorations of machine learning optimization, systems programming, and production-grade implementations. Each article combines theory with practical code examples and real-world deployment strategies.

Published

Q8K128 for Phi-3 Mini — Better Reconstruction, No PPL Win

A focused Rust/Candle experiment that halves the Q8K block size from 256 to 128 values. Q8K128 reduced qkv-projection reconstruction RMSE by 8.9%, but WikiText-2 perplexity showed no improvement over the Q8K/Q4K baseline. Covers strict binary format validation, histogram screening, and a reproducible benchmark pipeline.

Rust Candle Framework Q8K128 Phi-3 Mini Perplexity Benchmark SafeTensors
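
To make the block-size idea concrete, here is a minimal sketch of block-wise 8-bit quantization with one absmax-derived scale per block, comparing reconstruction RMSE at block sizes 256 and 128. It is an illustration under stated assumptions, not the article's code and not Candle's actual Q8K block layout: the function names `quantize_block` and `reconstruction_rmse` are hypothetical, and the weights are synthetic, so the numbers it prints say nothing about the 8.9% figure above.

```rust
/// Quantize one block to i8 using a single scale derived from the block's
/// largest absolute value (simplified; not Candle's real Q8K layout).
fn quantize_block(block: &[f32]) -> (f32, Vec<i8>) {
    let amax = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if amax == 0.0 {
        // An all-zero block needs no scale.
        return (0.0, vec![0; block.len()]);
    }
    let scale = amax / 127.0;
    let q: Vec<i8> = block
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, q)
}

/// Quantize and dequantize `weights` block by block and return the
/// root-mean-square reconstruction error. Assumes `weights` is non-empty.
fn reconstruction_rmse(weights: &[f32], block_size: usize) -> f32 {
    let mut sq_err = 0.0f64;
    for block in weights.chunks(block_size) {
        let (scale, q) = quantize_block(block);
        for (&x, &qi) in block.iter().zip(q.iter()) {
            let d = (x - scale * qi as f32) as f64;
            sq_err += d * d;
        }
    }
    (sq_err / weights.len() as f64).sqrt() as f32
}

fn main() {
    // Synthetic row of small weights with a single large outlier at index 200.
    let mut w: Vec<f32> = (0..256).map(|i| ((i as f32) * 0.37).sin() * 0.05).collect();
    w[200] = 1.0;

    let rmse_256 = reconstruction_rmse(&w, 256);
    let rmse_128 = reconstruction_rmse(&w, 128);
    println!("block 256: RMSE = {rmse_256:.6}");
    println!("block 128: RMSE = {rmse_128:.6}");
}
```

With one scale per 256 values, the outlier sets the quantization step for the whole row; with 128-value blocks it only affects the half it sits in, which is the intuition behind a lower reconstruction error. The trade-off is that smaller blocks store more scales per tensor.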
Published

TinyLlama Q8K Quantization Engine - CPU-Optimized LLM with Rust/Candle

A Q8K quantization implementation for the TinyLlama-1.1B-Chat model using Rust and the Candle framework. Features permutation strategies (SVD-Importance, QR-Pivot), a 3-tier validation pipeline, and a production Docker deployment with an interactive Angular chat interface. Reduces model size by roughly 4x (from ~5 GB to ~1.3 GB) while maintaining <0.1% mean relative error.

Rust Candle Framework Q8K Quantization Docker Angular 19 LLM Optimization
Published

Phi-3 Mini Mixed Q8K/Q4K Quantization — CPU-Optimized 3.8B Inference

Layer-aware mixed-precision pipeline that compresses Phi-3 Mini 3.8B from 7.6 GB to 4.1 GB with near-lossless quality. Q8K for all attention projections, Q4K for MLP down-proj layers, F32 for norms and embeddings. Includes a 3-stage Rust/Candle pipeline, optional block-wise column permutation, and full on-the-fly dequantization for CPU inference.

Rust Candle Framework Q8K / Q4K Phi-3 Mini Mixed Precision LLM Optimization
Coming Soon

Node.js Backend Architecture - Production Patterns

A deep dive into building scalable Node.js backends with Express, Redis, MongoDB, and WebSockets. Covers authentication strategies, rate limiting, spam prevention, and multi-tier validation.

Node.js Express Redis MongoDB
Coming Soon

Rust for Systems Programming - Memory Safety Without Garbage Collection

Exploration of Rust's ownership model, borrowing rules, and zero-cost abstractions. Practical examples of building high-performance systems without runtime overhead.

Rust Memory Management Performance
Coming Soon

Multi-Instance Docker Orchestration with Node.js

Building a production-grade Docker container pool manager with Node.js. Load balancing, health checks, graceful degradation, and automated cleanup strategies.

Docker Node.js Load Balancing