Q8K Quantization
Advanced 8-bit block quantization technique for large language models
Related Articles
Q8K128 for Phi-3 Mini: Better Reconstruction, No PPL Win
Phi-3 Mini Mixed Q8K/Q4K Quantization
TinyLlama Q8K Engine
Glossary
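Quantization:
Reducing the numerical precision of a model's weights (and sometimes activations), for example from FP32 to INT8, to cut memory use and bandwidth at the cost of some rounding error
Block Quantization:
Quantizing weights in fixed-size groups (blocks), typically with a separate scale factor stored per block so that an outlier value only degrades precision within its own block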
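
The two glossary terms can be illustrated with a minimal sketch of symmetric 8-bit block quantization. This is not the Q8K implementation described in the related articles: the function names, the block size of 128, and the choice of one floating-point scale per block are assumptions made for illustration only.

```python
from typing import List, Tuple


def quantize_blocks(
    weights: List[float], block_size: int = 128
) -> Tuple[List[List[int]], List[float]]:
    """Quantize weights to signed 8-bit integers, one scale per block.

    Hypothetical helper: block_size=128 is an illustrative default,
    not a value taken from any particular Q8K format.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    blocks: List[List[int]] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 127; an all-zero
        # block gets an arbitrary scale of 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized = [max(-127, min(127, round(w / scale))) for w in block]
        blocks.append(quantized)
        scales.append(scale)
    return blocks, scales


def dequantize_blocks(blocks: List[List[int]], scales: List[float]) -> List[float]:
    """Reconstruct approximate weights from the integers and per-block scales."""
    return [q * s for block, s in zip(blocks, scales) for q in block]


if __name__ == "__main__":
    # The second block holds much larger values than the first; per-block
    # scales keep the small values in the first block from being crushed.
    weights = [0.5, -1.0, 0.25, 0.0, 100.0, -50.0, 25.0, 0.0]
    blocks, scales = quantize_blocks(weights, block_size=4)
    restored = dequantize_blocks(blocks, scales)
    for original, approx in zip(weights, restored):
        print(f"{original:8.3f} -> {approx:8.3f}")
```

With a single scale for the whole tensor, the value 100.0 would force a step size of roughly 0.79, and 0.25 would round to zero. Giving each block its own scale confines that loss of resolution to the block containing the outlier.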