Abstract: While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we ...
Abstract: As deep neural networks have been performing better and better on various tasks, their number of parameters has been increasing, and the demand for computing power and storage has been ...
Vector Post-Training Quantization (VPTQ) is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on LLMs at an extremely low bit-width (<2-bit). VPTQ can ...
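To show why vector quantization can reach bit-widths below 2 bits, here is a minimal sketch of generic k-means vector quantization. It is not VPTQ's algorithm, which the description above does not spell out. The weights are split into groups of `vector_len` values, and each group is stored only as an index into a shared codebook of `codebook_size` centroids. The index cost is therefore log2(codebook_size) / vector_len bits per weight, not counting the codebook itself. For example, 4096 centroids over 8-value groups gives 12 / 8 = 1.5 bits. All function names are hypothetical, and the code uses only the Python standard library.

```python
import math
import random


def bits_per_weight(vector_len: int, codebook_size: int) -> float:
    """Index bits per scalar weight (codebook storage overhead not included)."""
    return math.log2(codebook_size) / vector_len


def split_into_vectors(weights, vector_len):
    """Group a flat weight list into consecutive vectors of length vector_len."""
    assert len(weights) % vector_len == 0, "length must be divisible by vector_len"
    return [weights[i:i + vector_len] for i in range(0, len(weights), vector_len)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the codebook centroid closest to vec."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], vec))


def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; centroids start from k randomly chosen data vectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(codebook, v)].append(v)
        for j, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def quantize(weights, vector_len, codebook_size, seed=0):
    """Return (codebook, indices): one codebook index per group of weights."""
    vectors = split_into_vectors(weights, vector_len)
    codebook = kmeans_codebook(vectors, codebook_size, seed=seed)
    indices = [nearest(codebook, v) for v in vectors]
    return codebook, indices


def dequantize(codebook, indices):
    """Rebuild the flat weight list by looking each index up in the codebook."""
    return [x for i in indices for x in codebook[i]]
```

With `vector_len=4` and `codebook_size=16`, each group of four weights costs one 4-bit index, so `bits_per_weight(4, 16)` is 1.0. Real low-bit schemes also have to amortize the codebook and usually use larger codebooks and smarter optimization than plain k-means.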