We’ve built Distiller with the following features and tools, keeping both DL researchers and engineers in mind:
Pruning and regularization are two methods that can be used to induce sparsity in a DNN’s parameters tensors. Sparsity is a measure of how many elements in a tensor are exact zeros, relative to the tensor size. Sparse tensors can be stored more compactly in memory and can reduce the amount of compute and energy budgets required to carry out DNN operations. Quantization is a method to reduce the precision of the data type used in a DNN, leading again to reduced memory, energy and compute requirements. Distiller provides a growing set of state-of-the-art methods and algorithms for quantization, pruning (structured and fine-grained) and sparsity-inducing regularization–leading the way to faster, smaller, and more energy-efficient models.
To help you concentrate on your research, we’ve tried to provide the generic functionality, both high and low-level, that we think most people will need for compression research. Some examples:
We’ve included Jupyter notebooks that demonstrate how to access statistics from the network model and compression process. For example, if you are planning to remove filters from your DNN, you might want to run a filter-pruning sensitivity analysis and view the results in a Jupyter notebook:
Distiller statistics are exported as Pandas DataFrames which are amenable to data-selection (indexing, slicing, etc.) and visualization.
Distiller comes with sample applications that employ some methods for compressing image-classification DNNs and language models. We’ve implemented a few compression research papers that can be used as a template for starting your own work. These are based on a couple of PyTorch’s example projects and show the simplicity of adding compression to pre-existing training applications.
Distiller is a research library for the community at large and is part of Intel AI Lab’s effort to help scientists and engineers train and deploy DL solutions, publish research, and reproduce the latest innovative algorithms from the AI community. We are currently working on adding more algorithms, more features, and more application domains.
If you are actively researching or implementing DNN compression, we hope that you will find Distiller useful and fork it to implement your own research; we also encourage you to send us pull-requests of your work. You will be able to share your ideas, implementations, and bug fixes with other like-minded engineers and researchers—a benefit to all! We take research reproducibility and transparency seriously, and we think that Distiller can be the virtual hub where researchers from across the globe share their implementations.
 Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.” arXiv:1602.07360 [cs.CV]
 Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto and Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. (https://arxiv.org/abs/1704.04861).
 Michael Zhu and Suyog Gupta, “To prune, or not to prune: exploring the efficacy of pruning for model compression”, 2017 NIPS Workshop on Machine Learning of Phones and other Consumer Devices (https://arxiv.org/pdf/1710.01878.pdf)
 Sharan Narang, Gregory Diamos, Shubho Sengupta, and Erich Elsen. (2017). “Exploring Sparsity in Recurrent Neural Networks.” (https://arxiv.org/abs/1704.05119)
 Raanan Y. Yehezkel Rohekar, Guy Koren, Shami Nisimov and Gal Novik. “Unsupervised Deep Structure Learning by Recursive Independence Testing.”, 2017 NIPS Workshop on Bayesian Deep Learning (http://bayesiandeeplearning.org/2017/papers/18.pdf).
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
© Intel Corporation
*Other names and brands may be claimed as the property of others.