Introducing Distiller

We’ve built Distiller with the following features and tools, keeping both DL researchers and engineers in mind:

  • A framework for integrating pruning, regularization, and quantization algorithms
  • A set of tools for analyzing and evaluating compression performance
  • Example implementations of state-of-the-art compression algorithms

Pruning and regularization are two methods that can be used to induce sparsity in a DNN’s parameter tensors. Sparsity is a measure of how many elements in a tensor are exact zeros, relative to the tensor’s size. Sparse tensors can be stored more compactly in memory and can reduce the compute and energy required to carry out DNN operations. Quantization is a method of reducing the precision of the data types used in a DNN, again leading to reduced memory, energy, and compute requirements. Distiller provides a growing set of state-of-the-art methods and algorithms for quantization, pruning (structured and fine-grained), and sparsity-inducing regularization, leading the way to faster, smaller, and more energy-efficient models.
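
As a concrete illustration of the sparsity measure, here is a minimal PyTorch sketch that applies fine-grained magnitude pruning to a toy layer and then computes its sparsity; the layer shape and threshold are illustrative choices, not values taken from Distiller:

```python
import torch
import torch.nn as nn

# A toy convolutional layer whose weights we will prune (shapes are illustrative).
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# Fine-grained (element-wise) magnitude pruning: zero out small-magnitude weights.
threshold = 0.05  # an arbitrary threshold chosen for this sketch
with torch.no_grad():
    conv.weight.mul_((conv.weight.abs() > threshold).float())

# Sparsity = fraction of elements that are exact zeros, relative to the tensor size.
sparsity = (conv.weight == 0).sum().item() / conv.weight.numel()
print(f"Weight sparsity: {sparsity:.2%}")
```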

To help you concentrate on your research, we’ve tried to provide the generic functionality, both high- and low-level, that we think most people will need for compression research. Some examples:

  • Certain compression methods dynamically remove filters and channels from convolutional layers while a DNN is being trained. Distiller updates the configuration of the targeted layers, and their parameter tensors as well. In addition, it analyzes the data dependencies in the model and modifies dependent layers as needed
  • Distiller will automatically transform a model for quantization by replacing specific layer types with their quantized counterparts. This saves you the hassle of manually converting each floating-point model you are using to its lower-precision form, and allows you to focus on developing the quantization method and on scaling and testing your algorithm across many models (a rough sketch of this module-swapping idea follows this list)
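
The sketch below shows the general idea of such a model transform in plain PyTorch. It is not Distiller’s API; the FakeQuantLinear class and quantize_model helper are hypothetical names used only to illustrate walking a model and swapping layer types for lower-precision counterparts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantLinear(nn.Linear):
    """Illustrative stand-in for a quantized Linear layer: simulates
    8-bit weights by rounding them to a uniform grid in the forward pass."""
    def forward(self, x):
        levels = 2 ** 8 - 1                      # pretend 8-bit precision
        scale = self.weight.abs().max() / (levels / 2) + 1e-12
        q_weight = torch.round(self.weight / scale) * scale
        return F.linear(x, q_weight, self.bias)

def quantize_model(module):
    """Recursively replace every nn.Linear with its 'quantized' counterpart."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            q = FakeQuantLinear(child.in_features, child.out_features,
                                bias=child.bias is not None)
            q.load_state_dict(child.state_dict())  # reuse the float weights
            setattr(module, name, q)
        else:
            quantize_model(child)
    return module

model = quantize_model(nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10)))
print(model)
```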

We’ve included Jupyter notebooks that demonstrate how to access statistics from the network model and the compression process. For example, if you are planning to remove filters from your DNN, you might want to run a filter-pruning sensitivity analysis and view the results in a Jupyter notebook.

Distiller statistics are exported as Pandas DataFrames, which are amenable to data selection (indexing, slicing, etc.) and visualization.
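
For example, once the statistics are in a DataFrame, ordinary Pandas operations apply. The column names below are illustrative placeholders, not Distiller’s exact export schema:

```python
import pandas as pd

# Hypothetical per-layer sensitivity results: top-1 accuracy after pruning
# each layer to 50% sparsity (column names are illustrative placeholders).
df = pd.DataFrame({
    "layer":    ["conv1", "conv2", "fc1"],
    "sparsity": [0.50, 0.50, 0.50],
    "top1":     [75.2, 68.9, 74.8],
})

# Standard DataFrame selection: which layers tolerate 50% pruning best?
robust = df[df["top1"] > 74.0].sort_values("top1", ascending=False)
print(robust[["layer", "top1"]])
```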

Distiller comes with sample applications that employ some methods for compressing image-classification DNNs and language models. We’ve implemented a few compression research papers that can be used as a template for starting your own work. These are based on a couple of PyTorch’s example projects and show the simplicity of adding compression to pre-existing training applications.


Only the Beginning

Distiller is a research library for the community at large and is part of Intel AI Lab’s effort to help scientists and engineers train and deploy DL solutions, publish research, and reproduce the latest innovative algorithms from the AI community. We are currently working on adding more algorithms, more features, and more application domains.

If you are actively researching or implementing DNN compression, we hope that you will find Distiller useful and fork it to implement your own research; we also encourage you to send us pull-requests of your work. You will be able to share your ideas, implementations, and bug fixes with other like-minded engineers and researchers—a benefit to all! We take research reproducibility and transparency seriously, and we think that Distiller can be the virtual hub where researchers from across the globe share their implementations.

For more information about Distiller, you can refer to the documentation and code.

Geek On.


Original article from:
https://www.intel.com/content/www/us/en/artificial-intelligence/posts/compressing-deep-learning-models-with-neural-network-distiller.html
