9 June 2020
Manikandan Ravikiran
R&D Centre, Hitachi India Pvt. Ltd.
Video analytics solutions span a variety of applications such as drones, robots, self-driving cars, and safety surveillance systems, all of which require computation on the edge. In particular, with the rise of IoT solutions, there is a growing need for low-memory AI systems on edge devices.
To address this requirement, we developed an end-to-end automatic neural network memory reduction technology called the “Multilayer Pruning Framework”, which focuses on reducing memory and computation. We validated the technology on multiple video analytics applications involving object detection and image classification, achieving up to 96% and 90% reduction in memory and computation, respectively.
Deep neural networks exhibit two characteristics: (i) a large number of unused parameters, which increases memory consumption, and (ii) a variety of architectures depending on the end application.
Our proposed Multilayer Pruning Framework (Figure 1) is an end-to-end, non-iterative approach for removing unused parameters that exploits these characteristics. We hypothesize that identifying and removing the unused parameters reduces memory and computation, and that the extent of this reduction varies with the neural network architecture.
In line with these hypotheses, our framework introduces the following steps.
Figure 1: Illustration of proposed Multilayer Pruning Framework
1. Sparsity induction
In their original form, the parameters of convolution layers are stored as 3D matrices (referred to as convolution matrices from here onwards), where each element of the matrix corresponds to a parameter value. A parameter value of zero indicates that the parameter is unused and can be removed. However, in many deep neural networks the unused parameters are scattered around the convolution matrix without any structure, and some parameters have values very close to zero rather than exactly zero.
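To make this concrete, the short sketch below (our own illustration, not code from the paper) uses PyTorch to inspect a convolution layer's weights and report what fraction of the parameters are exactly zero or merely close to zero; the layer sizes and tolerance are arbitrary choices.

```python
import torch
import torch.nn as nn

# Hypothetical convolution layer; in practice the weights would come from a trained network.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)

weights = conv.weight.data        # each weights[i] is one filter's 3D convolution matrix
total = weights.numel()

exactly_zero = (weights == 0).sum().item()
near_zero = (weights.abs() < 1e-3).sum().item()   # tolerance chosen purely for illustration

print(f"exactly zero (removable):   {exactly_zero / total:.2%}")
print(f"close to zero (|w| < 1e-3): {near_zero / total:.2%}")
```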
Figure 2: Sparsity Induction in convolutional matrices
To ensure that the unused parameters are organized in a structured manner, and to increase the number of unused parameters by transferring the capacity of parameters with values close to zero to other parameters, we introduce sparsity induction. Sparsity induction (Figure 2) uses a combination of methods that increases the number of unused parameters and enforces structure (grouping of zeros) in the convolution matrix.
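As a hedged illustration only (the framework's actual combination of methods is described in the paper), one common way to enforce this kind of grouped sparsity is a group-lasso-style penalty added to the training loss, which pushes entire filters toward zero together rather than leaving zeros scattered across the convolution matrix. The function below is our own sketch of such a penalty.

```python
import torch
import torch.nn as nn

def group_sparsity_penalty(model: nn.Module) -> torch.Tensor:
    """Group-lasso-style penalty: sum of L2 norms of each output filter.

    Minimizing this term drives whole filters toward zero as a group,
    producing the structured grouping of zeros that sparsity induction
    aims for, instead of zeros scattered across the convolution matrix.
    """
    return sum(
        module.weight.view(module.weight.size(0), -1).norm(p=2, dim=1).sum()
        for module in model.modules()
        if isinstance(module, nn.Conv2d)
    )

# During training, the penalty would be added to the task loss, e.g.:
# loss = task_loss + sparsity_weight * group_sparsity_penalty(model)
```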
2. Unused parameter selection
In the previous step, we created a convolution matrix containing unused parameters. However, removing a single zero from the matrix is not possible; instead, we group sets of zeros together across rows or columns. This results in many such groups within the convolution matrix in which either all values are zero (Figure 3(a)) or a majority of values are zero (Figure 3(b)).
Figure 3: Sparsity in convolutional matrices having (a) groups with all zeros and (b) groups with a majority of zeros
Using these rows or columns of zeros, we designed a criterion called the sparsity level, which computes the fraction of zeros across the entire convolution matrix that belong to zero rows or columns. If the sparsity level is higher than a user-defined threshold, the entire convolution matrix is removed. The user can therefore set the threshold according to their memory requirements: if the threshold is low, more convolution matrices are removed, leading to more compression, and vice versa.
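As a rough sketch of this selection step (our own illustration; the exact definition in the paper may differ), the helpers below flatten a convolution matrix to 2D, measure what fraction of its parameters lie in all-zero rows or columns, and compare that sparsity level against the user-defined threshold.

```python
import torch

def sparsity_level(weight: torch.Tensor, tol: float = 1e-6) -> float:
    """Fraction of parameters covered by all-zero rows or columns of the
    flattened convolution matrix (out_channels x in_channels*kH*kW)."""
    mat = weight.view(weight.size(0), -1).abs() <= tol   # True where the parameter is ~zero
    zero_rows = mat.all(dim=1).sum().item()              # filters that are entirely zero
    zero_cols = mat.all(dim=0).sum().item()              # columns that are entirely zero
    covered = zero_rows * mat.size(1) + zero_cols * mat.size(0) - zero_rows * zero_cols
    return covered / mat.numel()

def should_remove(weight: torch.Tensor, threshold: float) -> bool:
    """Remove the convolution matrix when its sparsity level exceeds the
    user-defined threshold (a lower threshold removes more matrices)."""
    return sparsity_level(weight) > threshold
```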
3. Removal guidelines for various neural networks
In the previous steps, we introduced sparsity induction and the unused parameter selection criterion. However, depending on the application, neural networks have different types of layers and connections. We therefore tested multiple approaches across classification and detection networks, which resulted in the following removal guidelines.
Table 1 shows the results of our approach with the object detection network SSD300 [1], while Table 2 shows results on an image classification network [2]. We confirmed that our proposed method produces high memory reduction, and hence low processing cost, on both tasks.
Table 1. Benchmarking multilayer pruning framework for SSD300 on PASCAL VOC.
| Model | mAP (%) | Number of Params | Model size (MB) |
|---|---|---|---|
| SqueezeNet SSD | 38.45 | 7 million | 28 |
| Resnet10 SSD | 64.83 | 6.7 million | 26.8 |
| Mobilenet SSD | 70.04 | 8.8 million | 35.2 |
| (Ours) | 75.07 | 3.9 million | 15.6 |
Table 2. Benchmarking multilayer pruning framework for VGG16 on CIFAR-10
| Model | Accuracy (%) | Number of Params | Model size (MB) |
|---|---|---|---|
| Efficient Pruning | 93.4 | 190 million | ~100 |
| (Ours) | 94.01 | 1 million | 4.2 |
Finally, we believe our proposed approach has great potential for reducing memory across a variety of neural network architectures, thereby helping develop edge solutions with lower-powered AI systems. For more details, we encourage you to read our paper [3].
Thanks to my co-authors Pravendra Singh, Neeraj Matiyali and Prof. Vinay P. Namboodiri from IIT-Kanpur, with whom this research work was jointly executed.
*If you would like to find out more about activities at Hitachi India Research & Development, please visit our website.