Implementation and Optimization of Deep Neural Networks
Prof. Dr. Akash Kumar, Chair for Processor Design
(Some slides adapted from Intelligent Architectures 5LIL0) http://www.es.ele.tue.nl/~heco/courses/IA-5LIL0/index.html
Topics
• Introduction
• What are DNNs and how do they operate
  – Convolutional Neural Networks
  – Learning frameworks
  – Applications using DNNs
• Optimizations
  – Making the network more compact
  – Quantization of activations and weights
  – Exploiting data & weight reuse by advanced loop transformations and local buffering
• DNN architectures and accelerators
• The future: beyond DNNs
What's Deep Learning/ Deep Neural Network?
• Self-learning algorithms
• Using huge data sets to learn
• Deep: many "learning layers"
• Brain-inspired, based on neurons and synapses (connections)
• High classification accuracy
• Many applications; let's look at ImageNet classification and Tesla Autopilot
ImageNet Winners (top-5 classification error)
• ImageNet dataset: 10M images, 10,000 classes
[Chart: top-5 classification error of ImageNet winners, 2010-2017 (y-axis 0-30%); traditional methods vs. deep learning, with human-level error marked]
AI: Tesla Autopilot
• Tesla Model S demonstration of autonomous driving
• Computing system monitors radar and several cameras
  – Detect objects such as cars and pedestrians
  – Monitor traffic signs
  – Lane tracking and possible lane changing
  – Auto parking
Tesla web page: www.tesla.com/videos/ (November 2016)
Deep Learning and High-performance HW Architectures
Our Brain
• The basic computational unit of the brain is a neuron
  – About 80 billion neurons in our brain
  – Neurons are connected with roughly 10^14 – 10^15 synapses
  – Neurons receive input signals from dendrites and produce an output signal along the axon, which interacts with the dendrites of other neurons via synaptic weights
• Synaptic weights are learnable and control influence strength
Artificial Neuron
• An overview: more to follow
ANN: Neurons, structured in Layers
• Weights represent synaptic strength
Deep Neural Networks
• An ANN with multiple hidden layers
• Two main types of DNNs: without memory and with memory
• Without memory
  – Fully-connected NN
    · Feed-forward, a.k.a. multilayer perceptron (MLP)
  – Convolutional NN (CNN)
    · Feed-forward, sparsely connected with weight sharing
    · Note: CNNs typically also contain one or more fully connected layers
• With memory
  – Recurrent NN (RNN)
    · Feedback
  – Long Short-Term Memory (LSTM)
    · Feedback + storage
Deep Neural Networks
Artificial Neuron Model

[Figure: inputs and weights combined in a dot product, followed by a non-linear activation function (non-linear transformation)]
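To make the figure concrete, here is a minimal NumPy sketch of the neuron computation (the names and values are illustrative, not from the slides): the output is a non-linear activation applied to the dot product of the inputs and weights, plus a bias.

import numpy as np

def relu(z):
    # Non-linear transformation (ReLU shown; sigmoid and tanh are other common choices)
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Dot product of inputs and synaptic weights, plus bias, followed by the activation
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs arriving at the "dendrites"
w = np.array([0.2, 0.4, -0.1])   # learnable synaptic weights
b = 0.1
print(neuron(x, w, b))           # single scalar output along the "axon"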
Neuron Applied to Image Region

• Neuron detects features in a region
• Convolution: the same neuron applied to all regions in the image yields an output feature map
• Input can be taken from multiple input feature maps
• Multiple neurons generate multiple output feature maps
2D Convolution, sliding window

• Input: 5x5
• Kernel: 3x3
• Output: 3x3
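A minimal sliding-window convolution in NumPy, illustrating the 5x5 input / 3x3 kernel / 3x3 output case above (no padding, stride 1; the kernel values are just an example):

import numpy as np

def conv2d(inp, kernel):
    # Slide the kernel over every position where it fully fits (no padding, stride 1)
    H, W = inp.shape
    R, S = kernel.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i+R, j:j+S] * kernel)
    return out

inp = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 input feature map
kernel = np.ones((3, 3)) / 9.0                   # 3x3 averaging kernel
print(conv2d(inp, kernel).shape)                 # (3, 3) output feature map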
Convolution in CNNs/DNNs

• N = batch size
• C input feature maps of size HxW
• M output feature maps of size ExF
• M filters of size RxS (each filter spans all C input channels)
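Using these dimensions, a convolutional layer can be written as a nest of loops. A straightforward, unoptimized NumPy sketch (assuming stride 1 and no padding, so E = H - R + 1 and F = W - S + 1):

import numpy as np

def conv_layer(inp, filters):
    # inp:     N x C x H x W   input feature maps
    # filters: M x C x R x S   filters
    # returns: N x M x E x F   output feature maps (stride 1, no padding)
    N, C, H, W = inp.shape
    M, _, R, S = filters.shape
    E, F = H - R + 1, W - S + 1
    out = np.zeros((N, M, E, F))
    for n in range(N):               # batch
        for m in range(M):           # output feature maps
            for e in range(E):       # output rows
                for f in range(F):   # output columns
                    out[n, m, e, f] = np.sum(inp[n, :, e:e+R, f:f+S] * filters[m])
    return out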
Fully Connected (FC) layer
• FC can be viewed as a special case of convolution, with:
  – H = R
  – W = S
  – E = F = 1
Size of input fmaps = size of convolution kernel
Output fmaps have size 1x1, i.e. each output fmap represents 1 output neuron
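A small numerical check of this equivalence (illustrative NumPy, with made-up sizes): an FC layer computed as a matrix-vector product produces the same values as a convolution whose kernel covers the entire input.

import numpy as np

C, H, W, M = 3, 4, 4, 5
inp = np.random.randn(C, H, W)
filters = np.random.randn(M, C, H, W)   # R = H and S = W, so each filter covers the whole input

# FC view: flatten the input and filters, one dot product per output neuron
fc_out = filters.reshape(M, -1) @ inp.reshape(-1)

# Convolution view: the kernel covers the entire input, so E = F = 1
conv_out = np.array([np.sum(inp * filters[m]) for m in range(M)])

print(np.allclose(fc_out, conv_out))    # True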
Activation functions
Pooling
• Reduce resolution
• Increase receptive (input) area of outputs
• Overlapping or non-overlapping, depending on stride U
• Using the max or the average
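A minimal max-pooling sketch in NumPy (window size K and stride U are illustrative parameters; average pooling simply replaces np.max with np.mean):

import numpy as np

def max_pool(fmap, K=2, U=2):
    # fmap: H x W feature map; K x K pooling window, stride U (non-overlapping when U >= K)
    H, W = fmap.shape
    E, F = (H - K) // U + 1, (W - K) // U + 1
    out = np.zeros((E, F))
    for i in range(E):
        for j in range(F):
            out[i, j] = np.max(fmap[i*U:i*U+K, j*U:j*U+K])
    return out

print(max_pool(np.arange(16, dtype=float).reshape(4, 4)))   # 4x4 input -> 2x2 output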
Normalization
• Batch normalization
  – Normalize the activations of a batch such that the average => 0 and sigma => 1
  – Based on statistics of the training set
  – Gives higher accuracy and faster training
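A sketch of the normalization step in NumPy (inference-style, per channel, with the usual learnable scale gamma and shift beta; the names and epsilon are standard conventions, not taken from the slides):

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize the activations of a batch to mean ~0 and sigma ~1,
    # then apply a learnable scale and shift
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

x = np.random.randn(32) * 3.0 + 5.0   # activations of one batch for a single channel
y = batch_norm(x)
print(y.mean(), y.std())              # approximately 0 and 1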
Popular networks
Example: LeNet-5 structure

[LeCun et al., Proc. of the IEEE, 1998]

• 2 conv layers
• 2 FC layers
• 60k weights, 341k MACs (multiply-accumulates) per input picture
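One possible Keras rendering of a LeNet-5-style network matching the slide's description (2 conv layers + 2 FC layers on 32x32 grayscale input); the feature-map counts follow the original paper, and the exact parameter count depends on pooling and padding choices:

import tensorflow as tf

lenet = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
lenet.summary()   # roughly 50-60k weights, in line with the slide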
Inference vs. Training
• Training: determine the weights
• 3 types of learning
  – Supervised: using inputs with labeled outputs
  – Unsupervised
  – Reinforcement
• Feedforward + backward calculations needed
• Inference: apply a learned DNN
  – Feedforward: input -> classification
Available Deep Learning Software Frameworks
Deep learning stack

[Stack diagram, top to bottom:
  High-level API: Keras, PyTorch (Ignite), Caffe2 (Brew)
  Low-level API: TensorFlow, CNTK, PyTorch, Caffe, Caffe2
  Inference engine: TensorRT, Tensor Comprehensions
  Libraries: Intel MKL, Eigen, cuBLAS, cuDNN, QNNPACK
  Hardware: CPU, GPU]
• High-level API: provides abstraction for application use
• Low-level API: integrates system-level support libraries and provides DL functionality
• System-level support libraries provide efficient kernel implementations of
  – Basic linear algebra subprograms (BLAS)
  – DNN primitives
  – GPU kernels
Deep learning frameworks
• How do frameworks differ?
  – Capabilities: training, inference, support for multiprocessing
    · Focus on different stages of deployment
    · Set of available tools / third-party tool integration
    · Multi-GPU training
  – Target platforms
    · CPU
    · GPU
    · TPU
    · FPGA
Deep learning frameworks
• How do frameworks differ?
  – The mechanism of defining the computational graph: static and dynamic
    · i.e., the order of computations that are required to be performed
  – Static: define-and-run
  – Dynamic: define-by-run
An example of dynamic graph generation
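A brief define-by-run illustration (PyTorch-style; this snippet is not from the slides): the computational graph is built while the code executes, so ordinary Python control flow can change it from one input to the next, whereas a define-and-run framework declares the whole graph before any data is fed in.

import torch

x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)

# The graph is created as this code runs (define-by-run);
# the branch taken depends on the data, so the graph can differ per input.
y = torch.dot(w, x)
if y.item() > 0:
    z = y * 2
else:
    z = y ** 2

z.backward()      # gradients flow through whichever graph was actually built
print(x.grad)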
Deep learning frameworks
• TensorFlow example: classification of the MNIST dataset
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

# Prepare data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build model (computational graph)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Run model with data
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Classification accuracy ~98%
Deep Neural Network Design
• Designing deep neural networks is still more art than science
  – Large design space
  – Many architecture solutions for a single problem
• Network design procedure
  – Understand the problem
  – Evaluate application requirements and resource limitations
  – Design the architecture
  – Training, validation, and reiteration
Deep Neural Network Design space
• The network design space has many dimensions
  – Network size, depth, and width
  – Operator composition
  – Specialized building blocks
  – Optimizations
• Recent research focuses on
  – Automated design
  – Guided optimization
Common Trade-offs in DNN Design
• Accuracy / memory use
• Accuracy / latency
• Accuracy / energy consumption
• Energy consumption / speed
Network Optimization Techniques
• Pruning
• Quantization
• Weight scaling
• Tensor decomposition
Network Pruning
• Network pruning is the removal of nodes, connections, or kernels
  – Can be part of the training: learning both weights and connections
  – Can be adaptively/selectively applied
  – Benefits may be limited for non-structured pruning
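A minimal sketch of magnitude-based (unstructured) pruning in NumPy, one common way connections are removed; the 75% sparsity level is an illustrative choice:

import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    # Zero out the smallest-magnitude fraction of the weights (connections);
    # the mask can be kept to freeze pruned weights during fine-tuning
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.randn(8, 8)
w_pruned, mask = prune_by_magnitude(w, sparsity=0.75)
print(mask.mean())   # fraction of connections kept (~0.25)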
Quantization of DNNs
• Quantization reduces the precision of stored data and operators
  – Reduces overall memory use
  – Compresses the network, exploiting redundancy
  – Supported on several HW platforms with different precision levels
  – FP16, INT16, and INT8 are most common
  – Training may require full precision
Quantization of DNNs
• Quantization induces errors in output accuracy
• In-training quantization
  – Train with fixed-point low-precision parameters
  – Training heals the quantization-induced errors
  – Example: binary and ternary networks
• Post-training quantization
  – Fine-tuning is required
  – Intelligent selection of the step size ∆
Quantization of DNNs
• XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (using AlexNet)
  – Extreme quantization of weights and activations
  – Binary-Weight-Networks
    · The filters are approximated with binary values
    · Resulting in 32x memory savings
  – XNOR-Networks
    · Both the filters and the inputs to convolutional layers are binary
    · Convolutions are approximated with binary operations
    · 58x faster convolution operations and 32x memory savings
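A minimal sketch of the Binary-Weight-Network idea (following the XNOR-Net paper): each real-valued filter W is approximated by a scaling factor times a binary filter, alpha * B, with B = sign(W) and alpha the mean absolute value of W.

import numpy as np

def binarize_weights(W):
    # Approximate W by alpha * B with B in {-1, +1}
    B = np.sign(W)
    B[B == 0] = 1                  # make sure every entry is +/-1
    alpha = np.mean(np.abs(W))     # per-filter scaling factor
    return alpha, B

W = np.random.randn(3, 3)
alpha, B = binarize_weights(W)
print(np.mean((W - alpha * B) ** 2))   # approximation error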
Quantization of DNNs
• Linear quantization

Calculate the step size ∆:

    ∆ = clip( max(|x|) / 2^(n-1), 2^-(n-1), 2^(n-1) )
    ∆ = 2^clip( round(log2(∆)), -(n-1), n-1 )        (round ∆ to a power of two)

Quantize the number x:

    x_quant = clip( round(x / ∆) · ∆, -2^(n-1), 2^(n-1) - 1 )
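A small NumPy sketch of this linear quantization scheme as reconstructed above (n is the bit width; the exact clipping bounds are an assumption, since the slide formula did not survive extraction cleanly):

import numpy as np

def linear_quantize(x, n=8):
    # Step size: scale the largest magnitude into an n-bit signed range,
    # then round the step size itself to a power of two (cheap in hardware)
    delta = np.max(np.abs(x)) / 2 ** (n - 1)
    delta = 2.0 ** np.clip(np.round(np.log2(delta)), -(n - 1), n - 1)
    # Quantize: round to the nearest step and clip to the representable range
    q = np.clip(np.round(x / delta), -2 ** (n - 1), 2 ** (n - 1) - 1)
    return q * delta, delta

x = np.random.randn(1000)
xq, delta = linear_quantize(x, n=8)
print(delta, np.max(np.abs(x - xq)))   # step size and worst-case error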
Quantization of DNNs
• log2-based quantization technique
• Parameters and activations are represented as powers of 2
• Significant memory and power savings can be obtained
• The multiplication operation in each neuron is replaced with a shift operator
• Latency of operations is reduced
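A minimal sketch of power-of-two quantization in NumPy (illustrative only): every value is replaced by its sign times the nearest power of two, so multiplying an activation by a quantized weight reduces to shifting the exponent.

import numpy as np

def log2_quantize(x):
    # Represent each value as sign(x) * 2^e with an integer exponent e
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + 1e-12)).astype(int)   # epsilon avoids log2(0)
    return sign, exp

def multiply_quantized(activation, sign, exp):
    # Multiplication by a power-of-two weight becomes an exponent shift
    # (np.ldexp computes activation * 2^exp)
    return sign * np.ldexp(activation, exp)

w = np.array([0.30, -0.07, 1.9])
s, e = log2_quantize(w)
print(s * 2.0 ** e)                    # quantized weights
print(multiply_quantized(0.5, s, e))   # 0.5 * quantized weights via shifts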
Quantization of DNNs
• VGG16 DNN weights and biases for two layers
[Figure: distributions of VGG16 weights and biases for two layers, with the leading-one location marked]
Quantization of DNNs
• log_2_lead quantization (accepted at DATE 2020)
• Identify the location of the leading one in weights and biases
• Improve the precision of the quantized number by storing the bits following the leading one
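A rough sketch of the idea as stated on the slide (an illustration only, not the published log_2_lead encoding): locate the leading one of each value and keep a few of the bits that follow it, instead of dropping them as pure power-of-two quantization does. The number of kept bits (frac_bits) is an assumed parameter.

import numpy as np

def log2_lead_quantize(x, frac_bits=3):
    # Illustrative only: keep the leading-one position plus 'frac_bits' of the
    # bits following it, i.e. a rounded mantissa instead of just a power of two
    sign = np.sign(x)
    mag = np.abs(x)
    lead = np.floor(np.log2(mag + 1e-12))   # leading-one position (exponent)
    mantissa = mag / 2.0 ** lead            # in [1, 2): bits after the leading one
    mantissa_q = np.round(mantissa * 2 ** frac_bits) / 2 ** frac_bits
    return sign * mantissa_q * 2.0 ** lead

w = np.array([0.30, -0.07, 1.9])
print(log2_lead_quantize(w))   # closer to w than a pure power-of-two representation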
log_2_lead Quantization

[Figure: bit-level comparison of the linear quantization and log_2_lead quantization formats]
log_2_lead Quantization
• ImageNet classification accuracy using VGG16 without fine-tuning

Quantization (VGG16)                          Top-5    Top-1
Float32                                       85.74    64.72

Weights and biases quantized:
  8-bit linear                                82.55    59.80
  Power of 2                                   0.63     0.10
  log_2_lead                                  85.64    64.51
  Float32 vs log_2_lead                       -0.10    -0.21

Weights, biases, and activations quantized:
  8-bit linear                                82.55    59.83
  Power of 2                                   7.48     1.16
  log_2_lead                                  85.34    64.05
  Float32 vs log_2_lead                       -0.40    -0.67
Summary
• Deep neural networks are now everywhere
• Efficient architectures are necessary to make them feasible in embedded systems
• Various quantization schemes are applied
Questions and Answers
Email: akash.kumar@tu-dresden.de