
Implementation and Optimization of Deep Neural Networks

Prof. Dr. Akash Kumar, Chair for Processor Design

(Some slides adapted from Intelligent Architectures 5LIL0) http://www.es.ele.tue.nl/~heco/courses/IA-5LIL0/index.html


Topics

¨ Introduction
¨ What are DNNs and how do they operate
  ¤ Convolutional Neural Networks
  ¤ Learning Frameworks
  ¤ Applications using DNNs
¨ Optimizations
  ¤ Making the network more compact
  ¤ Quantization of activations and weights
  ¤ Exploiting data & weight reuse by advanced loop transformations and local buffering
¨ DNN architectures and accelerators
¨ The future: Beyond DNNs


What's Deep Learning / a Deep Neural Network?

¨ Self-learning algorithms
¨ Using huge data sets to learn
¨ Deep: many "learning layers"
¨ Brain-inspired, based on neurons and synapses (connections)
¨ High classification accuracy
¨ Many applications; let's look at ImageNet classification and Tesla Autopilot


ImageNet Winners (top-5 classification error)

¨ ImageNet dataset: over 14M labeled images in more than 20,000 categories; the ILSVRC challenge uses a 1000-class subset

[Figure: top-5 error of the annual ImageNet winners, 2010–2017 (y-axis 0%–30%), comparing traditional methods, deep learning entries, and human-level error]


AI: Tesla Autopilot

¨ Tesla Model S demonstration of autonomous driving
¨ Computing system monitors radar and several cameras
  ¤ Detect objects like cars and pedestrians
  ¤ Monitor traffic signs
  ¤ Lane tracking and possible lane changing
  ¤ Auto parking

Tesla web page: www.tesla.com/videos/ November 2016


Deep Learning and High-performance HW Architectures


Our Brain

¨ The basic computational unit of the brain is a neuron
  ¤ about 80 billion neurons in our brain
  ¤ Neurons are connected with nearly 10^14 – 10^15 synapses
  ¤ Neurons receive input signals from dendrites and produce an output signal along the axon, which interacts with the dendrites of other neurons via synaptic weights

¨ Synaptic weights – learnable & control influence strength


Artificial Neuron

¨ An overview: more to follow


ANN: Neurons, structured in Layers

¨ Weights represent synaptic strength


Deep Neural Networks

¨ An ANN with multiple hidden layers
¨ Two main types of DNNs: without memory and with memory
¨ Without memory
  ¤ Fully-Connected NN
    n feed forward, a.k.a. multilayer perceptron (MLP)
  ¤ Convolutional NN (CNN)
    n feed forward, sparsely connected with weight sharing
    n note: CNNs typically also contain 1 or more fully connected layers
¨ With memory
  ¤ Recurrent NN (RNN)
    n feedback
  ¤ Long Short-Term Memory (LSTM)
    n feedback + storage


Deep Neural Networks


Artificial Neuron Model

[Figure: artificial neuron model — a dot product of inputs and weights followed by a non-linear transformation (activation function)]
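To make the two stages concrete, below is a minimal NumPy sketch of a single artificial neuron; the input, weight and bias values are illustrative, not taken from the slides.

import numpy as np

def neuron(x, w, b):
    # Dot product of inputs and synaptic weights, plus bias
    z = np.dot(w, x) + b
    # Non-linear transformation (here ReLU is assumed as the activation function)
    return np.maximum(z, 0.0)

x = np.array([0.5, -1.2, 3.0])   # input activations
w = np.array([0.8,  0.1, -0.4])  # synaptic weights
b = 0.2                          # bias
print(neuron(x, w, b))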


Neuron Applied to Image Region

● Neuron detects features in a region
● Convolution: the same neuron applied to all regions in the image yields an output feature map
● Input can be taken from multiple input feature maps
● Multiple neurons generate multiple output feature maps


2D Convolution, sliding window

• input 5x5
• 3x3 kernel
• output 3x3
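As an illustration of the sliding window, here is a small NumPy sketch of a plain 2D convolution (implemented as cross-correlation, as is usual in CNNs), assuming stride 1 and no padding, so a 5x5 input with a 3x3 kernel indeed yields a 3x3 output; the values are made up.

import numpy as np

def conv2d(inp, kernel):
    # Slide the kernel over the input: stride 1, no padding
    H, W = inp.shape
    R, S = kernel.shape
    E, F = H - R + 1, W - S + 1                       # output size
    out = np.zeros((E, F))
    for i in range(E):
        for j in range(F):
            out[i, j] = np.sum(inp[i:i+R, j:j+S] * kernel)
    return out

inp = np.arange(25, dtype=float).reshape(5, 5)        # 5x5 input
kernel = np.ones((3, 3)) / 9.0                        # 3x3 averaging kernel
print(conv2d(inp, kernel).shape)                      # (3, 3)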


Convolution in CNNs/DNNs

¨ N = batch size

¨ C input feature maps of size HxW

¨ M output feature maps of size ExF

¨ M filters of size RxS
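Putting these dimensions together, a convolutional layer can be written as the loop nest sketched below. This is a reference sketch only, assuming stride 1 and no padding (so E = H - R + 1 and F = W - S + 1) and that each of the M filters spans all C input channels.

import numpy as np

def conv_layer(inp, filters):
    # inp: N x C x H x W input feature maps, filters: M x C x R x S
    N, C, H, W = inp.shape
    M, _, R, S = filters.shape
    E, F = H - R + 1, W - S + 1
    out = np.zeros((N, M, E, F))                      # N x M x E x F output feature maps
    for n in range(N):                                # batch
        for m in range(M):                            # output feature maps
            for e in range(E):                        # output rows
                for f in range(F):                    # output columns
                    # accumulate over all C input channels and the RxS window
                    out[n, m, e, f] = np.sum(inp[n, :, e:e+R, f:f+S] * filters[m])
    return out

x = np.random.rand(1, 3, 32, 32)                      # N=1, C=3, H=W=32
w = np.random.rand(8, 3, 5, 5)                        # M=8 filters of size 5x5 (x C=3)
print(conv_layer(x, w).shape)                         # (1, 8, 28, 28)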


Fully Connected (FC) layer

¨ FC can be viewed as a special case of convolution, with:
  ¤ H=R
  ¤ W=S
  ¤ E=F=1

Size of input fmaps = size of convolution kernel

Output fmaps have size 1x1, i.e. each output fmap represents 1 output neuron
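A small sketch of this equivalence (illustrative shapes and random values): an FC layer with an M x (C·H·W) weight matrix computes the same result as a convolution whose M filters have the same spatial size as the input feature maps, each producing a 1x1 output map.

import numpy as np

C, H, W, M = 3, 4, 4, 10
x = np.random.rand(C, H, W)

# Fully connected view: flatten the input and multiply by an M x (C*H*W) weight matrix
w_fc = np.random.rand(M, C * H * W)
y_fc = w_fc @ x.reshape(-1)                           # M output neurons

# Convolution view: M filters of size C x H x W, each yielding a single 1x1 output map
w_conv = w_fc.reshape(M, C, H, W)
y_conv = np.array([np.sum(x * w_conv[m]) for m in range(M)])

print(np.allclose(y_fc, y_conv))                      # True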


Activation functions
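The plots on this slide are not reproduced here; as a reference, a few commonly used activation functions are sketched below (sigmoid, tanh and ReLU, the last being the usual choice in modern CNNs).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes values to (0, 1)

def tanh(x):
    return np.tanh(x)                    # squashes values to (-1, 1)

def relu(x):
    return np.maximum(x, 0.0)            # passes positives, zeroes out negatives

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x))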


Pooling

¨ Reduce resolution
¨ Increase receptive (input) area of outputs
¨ Overlapping or non-overlapping, depending on stride U

¨ Using the max or average
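A minimal sketch of non-overlapping 2x2 max pooling (stride U equal to the window size); average pooling simply replaces max by mean. The feature-map values are illustrative.

import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    H, W = fmap.shape
    E, F = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((E, F))
    for i in range(E):
        for j in range(F):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()     # use window.mean() for average pooling
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))                  # 2x2 output: resolution reduced by 4x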


Normalization

¨ Batch normalization
  ¤ normalize the activations of a batch such that mean ≈ 0 and standard deviation (sigma) ≈ 1
  ¤ based on statistics of the training set
  ¤ gives higher accuracy and faster training
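For reference, the normalization itself can be sketched as below (inference form for one channel, using mean and variance statistics plus a learned scale gamma and shift beta; epsilon avoids division by zero). The values are illustrative.

import numpy as np

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize to roughly zero mean and unit standard deviation, then scale and shift
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

acts = np.random.randn(1000) * 3.0 + 5.0              # activations of one channel
out = batch_norm(acts, acts.mean(), acts.var())
print(out.mean(), out.std())                          # approximately 0 and 1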


Popular networks


Example: LeNet-5 structure

[LeCun et al., Proc. of the IEEE, 1998]

¨ 2 Conv layers
¨ 2 FC layers
¨ 60k weights, 341k MACs (multiply-accumulates) per input picture
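A rough Keras sketch of a LeNet-5-style network in the spirit of this slide; the layer sizes follow the original paper, but details such as activations, pooling type and padding are simplified assumptions here.

import tensorflow as tf

lenet5 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(84, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
lenet5.summary()   # parameter count is on the order of the ~60k weights quoted above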


Inference vs. Training

¨ Training: determine weights
¨ 3 types of learning
  ¤ Supervised: using inputs with labeled outputs
  ¤ Unsupervised
  ¤ Reinforcement
¨ Feedforward + backward calculations needed
¨ Inference: apply a learned DNN
  ¤ feedforward: input -> classification


Available Deep Learning Software Frameworks


Deep learning stack

[Diagram: the deep learning stack — High Level API (e.g., Keras, PyTorch Ignite, Caffe2 Brew), Low Level API (TensorFlow, PyTorch, CNTK, Caffe, Caffe2), Inference Engine (TensorRT, Tensor Comprehensions), support Libraries (Intel MKL, Eigen, cuBLAS, QNNPACK, cuDNN), and Hardware (CPU, GPU)]

• High Level API: provides abstraction for application use

• Low Level API: integrates system-level support libraries and provides DL functionality

• System-level support libraries provide efficient kernel implementations of
  – Basic linear algebra subprograms (BLAS)
  – DNN primitives
  – GPU kernels


Deep learning frameworks

¨ How do frameworks differ?
  ¤ Capabilities: training, inference, support for multiprocessing
    n Focus on different stages of deployment
    n Set of available tools / third-party tool integration
    n Multi-GPU training
  ¤ Target platforms
    n CPU
    n GPU
    n TPU
    n FPGA


Deep learning frameworks

¨ How do frameworks differ?
  ¤ The mechanism of defining the computational graph: static and dynamic
    n i.e. the order of computations that are required to be performed
  ¤ Static: define-and-run
  ¤ Dynamic: define-by-run

An example of dynamic graph generation
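The code figure from the slide is not reproduced; as an illustration of define-by-run, here is a small PyTorch snippet (not from the slides) in which the graph is built while the forward computation executes and can even depend on the data, here through the loop bound.

import torch

x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)

# The computational graph is created as these operations run (define-by-run);
# ordinary Python control flow can change the graph from one run to the next.
y = (w * x).sum()
for _ in range(int(torch.randint(1, 4, (1,)))):       # data-dependent number of steps
    y = y * 2

y.backward()        # gradients flow through whatever graph was actually built
print(w.grad)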


Deep learning frameworks

¨ TensorFlow example: classification of MNIST dataset

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

# Prepare data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build model (computational graph)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Run model with data
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Classification accuracy ~98%


Deep Neural Network Design

¨ Designing deep neural networks is still more art than science
  ¤ Large design space
  ¤ Many architecture solutions for a single problem
¨ Network design procedure
  ¤ Understand the problem
  ¤ Evaluate application requirements and resource limitations
  ¤ Design the architecture
  ¤ Training, validation and reiteration


Deep Neural Network Design space

¨ Network design space has many dimensions
  ¤ Network size, depth and width
  ¤ Operator composition
  ¤ Specialized building blocks
  ¤ Optimizations
¨ Recent research focuses on
  ¤ Automated design
  ¤ Guided optimization


Common Trade-offs in DNN Design

¨ Accuracy / memory use
¨ Accuracy / latency
¨ Accuracy / energy consumption
¨ Energy consumption / speed


Network Optimization Techniques

¨ Pruning
¨ Quantization
¨ Weight scaling
¨ Tensor decomposition


Network Pruning

¨ Network pruning is the removal of nodes, connections or kernels
  ¤ Can be part of the training – learning both weights and connections
  ¤ Can be adaptively/selectively applied
  ¤ Benefits may be limited for non-structured pruning
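As a minimal illustration of non-structured pruning, the sketch below zeroes out the fraction of weights with the smallest magnitude; the sparsity level is an arbitrary choice here, and in practice pruning is typically interleaved with retraining.

import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the 'sparsity' fraction of weights with the smallest absolute value
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask          # the mask keeps pruned weights at zero during retraining

w = np.random.randn(4, 4)
w_pruned, mask = magnitude_prune(w, sparsity=0.75)
print(mask.mean())                       # roughly 25% of the weights remain non-zero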


Quantization of DNNs

¨ Quantization reduces the precision of stored data and operators
  ¤ Reduces overall memory use
  ¤ Compresses the network, exploiting redundancy
  ¤ Supported on several HW platforms with different precision levels
  ¤ FP16, INT16 and INT8 are most common
  ¤ Training may require full precision


Quantization of DNNs

¨ Quantization induces errors in output accuracy
¨ In-training quantization
  ¤ Train with fixed-point low-precision parameters
  ¤ Training heals the quantization-induced errors
  ¤ Example: binary and ternary networks
¨ Post-training quantization
  ¤ Fine-tuning is required
  ¤ Intelligent selection of step size ∆


Quantization of DNNs

¨ XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (using AlexNet)
  ¤ Extreme quantization of weights and activations
  ¤ Binary-Weight-Networks
    n The filters are approximated with binary values
    n Resulting in 32× memory saving
  ¤ XNOR-Networks
    n Both the filters and the input to convolutional layers are binary
    n Convolutions are approximated with binary operations
    n 58× faster convolution operations and 32× memory savings
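In the Binary-Weight-Networks part of XNOR-Net, each real-valued filter W is approximated as alpha·B with B = sign(W) and alpha the mean absolute value of W; a small NumPy sketch of that approximation (with illustrative filter values) is given below.

import numpy as np

def binarize_filter(W):
    # Approximate W ~= alpha * B with binary B and a single scaling factor alpha
    B = np.sign(W)
    B[B == 0] = 1                        # map exact zeros to +1 so B is strictly binary
    alpha = np.abs(W).mean()
    return alpha, B

W = np.random.randn(3, 3)
alpha, B = binarize_filter(W)
print(alpha)
print(alpha * B)                         # storing B needs 1 bit per weight -> ~32x saving vs float32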


Quantization of DNNs

¨ Linear Quantization

Calculate the step size ∆ for bit width bw:

  ∆ = clip( max(|x|) / 2^(bw-1), 2^(-bw), 2^(bw) )
  ∆ = 2^round(log2(∆))        (round ∆ to the nearest power of two)

Quantize the number x:

  x_quant = clip( round(x / ∆), -2^(bw-1), 2^(bw-1) - 1 ) · ∆
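A minimal NumPy sketch of the linear quantization above, assuming a signed bw-bit integer grid and a power-of-two step size; the exact clipping bounds on ∆ in the original slide may differ from this reconstruction.

import numpy as np

def linear_quantize(x, bw=8):
    # Step size from the dynamic range, rounded to the nearest power of two
    delta = np.max(np.abs(x)) / 2 ** (bw - 1)
    delta = 2.0 ** np.round(np.log2(delta))
    # Map to the signed bw-bit integer grid and scale back
    q = np.clip(np.round(x / delta), -2 ** (bw - 1), 2 ** (bw - 1) - 1)
    return q * delta

x = np.random.randn(5).astype(np.float32)
print(x)
print(linear_quantize(x, bw=8))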


Quantization of DNNs

¨ log2-based quantization technique
¨ Parameters and activations are represented as powers of 2
¨ Significant memory and power savings can be obtained
¨ The multiplication operation in each neuron is replaced with a shift operator
¨ Latency of operations is reduced
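A hedged sketch of power-of-2 quantization: each weight is replaced by a signed power of two, so the product x·w reduces to a sign flip plus a bit shift of the activation; the code below shows the floating-point equivalent of that shift. The weight values are illustrative.

import numpy as np

def log2_quantize(w):
    # Keep the sign and round the magnitude to the nearest power of two
    sign = np.sign(w)
    exponent = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    return sign, exponent                # only the sign and the exponent need to be stored

def multiply_by_shift(x, sign, exponent):
    # x * w becomes sign * (x shifted by 'exponent'); ldexp(x, e) == x * 2**e
    return sign * np.ldexp(x, exponent)

w = np.array([0.30, -0.12, 0.07])
sign, exp = log2_quantize(w)
x = np.array([4.0, 4.0, 4.0])
print(x * w)                             # exact products
print(multiply_by_shift(x, sign, exp))   # products with power-of-two weights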


Quantization of DNNs

¨ VGG16 DNN weights and biases for two layers

[Figure: VGG16 weight and bias values for two layers, shown by leading-one location]


Quantization of DNNs

¨ log_2_lead quantization (accepted at DATE 2020)
¨ Identify the location of the leading one in weights and biases
¨ Improve the precision of the quantized number by storing the bits following the leading one
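A hedged sketch of the idea as described on this slide (not the exact DATE 2020 formulation): record the position of the leading one of each magnitude and also keep a few bits that follow it, so the quantized value is noticeably closer to the original than a pure power of two. The number of extra bits is an assumption here.

import numpy as np

def log2_lead_quantize(w, extra_bits=3):
    # Illustrative encoding: leading-one position plus 'extra_bits' bits after the leading one
    sign = np.sign(w)
    mag = np.abs(w) + 1e-12
    lead = np.floor(np.log2(mag))                                # leading-one location
    frac = mag / 2 ** lead                                       # mantissa in [1, 2)
    frac = np.round(frac * 2 ** extra_bits) / 2 ** extra_bits    # keep bits following the leading one
    return sign * frac * 2 ** lead

w = np.array([0.30, -0.12, 0.07])
print(w)
print(log2_lead_quantize(w))             # closer to w than rounding to a pure power of two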


log_2_lead Quantization

[Figure: side-by-side comparison of the value encodings under linear quantization and log_2_lead quantization]


log_2_lead Quantization

¨ ImageNet classification accuracy using VGG16 without fine tuning

Quantization scheme (VGG16)               Top-5    Top-1
Float32                                   85.74    64.72

Weights and biases quantized:
  8-bit linear                            82.55    59.8
  Power of 2                               0.63     0.1
  log_2_lead                              85.64    64.51
  Float32 vs log_2_lead                   -0.1     -0.21

Weights, biases and activations quantized:
  8-bit linear                            82.55    59.83
  Power of 2                               7.48     1.16
  log_2_lead                              85.34    64.05
  Float32 vs log_2_lead                   -0.4     -0.67


Summary

¨ Deep neural networks are now everywhere
¨ Efficient architectures are necessary to make them feasible in embedded systems
¨ Various quantization schemes are applied


Questions and Answers

Email: [email protected]
