Implementation and Optimization of Deep Neural Networks
Prof. Dr. Akash Kumar, Chair for Processor Design
(Some slides adapted from Intelligent Architectures 5LIL0) http://www.es.ele.tue.nl/~heco/courses/IA-5LIL0/index.html
Topics
• Introduction
• What are DNNs and how do they operate
  – Convolutional Neural Networks
  – Learning frameworks
  – Applications using DNNs
• Optimizations
  – Making the network more compact
  – Quantization of activations and weights
  – Exploiting data & weight reuse by advanced loop transformations and local buffering
• DNN architectures and accelerators
• The future: beyond DNNs
What's Deep Learning/ Deep Neural Network?
• Self-learning algorithms
• Using huge data sets to learn
• Deep: many "learning layers"
• Brain-inspired, based on neurons and synapses (connections)
• High classification accuracy
• Many applications; let's look at ImageNet classification and Tesla Autopilot
ImageNet Winners (top-5 classification error)
• ImageNet dataset: 10M images, 10,000 classes
[Chart: top-5 classification error of ImageNet winners, 2010-2017 (y-axis 0-30%); traditional methods vs. deep learning, with human-level error marked]
AI: Tesla Autopilot
• Tesla Model S demonstration of autonomous driving
• Computing system monitors radar and several cameras
  – Detect objects such as cars and pedestrians
  – Monitor traffic signs
  – Lane tracking and possible lane changing
  – Auto parking
Tesla web page: www.tesla.com/videos/ (November 2016)
Deep Learning and High-performance HW Architectures
Our Brain
• The basic computational unit of the brain is a neuron
  – About 80 billion neurons in our brain
  – Neurons are connected with roughly 10^14 – 10^15 synapses
  – Neurons receive input signals from dendrites and produce an output signal along the axon, which interacts with the dendrites of other neurons via synaptic weights
• Synaptic weights are learnable and control influence strength
Artificial Neuron
• An overview: more to follow
ANN: Neurons, structured in Layers
• Weights represent synaptic strength
Deep Neural Networks
• An ANN with multiple hidden layers
• Two main types of DNNs: without memory and with memory
• Without memory
  – Fully-connected NN
    · Feed-forward, a.k.a. multilayer perceptron (MLP)
  – Convolutional NN (CNN)
    · Feed-forward, sparsely connected with weight sharing
    · Note: CNNs typically also contain one or more fully connected layers
• With memory
  – Recurrent NN (RNN)
    · Feedback
  – Long Short-Term Memory (LSTM)
    · Feedback + storage
Deep Neural Networks
Artificial Neuron Model

[Figure: inputs and weights combined in a dot product, followed by a non-linear activation function (non-linear transformation)]
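To make the figure concrete, here is a minimal NumPy sketch of the neuron computation (the names and values are illustrative, not from the slides): the output is a non-linear activation applied to the dot product of the inputs and weights, plus a bias.

import numpy as np

def relu(z):
    # Non-linear transformation (ReLU shown; sigmoid and tanh are other common choices)
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Dot product of inputs and synaptic weights, plus bias, followed by the activation
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs arriving at the "dendrites"
w = np.array([0.2, 0.4, -0.1])   # learnable synaptic weights
b = 0.1
print(neuron(x, w, b))           # single scalar output along the "axon"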
Neuron Applied to Image Region

• Neuron detects features in a region
• Convolution: the same neuron applied to all regions in the image yields an output feature map
• Input can be taken from multiple input feature maps
• Multiple neurons generate multiple output feature maps
2D Convolution, sliding window

• Input: 5x5
• Kernel: 3x3
• Output: 3x3
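A minimal sliding-window convolution in NumPy, illustrating the 5x5 input / 3x3 kernel / 3x3 output case above (no padding, stride 1; the kernel values are just an example):

import numpy as np

def conv2d(inp, kernel):
    # Slide the kernel over every position where it fully fits (no padding, stride 1)
    H, W = inp.shape
    R, S = kernel.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i+R, j:j+S] * kernel)
    return out

inp = np.arange(25, dtype=float).reshape(5, 5)   # 5x5 input feature map
kernel = np.ones((3, 3)) / 9.0                   # 3x3 averaging kernel
print(conv2d(inp, kernel).shape)                 # (3, 3) output feature map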
Convolution in CNNs/DNNs

• N = batch size
• C input feature maps of size HxW
• M output feature maps of size ExF
• M filters of size RxS (each filter spans all C input channels)
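Using these dimensions, a convolutional layer can be written as a nest of loops. A straightforward, unoptimized NumPy sketch (assuming stride 1 and no padding, so E = H - R + 1 and F = W - S + 1):

import numpy as np

def conv_layer(inp, filters):
    # inp:     N x C x H x W   input feature maps
    # filters: M x C x R x S   filters
    # returns: N x M x E x F   output feature maps (stride 1, no padding)
    N, C, H, W = inp.shape
    M, _, R, S = filters.shape
    E, F = H - R + 1, W - S + 1
    out = np.zeros((N, M, E, F))
    for n in range(N):               # batch
        for m in range(M):           # output feature maps
            for e in range(E):       # output rows
                for f in range(F):   # output columns
                    out[n, m, e, f] = np.sum(inp[n, :, e:e+R, f:f+S] * filters[m])
    return out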
Fully Connected (FC) layer
• FC can be viewed as a special case of convolution, with:
  – H = R
  – W = S
  – E = F = 1
Size of input fmaps = size of convolution kernel
Output fmaps have size 1x1, i.e. each output fmap represents 1 output neuron
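A small numerical check of this equivalence (illustrative NumPy, with made-up sizes): an FC layer computed as a matrix-vector product produces the same values as a convolution whose kernel covers the entire input.

import numpy as np

C, H, W, M = 3, 4, 4, 5
inp = np.random.randn(C, H, W)
filters = np.random.randn(M, C, H, W)   # R = H and S = W, so each filter covers the whole input

# FC view: flatten the input and filters, one dot product per output neuron
fc_out = filters.reshape(M, -1) @ inp.reshape(-1)

# Convolution view: the kernel covers the entire input, so E = F = 1
conv_out = np.array([np.sum(inp * filters[m]) for m in range(M)])

print(np.allclose(fc_out, conv_out))    # True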
Activation functions
Pooling
• Reduce resolution
• Increase receptive (input) area of outputs
• Overlapping or non-overlapping, depending on stride U
• Using the max or the average
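A minimal max-pooling sketch in NumPy (window size K and stride U are illustrative parameters; average pooling simply replaces np.max with np.mean):

import numpy as np

def max_pool(fmap, K=2, U=2):
    # fmap: H x W feature map; K x K pooling window, stride U (non-overlapping when U >= K)
    H, W = fmap.shape
    E, F = (H - K) // U + 1, (W - K) // U + 1
    out = np.zeros((E, F))
    for i in range(E):
        for j in range(F):
            out[i, j] = np.max(fmap[i*U:i*U+K, j*U:j*U+K])
    return out

print(max_pool(np.arange(16, dtype=float).reshape(4, 4)))   # 4x4 input -> 2x2 output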
Normalization
• Batch normalization
  – Normalize the activations of a batch such that the average => 0 and sigma => 1
  – Based on statistics of the training set
  – Gives higher accuracy and faster training
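A sketch of the normalization step in NumPy (inference-style, per channel, with the usual learnable scale gamma and shift beta; the names and epsilon are standard conventions, not taken from the slides):

import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize the activations of a batch to mean ~0 and sigma ~1,
    # then apply a learnable scale and shift
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

x = np.random.randn(32) * 3.0 + 5.0   # activations of one batch for a single channel
y = batch_norm(x)
print(y.mean(), y.std())              # approximately 0 and 1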
Popular networks
Example: LeNet-5 structure

[LeCun et al., Proc. of the IEEE, 1998]

• 2 conv layers
• 2 FC layers
• 60k weights, 341k MACs (multiply-accumulates) per input picture
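One possible Keras rendering of a LeNet-5-style network matching the slide's description (2 conv layers + 2 FC layers on 32x32 grayscale input); the feature-map counts follow the original paper, and the exact parameter count depends on pooling and padding choices:

import tensorflow as tf

lenet = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='tanh', input_shape=(32, 32, 1)),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Conv2D(16, 5, activation='tanh'),
    tf.keras.layers.AveragePooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
lenet.summary()   # roughly 50-60k weights, in line with the slide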
Inference vs. Training
• Training: determine the weights
• 3 types of learning
  – Supervised: using inputs with labeled outputs
  – Unsupervised
  – Reinforcement
• Feedforward + backward calculations needed
• Inference: apply a learned DNN
  – Feedforward: input -> classification
Available Deep Learning Software Frameworks
Deep learning stack

[Stack diagram, top to bottom:
  High-level API: Keras, PyTorch (Ignite), Caffe2 (Brew)
  Low-level API: TensorFlow, CNTK, PyTorch, Caffe, Caffe2
  Inference engine: TensorRT, Tensor Comprehensions
  Libraries: Intel MKL, Eigen, cuBLAS, cuDNN, QNNPACK
  Hardware: CPU, GPU]
• High-level API: provides abstraction for application use
• Low-level API: integrates system-level support libraries and provides DL functionality
• System-level support libraries provide efficient kernel implementations of
  – Basic linear algebra subprograms (BLAS)
  – DNN primitives
  – GPU kernels
Deep learning frameworks
• How do frameworks differ?
  – Capabilities: training, inference, support for multiprocessing
    · Focus on different stages of deployment
    · Set of available tools / third-party tool integration
    · Multi-GPU training
  – Target platforms
    · CPU
    · GPU
    · TPU
    · FPGA
Deep learning frameworks
• How do frameworks differ?
  – The mechanism of defining the computational graph: static and dynamic
    · i.e., the order of computations that are required to be performed
  – Static: define-and-run
  – Dynamic: define-by-run
An example of dynamic graph generation
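A brief define-by-run illustration (PyTorch-style; this snippet is not from the slides): the computational graph is built while the code executes, so ordinary Python control flow can change it from one input to the next, whereas a define-and-run framework declares the whole graph before any data is fed in.

import torch

x = torch.randn(3, requires_grad=True)
w = torch.randn(3, requires_grad=True)

# The graph is created as this code runs (define-by-run);
# the branch taken depends on the data, so the graph can differ per input.
y = torch.dot(w, x)
if y.item() > 0:
    z = y * 2
else:
    z = y ** 2

z.backward()      # gradients flow through whichever graph was actually built
print(x.grad)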
Deep learning frameworks
• TensorFlow example: classification of the MNIST dataset
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

# Prepare data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build model (computational graph)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Run model with data
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

Classification accuracy ~98%
Deep Neural Network Design
• Designing deep neural networks is still more art than science
  – Large design space
  – Many architecture solutions for a single problem
• Network design procedure
  – Understand the problem
  – Evaluate application requirements and resource limitations
  – Design the architecture
  – Training, validation, and reiteration
Deep Neural Network Design space
• The network design space has many dimensions
  – Network size, depth, and width
  – Operator composition
  – Specialized building blocks
  – Optimizations
• Recent research focuses on
  – Automated design
  – Guided optimization
Common Trade-offs in DNN Design
• Accuracy / memory use
• Accuracy / latency
• Accuracy / energy consumption
• Energy consumption / speed
Network Optimization Techniques
• Pruning
• Quantization
• Weight scaling
• Tensor decomposition
Network Pruning
• Network pruning is the removal of nodes, connections, or kernels
  – Can be part of the training: learning both weights and connections
  – Can be adaptively/selectively applied
  – Benefits may be limited for non-structured pruning
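A minimal sketch of magnitude-based (unstructured) pruning in NumPy, one common way connections are removed; the 75% sparsity level is an illustrative choice:

import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    # Zero out the smallest-magnitude fraction of the weights (connections);
    # the mask can be kept to freeze pruned weights during fine-tuning
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.randn(8, 8)
w_pruned, mask = prune_by_magnitude(w, sparsity=0.75)
print(mask.mean())   # fraction of connections kept (~0.25)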
Quantization of DNNs
• Quantization reduces the precision of stored data and operators
  – Reduces overall memory use
  – Compresses the network, exploiting redundancy
  – Supported on several HW platforms with different precision levels
  – FP16, INT16, and INT8 are most common
  – Training may require full precision
Quantization of DNNs
• Quantization induces errors in output accuracy
• In-training quantization
  – Train with fixed-point low-precision parameters
  – Training heals the quantization-induced errors
  – Example: binary and ternary networks
• Post-training quantization
  – Fine-tuning is required
  – Intelligent selection of the step size ∆
Quantization of DNNs
• XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (using AlexNet)
  – Extreme quantization of weights and activations
  – Binary-Weight-Networks
    · The filters are approximated with binary values
    · Resulting in 32x memory savings
  – XNOR-Networks
    · Both the filters and the inputs to convolutional layers are binary
    · Convolutions are approximated with binary operations
    · 58x faster convolution operations and 32x memory savings
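A minimal sketch of the Binary-Weight-Network idea (following the XNOR-Net paper): each real-valued filter W is approximated by a scaling factor times a binary filter, alpha * B, with B = sign(W) and alpha the mean absolute value of W.

import numpy as np

def binarize_weights(W):
    # Approximate W by alpha * B with B in {-1, +1}
    B = np.sign(W)
    B[B == 0] = 1                  # make sure every entry is +/-1
    alpha = np.mean(np.abs(W))     # per-filter scaling factor
    return alpha, B

W = np.random.randn(3, 3)
alpha, B = binarize_weights(W)
print(np.mean((W - alpha * B) ** 2))   # approximation error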
Quantization of DNNs
• Linear quantization

Calculate the step size ∆:

    ∆ = clip( max(|x|) / 2^(n-1), 2^-(n-1), 2^(n-1) )
    ∆ = 2^clip( round(log2(∆)), -(n-1), n-1 )        (round ∆ to a power of two)

Quantize the number x:

    x_quant = clip( round(x / ∆) · ∆, -2^(n-1), 2^(n-1) - 1 )
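A small NumPy sketch of this linear quantization scheme as reconstructed above (n is the bit width; the exact clipping bounds are an assumption, since the slide formula did not survive extraction cleanly):

import numpy as np

def linear_quantize(x, n=8):
    # Step size: scale the largest magnitude into an n-bit signed range,
    # then round the step size itself to a power of two (cheap in hardware)
    delta = np.max(np.abs(x)) / 2 ** (n - 1)
    delta = 2.0 ** np.clip(np.round(np.log2(delta)), -(n - 1), n - 1)
    # Quantize: round to the nearest step and clip to the representable range
    q = np.clip(np.round(x / delta), -2 ** (n - 1), 2 ** (n - 1) - 1)
    return q * delta, delta

x = np.random.randn(1000)
xq, delta = linear_quantize(x, n=8)
print(delta, np.max(np.abs(x - xq)))   # step size and worst-case error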
Quantization of DNNs
• log2-based quantization technique
• Parameters and activations are represented as powers of 2
• Significant memory and power savings can be obtained
• The multiplication operation in each neuron is replaced with a shift operator
• Latency of operations is reduced
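A minimal sketch of power-of-two quantization in NumPy (illustrative only): every value is replaced by its sign times the nearest power of two, so multiplying an activation by a quantized weight reduces to shifting the exponent.

import numpy as np

def log2_quantize(x):
    # Represent each value as sign(x) * 2^e with an integer exponent e
    sign = np.sign(x)
    exp = np.round(np.log2(np.abs(x) + 1e-12)).astype(int)   # epsilon avoids log2(0)
    return sign, exp

def multiply_quantized(activation, sign, exp):
    # Multiplication by a power-of-two weight becomes an exponent shift
    # (np.ldexp computes activation * 2^exp)
    return sign * np.ldexp(activation, exp)

w = np.array([0.30, -0.07, 1.9])
s, e = log2_quantize(w)
print(s * 2.0 ** e)                    # quantized weights
print(multiply_quantized(0.5, s, e))   # 0.5 * quantized weights via shifts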
Quantization of DNNs
• VGG16 DNN weights and biases for two layers
[Figure: distributions of VGG16 weights and biases for two layers, with the leading-one location marked]
Quantization of DNNs
• log_2_lead quantization (accepted at DATE 2020)
• Identify the location of the leading one in weights and biases
• Improve the precision of the quantized number by storing the bits following the leading one
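A rough sketch of the idea as stated on the slide (an illustration only, not the published log_2_lead encoding): locate the leading one of each value and keep a few of the bits that follow it, instead of dropping them as pure power-of-two quantization does. The number of kept bits (frac_bits) is an assumed parameter.

import numpy as np

def log2_lead_quantize(x, frac_bits=3):
    # Illustrative only: keep the leading-one position plus 'frac_bits' of the
    # bits following it, i.e. a rounded mantissa instead of just a power of two
    sign = np.sign(x)
    mag = np.abs(x)
    lead = np.floor(np.log2(mag + 1e-12))   # leading-one position (exponent)
    mantissa = mag / 2.0 ** lead            # in [1, 2): bits after the leading one
    mantissa_q = np.round(mantissa * 2 ** frac_bits) / 2 ** frac_bits
    return sign * mantissa_q * 2.0 ** lead

w = np.array([0.30, -0.07, 1.9])
print(log2_lead_quantize(w))   # closer to w than a pure power-of-two representation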
log_2_lead Quantization

[Figure: bit-level comparison of the linear quantization and log_2_lead quantization formats]
log_2_lead Quantization
• ImageNet classification accuracy using VGG16 without fine-tuning

Quantization (VGG16)                          Top-5    Top-1
Float32                                       85.74    64.72

Weights and biases quantized:
  8-bit linear                                82.55    59.80
  Power of 2                                   0.63     0.10
  log_2_lead                                  85.64    64.51
  Float32 vs log_2_lead                       -0.10    -0.21

Weights, biases, and activations quantized:
  8-bit linear                                82.55    59.83
  Power of 2                                   7.48     1.16
  log_2_lead                                  85.34    64.05
  Float32 vs log_2_lead                       -0.40    -0.67
Summary
• Deep neural networks are now everywhere
• Efficient architectures are necessary to make them feasible in embedded systems
• Various quantization schemes are applied
Questions and Answers
Email: akash.kumar@tu-dresden.de