Pruning, Quantization, Knowledge Distillation, Factorization

What you will learn

Practice model compression using TensorFlow, PyTorch, ONNX, and TensorRT

Serve compressed models on AWS SageMaker

Understand model compression algorithms: pruning, quantization, distillation, and factorization

Conduct a literature survey of the most recent compression techniques

Description

This course is intended to provide learners with an in-depth understanding of techniques used in compressing deep learning models. The techniques covered in the course include pruning, quantization, knowledge distillation, and factorization, all of which are essential for anyone working in the field of deep learning, particularly those focused on computer vision and natural language processing. These techniques should be generally applicable to all deep learning models.
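As a rough illustration of two of the techniques named above, the following NumPy sketch applies magnitude pruning and then symmetric 8-bit quantization to a random weight matrix. It is not taken from the course materials: the function names `magnitude_prune`, `quantize_int8`, and `dequantize`, the 50% sparsity level, and the per-tensor scaling scheme are all assumptions chosen for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value acts as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8. Returns (int8 values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one layer of a model.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(pruned)
w_hat = dequantize(q, scale)

print("fraction of zeros:", np.mean(pruned == 0))
print("max quantization error:", np.max(np.abs(pruned - w_hat)))
```

Pruned weights stay exactly zero after quantization because zero maps to the integer 0, and the rounding error of every other weight is bounded by about half the scale. Practical toolchains typically add fine-tuning after pruning and calibration data for quantization, which this sketch leaves out.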

One of the primary objectives of this course is to provide advanced content that keeps up with the latest algorithms. This includes product quantization and its variants, tensor factorization, and other techniques that are evolving rapidly in deep learning. The course summarizes these techniques from academic papers without dwelling on detailed experimental results: leaderboard results change frequently, and new models that need compression keep appearing. Instead, the course focuses on how the techniques work, helping learners understand what happens behind the scenes.


Upon completion of the course, learners should feel confident reading news, blogs, and academic papers related to model compression. They are encouraged to apply these techniques to their own work and share the knowledge with others.

Language: English

Content

Introduction

A brief introduction to deep learning
Model compression overview
Demo: CNN quantization in TensorFlow
Model Size Estimation

Compression Algorithm: pruning

Pruning Overview

Compression Algorithm: quantization

Fixed Point Quantization

Compression Algorithm: distillation

Knowledge Distillation (KD) Cross Entropy

Compression Algorithm: factorization

SVD Factorization
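
To give a sense of what the SVD factorization topic involves, here is a minimal NumPy sketch that replaces one weight matrix with the product of two thinner matrices obtained from a truncated SVD. It is an illustration, not the course's own code: the helper name `svd_factorize`, the matrix shapes, and the rank of 16 are assumptions, and the test matrix is deliberately built to be close to low rank.

```python
import numpy as np

def svd_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical layer weights: a rank-16 matrix plus a small amount of noise.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 16)) @ rng.normal(size=(16, 128))
W = W + 0.01 * rng.normal(size=(256, 128))

A, B = svd_factorize(W, rank=16)

params_before = W.size
params_after = A.size + B.size
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print("parameters before:", params_before)
print("parameters after:", params_after)
print("relative reconstruction error:", rel_err)
```

In a network, the single layer `x @ W` would become two smaller layers, `(x @ A) @ B`. The parameter saving depends on the chosen rank, and real weight matrices are rarely this close to low rank, so a drop in accuracy followed by fine-tuning is usually expected.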