Pruning, Quantization, Knowledge Distillation, Factorization
What you will learn
Practice model compression using TensorFlow, PyTorch, ONNX, and TensorRT
Serve compressed models with AWS SageMaker
Understand model compression algorithms: pruning, quantization, knowledge distillation, and factorization
Conduct a literature survey of the most recent compression techniques
Description
This course gives learners an in-depth understanding of the techniques used to compress deep learning models: pruning, quantization, knowledge distillation, and factorization. These techniques are essential for anyone working in deep learning, particularly in computer vision and natural language processing, and they apply broadly to deep learning models of all kinds.
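To make two of these techniques concrete, here is a minimal, hypothetical PyTorch sketch (not taken from the course materials) that applies magnitude pruning and then dynamic quantization to a toy model; the model architecture and the 30% sparsity level are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for any network you might want to compress.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Magnitude pruning: zero out the 30% of weights with the smallest absolute value.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Dynamic quantization: store Linear weights as int8 and quantize
# activations on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```

The two steps compose naturally: pruning shrinks the effective number of parameters, while quantization shrinks the bytes each remaining parameter occupies.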
One of the primary objectives of this course is to provide advanced content that keeps pace with the latest algorithms, including product quantization and its variants, tensor factorization, and other techniques that are evolving rapidly in deep learning. The course summarizes these techniques from the academic papers that introduced them, rather than dwelling on experimental details: leaderboards change frequently, and new models that need compression keep appearing. The course therefore focuses on the technical core of each method, helping learners understand what happens behind the scenes.
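As one illustration of factorization, a common low-rank approach approximates a weight matrix with the product of two thinner matrices obtained from a truncated SVD. The helper below, factorize_linear, is a hypothetical sketch for this page, not an API from the course; the 512-unit layer and rank 64 are assumed values:

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate a Linear layer with two low-rank Linear layers via truncated SVD."""
    W = layer.weight.data                    # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_k = U[:, :rank] * S[:rank]             # fold singular values into the left factor
    V_k = Vh[:rank, :]
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_k                  # (rank, in_features)
    second.weight.data = U_k                 # (out_features, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

layer = nn.Linear(512, 512)
compressed = factorize_linear(layer, rank=64)
# Parameter count drops from 512*512 ≈ 262k to 512*64 + 64*512 ≈ 66k.
```

The same idea underlies the tensor factorization methods the course covers, which generalize this matrix decomposition to higher-order weight tensors.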
Upon completing the course, you will feel confident reading news, blogs, and academic papers about model compression. You are encouraged to apply these techniques to your own work and to share what you learn with others.
Content