ResNet: How One Paper Changed Deep Learning Forever

What you will learn

To understand the history and evolution of computer vision, from its early beginnings to the current state of the art.

To become familiar with the SuperGradients training library and how deep learning practitioners can use it to shorten the model development lifecycle.

To gain practical skills for developing and training neural networks for image classification tasks.

Throughout the course, students will study various topics, including the fundamental concepts and techniques of computer vision and the design and training of neural networks.

Description

In December of 2015, a paper was published that rocked the deep learning world.

This paper is widely regarded as one of the most influential papers in modern deep learning and has been cited over 110,000 times.

The name of this paper?

Deep Residual Learning for Image Recognition (aka, the ResNet paper).

The prevailing wisdom at the time suggested that adding more layers to neural networks would lead to better results.

But researchers observed that the accuracy of deep networks would increase up to a saturation point before levelling off.

In addition, an unusual phenomenon was observed: when layers were added to an already deep network, the training error would actually increase.

This was primarily due to two problems:

1) Vanishing/exploding gradients

2) The degradation problem

The vanishing/exploding gradients problem is a by-product of the chain rule.

During backpropagation, the chain rule multiplies together the gradients from each layer to compute the error gradients for the weights in earlier layers.

Multiplying lots of values that are less than one will result in smaller and smaller values.

As those error gradients approach the earlier layers of a network, their value will tend to zero.

This results in smaller and smaller updates to earlier layers (not much learning happening).

The inverse problem is exploding gradients, which occur when large error gradients accumulate during training and result in massive updates to the model weights in the earlier layers.
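
To see why this happens, here is a tiny, hypothetical Python sketch (not from the paper): it simply multiplies identical per-layer gradient factors together, the way the chain rule does across layers. Factors below one shrink the product toward zero (vanishing), while factors above one blow it up (exploding).

```python
# Minimal illustration (hypothetical numbers): how repeated multiplication
# in the chain rule shrinks or blows up gradients across many layers.

def backprop_scale(per_layer_factor: float, num_layers: int) -> float:
    """Product of identical per-layer gradient factors across a deep network."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= per_layer_factor
    return scale

# Factors below 1 vanish; factors above 1 explode.
print(backprop_scale(0.9, 50))   # ~0.005  -> vanishing gradient
print(backprop_scale(1.1, 50))   # ~117.4  -> exploding gradient
```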

The degradation problem is unexpected because it’s not caused by overfitting.

Researchers were finding that as networks got deeper, the training loss would decrease but then shoot back up as more layers were added to the networks.

Which is counterintuitive…

Because you’d expect your training error to decrease, converge, and plateau as the number of layers in your network increases.

Both of these issues threatened to halt the progress of deep neural networks until this paper came out…

The ResNet paper introduced a novel solution to these two pesky problems that plagued the architects of deep neural networks:

The Skip Connection.

Skip connections, which are housed in residual blocks, allow you to take the activation value from an earlier layer and pass it to a deeper layer in a network.

Skip connections enable deep networks to learn the identity function.

Learning the identity function allows a deeper layer to perform as well as an earlier layer, or at the very least not perform any worse.

The result is a smoother gradient flow, ensuring important features are preserved in the training process.

The invention of the skip connection has given us the ability to build deeper and deeper networks while avoiding the problems of vanishing/exploding gradients and degradation.
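
To make the idea concrete, here is a minimal sketch of a residual block in PyTorch. It's a simplified illustration rather than the exact block from the paper or from SuperGradients: the block computes F(x) with two convolutions and adds the original input x back in through the skip connection, so the output is ReLU(F(x) + x).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm, in the spirit of the basic ResNet block.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x                      # skip connection carries the input forward
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + residual              # add the earlier activation back in
        return self.relu(out)

# If the convolutions learn to output zeros, the block reduces to the identity
# function, so stacking more of these blocks shouldn't make the network worse.
block = ResidualBlock(channels=64)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because the skip connection passes x through unchanged, the block only has to learn what to add on top of the identity, which is exactly the residual learning idea the paper is named after.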

Wanna learn more about ResNet? Check out this short course that I’ve prepared for you using the SuperGradients training library!
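
If you'd like a taste before diving in, the snippet below is a rough sketch of pulling a ResNet architecture out of SuperGradients; the model name string and arguments are assumptions based on recent versions of the library, so check its documentation for your installed version.

```python
# Rough sketch, assuming a recent SuperGradients release; model names and
# arguments may differ between versions.
from super_gradients.training import models

# Instantiate a ResNet-18 classifier for a 10-class dataset.
model = models.get("resnet18", num_classes=10)
print(model)
```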

Language: English

Content

Introduction

Introduction
Computer Vision Before Deep Learning

Convolutional Neural Networks

Anatomy of Convolutional Neural Networks
Classic CNN Architecture
Vanishing Gradients and The Degradation Problem

ResNet

Skip Connections
ResNet in Action