Consider a non-linear regression model:
Let's put in some numbers. Assume we have just one observation: x = 0.7 and y = 2.8. Also, assume our initial model has parameters a = 0.5, b = 2, and c = 0.1. Then our initial prediction (ŷ) is:
Let's use the sum-of-squares error as the loss function to evaluate the model's performance. For our simple demo with just one observation, the loss function is:
Given all the information above, a model solver should help us find another set of (a, b, c) that gives us a prediction (ŷ) with a loss smaller than 0.024.
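As a quick sanity check, the sum-of-squares loss for a single observation is just the squared residual. The snippet below is a minimal sketch; the prediction value ŷ ≈ 2.645 is an assumption inferred from the stated initial loss of 0.024, since the model's functional form is not reproduced here.

```python
def sse_loss(y, y_hat):
    """Sum-of-squares error for a single observation: (y - y_hat)^2."""
    return (y - y_hat) ** 2

y = 2.8        # the observed target
y_hat = 2.645  # hypothetical initial prediction implied by a loss of ~0.024

print(round(sse_loss(y, y_hat), 3))  # → 0.024
```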
The prediction function can be written as the composite of two functions f(x) and g(x):
We can then graphically represent the non-linear prediction model as a neural network:
The model consists of one input layer, one hidden layer, and one output layer, and is evaluated against a loss function.
In the forward pass, we plug in all the numbers, including the input value x and the model parameters a, b, and c, and calculate the values of all nodes (g and f) and the loss function.
Note that since you need all of a node's inputs to calculate its output, you can only solve the network by following the direction of the arrows (the forward pass).
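The forward pass can be sketched in a few lines. Because the post's exact equations are not reproduced here, the sketch below assumes a stand-in composite model, g(x) = a·x + b and f(g) = c·g², purely to illustrate the evaluation order.

```python
# Forward pass for a hypothetical composite model (the post's exact
# functional form is not shown): g(x) = a*x + b, f(g) = c*g**2.
a, b, c = 0.5, 2.0, 0.1   # initial parameters from the example
x, y = 0.7, 2.8           # the single observation

g = a * x + b             # hidden node: requires x, a, b first
f = c * g ** 2            # output node: requires g and c first
loss = (y - f) ** 2       # loss node: requires f and y first

# Each node is computable only after all of its inputs,
# i.e. only by following the arrows in the network diagram.
print(g, f, loss)
```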
It is worth mentioning that even when we don't know the structure of the later layers in the neural network, we can still calculate each node's derivative with respect to its inputs. Take parameter c for example: we can calculate the derivative of g with respect to c directly during the forward pass:
Following the same idea, we can calculate these derivatives for all nodes during the forward pass:
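Continuing the hypothetical stand-in model g = a·x + b, f = c·g², each local derivative depends only on values that are already available when its node is evaluated, so all of them can be computed during the forward pass:

```python
# Hypothetical model (assumed form, not the post's original equations):
# g = a*x + b, f = c*g**2, Loss = (y - f)**2.
a, b, c = 0.5, 2.0, 0.1
x, y = 0.7, 2.8

# Forward pass: node values.
g = a * x + b
f = c * g ** 2
loss = (y - f) ** 2

# Local derivatives, each computable as soon as its node is evaluated,
# without knowing anything about the layers that come after it.
dg_da = x             # ∂g/∂a for g = a*x + b
dg_db = 1.0           # ∂g/∂b
df_dc = g ** 2        # ∂f/∂c for f = c*g**2
df_dg = 2 * c * g     # ∂f/∂g
dL_df = -2 * (y - f)  # ∂Loss/∂f for Loss = (y - f)**2
```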
The goal is to find another set of parameters (a, b, and c) that decreases the loss function. Therefore, the quantities of interest are the partial derivatives of the loss function with respect to a, b, and c.
Since we are working with a fairly simple model, we can solve all the derivatives manually:
Let us simplify the notation in the chart above for better visualization:
From the simplified chart, it is easy to see that for each derivative of interest, we only need:
- the derivative of the next node with respect to the current node, which is calculated during the forward pass; and
- the derivative of the loss function with respect to the next node, which is calculated during the backward pass.
In general, let ϕ be the node of interest and ξ be the next node that ϕ points to; then we have:
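In symbols, the relationship just described is an application of the chain rule:

```latex
\frac{\partial \text{Loss}}{\partial \phi}
  = \frac{\partial \xi}{\partial \phi}
    \cdot \frac{\partial \text{Loss}}{\partial \xi}
```

where ∂ξ/∂ϕ is available from the forward pass and ∂Loss/∂ξ from the backward pass.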
This concludes the calculation of the derivatives of all model parameters.
The full algorithm has three steps:
1. Define the model and initialize the parameters.
2. Calculate the node values and the derivative of each node with respect to the next node along the forward pass.
3. Calculate the derivative of the loss function with respect to each parameter along the backward pass.
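The steps above can be sketched end to end. As before, this assumes a stand-in model g(x) = a·x + b, f(g) = c·g², since the post's exact equations are not reproduced here; only the structure of the algorithm matters.

```python
# Backpropagation sketch for an assumed model:
# g = a*x + b, f = c*g**2, Loss = (y - f)**2.

# Step 1: define the model and initialize.
a, b, c = 0.5, 2.0, 0.1
x, y = 0.7, 2.8

# Step 2: forward pass -- node values and local derivatives.
g = a * x + b
f = c * g ** 2
loss = (y - f) ** 2
dg_da, dg_db = x, 1.0              # ∂g/∂a, ∂g/∂b
df_dg, df_dc = 2 * c * g, g ** 2   # ∂f/∂g, ∂f/∂c
dL_df = -2 * (y - f)               # ∂Loss/∂f

# Step 3: backward pass -- chain local derivatives into ∂Loss/∂parameter,
# walking against the arrows from the loss back to each parameter.
dL_dc = dL_df * df_dc
dL_dg = dL_df * df_dg
dL_da = dL_dg * dg_da
dL_db = dL_dg * dg_db

# A small gradient-descent step using these derivatives reduces the loss.
lr = 0.01
a, b, c = a - lr * dL_da, b - lr * dL_db, c - lr * dL_dc
new_loss = (y - c * (a * x + b) ** 2) ** 2
print(new_loss < loss)  # → True
```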
This concludes the backpropagation algorithm, which consists of both the forward and the backward pass, and outputs the derivatives of the loss function with respect to all model parameters.
We'll dive into the PyTorch implementation in another post.