Now that we have covered some core ideas of deep learning, let us apply them to classify the spiral dataset.
We will first deal with the single-layer perceptron (SLP) and then extend the approach to a multi-layer perceptron (MLP) with one hidden layer.
1. Single-Layer Perceptron (SLP)
The output is a [3×1] vector, where each component represents the probability of the color [red, green, purple].
For each training point, the output of the network is y_i, a [3×1] vector which must be converted into probabilities. With the three colors, we have K=3 classes. The conversion is done inside the cost function via a softmax function. From y_i, we compute a probability vector p_i, select its k-th component, the one corresponding to the correct color of the data point, and take the average over all training points:

C = (1/N) Σ_i (C_i)_k,   with   (C_i)_k = −log (p_i)_k   and   p_i = softmax(y_i),

where we have denoted (C_i)_k to be the k-th component of the vector C_i. This is called the cross-entropy cost, and we see here that the choice of the cost function is not obvious and depends on the problem.
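In code, this conversion takes only a few numpy lines. Here is a minimal sketch for a single training point (the scores and the correct class k=0 are made-up values for illustration):

import numpy as np

def softmax(y):
    """Convert a vector of scores y into a probability vector p."""
    e = np.exp(y - np.max(y))  # subtract the max for numerical stability
    return e / np.sum(e)

y_i = np.array([2.0, 1.0, 0.1])  # scores of one training point (illustrative)
p_i = softmax(y_i)               # approx. [0.66, 0.24, 0.10]
C_i_k = -np.log(p_i[0])          # (C_i)_k, assuming k=0 is the correct color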
To compute the gradient of this cost, we follow the logic of automatic differentiation. We do it by hand for the model with no hidden layer to understand how things work: we decompose the gradient of the cost according to the chain rule, compute the elementary gradients using the differentiation rules, and multiply the results together. The output of the SLP network is y_i = W·x_i + b, and the computations give, for the weights and biases,

∂C/∂y_i = (p_i − 1_{k_i})/N,   ∂C/∂W = Xᵀ (∂C/∂y),   ∂C/∂b = Σ_i ∂C/∂y_i,

where 1_{k_i} is the one-hot vector selecting the correct class of point i.
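Before trusting these formulas, it is good practice to check them against finite differences. A minimal sketch (this helper is our own addition, not part of the article's code):

import numpy as np

def numerical_grad_W(X, k, W, b, eps=1e-5):
    # central finite-difference gradient of the cross-entropy cost w.r.t. W
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        for sign in (1.0, -1.0):
            W[idx] += sign * eps
            y = np.dot(X, W) + b
            p = np.exp(y) / np.sum(np.exp(y), axis=1, keepdims=True)
            c = -np.mean(np.log(p[np.arange(X.shape[0]), k]))
            grad[idx] += sign * c / (2 * eps)
            W[idx] -= sign * eps
    return grad

# the analytic dW computed below should agree with numerical_grad_W(X, k, W, b)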
2. Multi-Layer Perceptron (MLP)
For the MLP model with one hidden layer, we must compute the intermediate partial derivatives for the weights and biases of the first and hidden layers (W⁰, W¹, b⁰, b¹), and also account for the derivative of the activation function of the hidden layer. Although we can take advantage of the chain rule and reuse some computations, the derivation becomes quite cumbersome. We can now see why automatic differentiation is so useful!
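To make the point concrete, here is what the same gradient computation looks like with an automatic-differentiation library. This sketch uses JAX purely as an illustration; the article's own code stays in plain numpy:

import jax
import jax.numpy as jnp

def cost(params, X, k):
    W, b = params
    y = jnp.dot(X, W) + b                        # forward model (SLP)
    p = jax.nn.softmax(y, axis=1)                # class probabilities
    return -jnp.mean(jnp.log(p[jnp.arange(X.shape[0]), k]))

dW, db = jax.grad(cost)((W, b), X, k)            # all gradients in one call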
Once we have the gradient, we can implement the gradient descent iterations within a loop to update the weights and biases. Inside the loop, we compute the output of the network and the cost function, then the derivatives of the cost, and finally update the weights and biases.
# Train a linear classifier
# initialize parameters randomly,
# with here D=2 (dimension) and K=3 (number of classes)
W = 0.01 * np.random.randn(D, K)  # weights
b = np.zeros((1, K))              # biases
num_examples = X.shape[0]         # X are our ~200 data points

for n in range(300):  # gradient descent loop, 300 iterations
    # 1 - output of the single-layer perceptron given the data X
    y_i = np.dot(X, W) + b
    # compute the class probabilities with a softmax
    probs = np.exp(y_i) / np.sum(np.exp(y_i), axis=1, keepdims=True)
    # 2 - compute the average cross-entropy cost
    C_ik = -np.log(probs[range(num_examples), k])  # k is the array of correct colors
    cost = np.sum(C_ik) / num_examples
    if n % 10 == 0:
        print("iteration %d: loss %f" % (n, cost))
    # 3 - gradient computation
    # dC/dy_i
    dscores = probs
    dscores[range(num_examples), k] -= 1  # p_i - 1 for the correct class
    dscores /= num_examples               # average over training points
    # backpropagate the gradient to the parameters (W, b)
    dW = np.dot(X.T, dscores)
    db = np.sum(dscores, axis=0, keepdims=True)
    # 4 - gradient descent update, with learning rate eta = 1
    W -= dW
    b -= db
For the second model, we add the contribution of the hidden layer in the forward model and in the gradient computation. The complete code is available here.
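For reference, here is a minimal sketch of those changes, assuming a ReLU activation and a hidden layer of size h=100 (the actual choices are in the linked code):

h = 100  # hidden layer size (illustrative)
W0 = 0.01 * np.random.randn(D, h); b0 = np.zeros((1, h))
W1 = 0.01 * np.random.randn(h, K); b1 = np.zeros((1, K))

# forward model: one hidden layer with a ReLU activation
hidden = np.maximum(0, np.dot(X, W0) + b0)
y_i = np.dot(hidden, W1) + b1
# ... softmax, cost and dscores exactly as in the SLP code above ...

# backpropagate through the second layer, the ReLU, then the first layer
dW1 = np.dot(hidden.T, dscores)
db1 = np.sum(dscores, axis=0, keepdims=True)
dhidden = np.dot(dscores, W1.T)
dhidden[hidden <= 0] = 0  # derivative of the ReLU activation
dW0 = np.dot(X.T, dhidden)
db0 = np.sum(dhidden, axis=0, keepdims=True)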
We let the gradient descent run to obtain our final set of parameters (W, b). We obtain a training accuracy of 50% for the single-layer model (SLP) and 98% for the multi-layer model (MLP), and also recover the classification boundary shown in the first part of the article 🙂
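These accuracies can be computed by comparing the most probable class to the true labels; for the SLP, reusing the variables from the training loop above:

y_i = np.dot(X, W) + b              # scores with the trained parameters
predicted = np.argmax(y_i, axis=1)  # most probable color for each point
accuracy = np.mean(predicted == k)  # fraction of correctly classified points
print("training accuracy: %.2f" % accuracy)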
That's it! I hope you now understand the basics of training a neural network: it relies on a forward model, a gradient descent algorithm, and an accurate computation of the gradients of the cost function. This paves the way to studying more advanced forward propagation models, such as convolutional or recurrent neural networks.