In the last blog, we completed our implementation of a neural network. In this blog we will test our neural network against a real dataset: the MNIST dataset. It is a collection of handwritten images of digits between 0 and 9, which our network will learn to classify.
Here are a few examples of what the data looks like:
Each image is a 28×28 greyscale image, with each pixel taking values in [0, 255], where 0 is white and 255 is black.
Let’s break this down:
- Download the dataset and load it in Java
- Preprocess the values so that the pixels are flattened into a 1D input matrix and the values lie between 0 and 1
- Configure the neural network to have 2 hidden layers of 16 neurons each. What about input and output? Well, the input will be 784 neurons (flattened 28 * 28 pixels) and the output will be 10 neurons, where the ith neuron indicates the probability that the current input is the digit i.
- Repeat the training process for 'N' epochs and then validate the network on unseen samples.
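The normalization in the second step is just a min-max rescale. Here is a minimal sketch of what a helper like MathUtils.scaleValue is assumed to do (the implementation below is illustrative, not the one from the series):

```java
public class ScaleDemo {
    // Linearly map x from [inMin, inMax] to [outMin, outMax].
    // Assumed behaviour of MathUtils.scaleValue(pixel, 0, 255, 0, 1).
    public static double scaleValue(
            double x, double inMin, double inMax, double outMin, double outMax) {
        return outMin + (x - inMin) * (outMax - outMin) / (inMax - inMin);
    }

    public static void main(String[] args) {
        System.out.println(scaleValue(0, 0, 255, 0, 1));   // 0.0
        System.out.println(scaleValue(255, 0, 255, 0, 1)); // 1.0
    }
}
```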
The dataset is free to download here. After downloading and extracting it, we create an MnistReader class:
public class MnistReader {

  public static List<Pair<Matrix, Matrix>> getDataForNN(
      String imagePath, String labelsPath, int samples) {
    try {
      return getDataForNNHelper(imagePath, labelsPath, samples);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  private static List<Pair<Matrix, Matrix>> getDataForNNHelper(
      String imagesPath, String labelsPath, int samples) throws IOException {
    List<Pair<Matrix, Matrix>> data = new ArrayList<>();
    try (DataInputStream trainingDis =
        new DataInputStream(new BufferedInputStream(new FileInputStream(imagesPath)))) {
      try (DataInputStream labelDis =
          new DataInputStream(new BufferedInputStream(new FileInputStream(labelsPath)))) {
        int magicNumber = trainingDis.readInt();
        int numberOfItems = trainingDis.readInt();
        int nRows = trainingDis.readInt();
        int nCols = trainingDis.readInt();

        int labelMagicNumber = labelDis.readInt();
        int numberOfLabels = labelDis.readInt();

        numberOfItems = samples == -1 ? numberOfItems : samples;
        for (int t = 0; t < numberOfItems; t++) {
          double[][] imageContent = new double[nRows][nCols];
          for (int i = 0; i < nRows; i++) {
            for (int j = 0; j < nCols; j++) {
              imageContent[i][j] = trainingDis.readUnsignedByte();
            }
          }
          Matrix imageData =
              new Matrix(imageContent)
                  .apply(pixel -> MathUtils.scaleValue(pixel, 0, 255, 0, 1))
                  .flatten()
                  .transpose();

          int label = labelDis.readUnsignedByte();
          double[] output = new double[10];
          output[label] = 1;
          Matrix outputMatrix = new Matrix(new double[][] {output}).transpose();

          data.add(Pair.of(imageData, outputMatrix));
        }
      }
    }
    return data;
  }
}
Now, because our matrices are immutable and return new matrices after every operation, the pre-processing becomes a one-liner:
Matrix imageData =
new Matrix(imageContent)
.apply(pixel -> MathUtils.scaleValue(pixel, 0, 255, 0, 1))
.flatten()
.transpose();
We don't need an explanation here, as the function names are self-explanatory.
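For intuition, here is roughly what flatten().transpose() produces: a 28×28 image becomes a 784×1 column vector. The helper below is a hypothetical stand-in using plain arrays, not the Matrix class from the series:

```java
public class FlattenDemo {
    // Flatten an r x c image into an (r*c) x 1 column vector, row by row.
    // This mimics the assumed effect of matrix.flatten().transpose() above.
    public static double[][] flattenToColumn(double[][] image) {
        int rows = image.length;
        int cols = image[0].length;
        double[][] column = new double[rows * cols][1];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                column[i * cols + j][0] = image[i][j];
            }
        }
        return column;
    }

    public static void main(String[] args) {
        double[][] tiny = {{1, 2}, {3, 4}};
        double[][] col = flattenToColumn(tiny);
        System.out.println(col.length + "x" + col[0].length); // 4x1
    }
}
```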
We now implement the MnistTrainer class, which trains on the loaded input and adjusts the weights and biases:
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Data
public class MnistTrainer {
  private NeuralNetwork neuralNetwork;
  private int iterations;
  private double learningRate;

  public void train(List<Pair<Matrix, Matrix>> trainingData) {
    int mod = iterations / 100 == 0 ? 1 : iterations / 100;
    double error = 0;
    for (int t = 0; t < iterations; t++) {
      for (Pair<Matrix, Matrix> trainingDatum : trainingData) {
        neuralNetwork.trainForOneInput(trainingDatum, learningRate);
        double errorAdditionTerm =
            neuralNetwork.getOutputErrorDiff().apply(x -> x * x).sum()
                / trainingData.size();
        error += errorAdditionTerm;
      }
      neuralNetwork.setAverageError(error);
      if ((t == 0) || ((t + 1) % mod == 0)) {
        System.out.println("after " + (t + 1) + " epochs, average error: " + error);
      }
      error = 0;
      trainingData = MathUtils.shuffle(trainingData);
    }
  }
}
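One detail worth calling out: the training data is reshuffled after every epoch, so the network never sees the samples in a fixed order. MathUtils.shuffle is the author's helper; a plausible stand-in (illustrative only) would be:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleDemo {
    // Return a shuffled copy of the input, leaving the original list untouched.
    // Assumed behaviour of MathUtils.shuffle in the trainer above.
    public static <T> List<T> shuffle(List<T> input) {
        List<T> copy = new ArrayList<>(input);
        Collections.shuffle(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> epochData = List.of(1, 2, 3, 4, 5);
        System.out.println(shuffle(epochData).size()); // 5
    }
}
```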
This will be called from the main method.
public class Main {
  public static void main(String[] args) throws IOException {
    String rootPath = "/Users/satvik.nema/Documents/mnist_dataset/";
    String trainImagesPath = rootPath + "train-images.idx3-ubyte";
    String trainLabelsPath = rootPath + "train-labels.idx1-ubyte";

    List<Pair<Matrix, Matrix>> mnistTrainingData =
        MnistReader.getDataForNN(trainImagesPath, trainLabelsPath, 60000);

    List<Integer> hiddenLayersNeuronsCount = List.of(16, 16);
    int inputRows = mnistTrainingData.getFirst().getA().getRows();
    int outputRows = mnistTrainingData.getFirst().getB().getRows();

    MnistTrainer mnistTrainer =
        MnistTrainer.builder()
            .neuralNetwork(
                NNBuilder.create(inputRows, outputRows, hiddenLayersNeuronsCount))
            .iterations(100)
            .learningRate(0.01)
            .build();

    Instant start = Instant.now();
    mnistTrainer.train(mnistTrainingData);
    Instant end = Instant.now();
    long seconds = Duration.between(start, end).getSeconds();
    System.out.println("Time taken for training: " + seconds + "s");
  }
}
Notice how we configure the two hidden layers of 16 neurons each via hiddenLayersNeuronsCount.
The MNIST dataset also includes a separate set of 10,000 testing samples. We'll use them to check how our trained network performs on unseen data.
Starting with an MnistTester:
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Data
public class MnistTester implements NeuralNetworkTester {
  private NeuralNetwork neuralNetwork;

  public double validate(List<Pair<Matrix, Matrix>> trainingData) {
    double error = 0;
    int countMissed = 0;
    List<String> missedIndexes = new ArrayList<>();
    int index = 0;
    for (Pair<Matrix, Matrix> trainingDatum : trainingData) {
      neuralNetwork.feedforward(trainingDatum.getA());
      Matrix output = neuralNetwork.getLayerOutputs().getLast();
      int predicted = output.max().getB()[0];
      int actual = trainingDatum.getB().max().getB()[0];
      if (predicted != actual) {
        countMissed++;
        missedIndexes.add("(" + index + ", " + actual + ", " + predicted + ")");
      }
      Matrix errorMatrix = output.subtract(trainingDatum.getB());
      error += errorMatrix.apply(x -> x * x).sum() / trainingData.size();
      index++;
    }
    System.out.printf("Total: %s, incorrect: %s%n", trainingData.size(), countMissed);
    return error;
  }
}
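The prediction step relies on output.max().getB()[0] returning the index of the largest activation, i.e. an argmax over the 10 output neurons. A hypothetical plain-array version of that operation:

```java
public class ArgmaxDemo {
    // Index of the largest entry in v; ties go to the earliest index.
    // This mirrors what output.max().getB()[0] is assumed to return.
    public static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] activations = {0.01, 0.02, 0.90, 0.05, 0.02};
        System.out.println(argmax(activations)); // 2
    }
}
```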
And running the validation:
String testImagesPath = rootPath + "t10k-images.idx3-ubyte";
String testLabelsPath = rootPath + "t10k-labels.idx1-ubyte";

List<Pair<Matrix, Matrix>> mnistTestingData =
    MnistReader.getDataForNN(testImagesPath, testLabelsPath, -1);

MnistTester mnistTester = MnistTester.builder().neuralNetwork(trainedNetwork).build();
double error = mnistTester.validate(mnistTestingData);
System.out.println("Error: " + error);
The accuracy is pretty good:
after 1 epochs, average error: 0.6263571412645461
after 100 epochs, average error: 0.07471539583255844
after 200 epochs, average error: 0.060457042431757556
after 300 epochs, average error: 0.052867280710867826
after 400 epochs, average error: 0.04818163691903281
after 500 epochs, average error: 0.04496163434230489
after 600 epochs, average error: 0.04240323875238682
after 700 epochs, average error: 0.04034903547585861
after 800 epochs, average error: 0.03881550591240332
after 900 epochs, average error: 0.037430996099864056
after 1000 epochs, average error: 0.03629978820188779
Time taken for training: 4676s
Testing:
Total: 10000, incorrect: 613
So out of the 10k testing samples, we got only 613 wrong! That's an accuracy of ~93.8%. Not so bad for a homegrown neural network, is it?
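As a quick sanity check on that figure:

```java
public class AccuracyDemo {
    // Accuracy as a percentage, given total samples and misclassified count.
    public static double accuracy(int total, int missed) {
        return 100.0 * (total - missed) / total;
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%%%n", accuracy(10000, 613)); // 93.87%
    }
}
```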
Let's examine which samples were misclassified.
For one instance, this one was supposed to be a 4, which our network classified as an 8:
And this one was supposed to be a 5, which the network classified as a 3:
Well, these errors can be forgiven, can't they? xD
People have actually achieved an accuracy of over 99% on this dataset. Here are the general benchmarks on this dataset (we make it into the top 40).
And this concludes our Neural Networks from scratch series 🙂
- https://madhuramiah.medium.com/how-i-increased-the-accuracy-of-mnist-prediction-from-84-to-99-41-63ebd90cc8a0
- https://medium.com/thedeephub/neural-networks-implementation-from-the-ground-up-part-3-backpropagation-e9126938edac
- https://paperswithcode.com/sota/image-classification-on-mnist?metric=Accuracy