Deep Learning: Neural Net From Scratch Using Java

A neural network is a computational model that mimics the human learning process: it learns the weight and bias values associated with processing inputs from a known dataset and then outputs the most probable result for new, unseen input. Neural nets are most effective on pattern recognition and forecasting problems such as handwriting recognition, voice recognition and weather forecasting (the range of applications is far too broad to list; the point is to understand the technique and judge whether it can be applied to a given problem).

This article is not about describing deep learning, machine learning or even neural networks in general, but about building a very simple neural net from scratch. I am going to use Java as the programming language, but the idea is the same in any language you feel comfortable with; the code may look different, but the behaviour is identical (leaving language performance aside).

Now, to get started, create a Java project and, inside the src folder, create a package np.com.bsubash.nn. This package will contain four classes: MatrixUtils.java, NeuronLayer.java, NeuralNet.java and finally Main.java, where we run the training and test the network on unseen input.
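
The resulting source layout looks like this:

src/
  np/com/bsubash/nn/
    MatrixUtils.java
    NeuronLayer.java
    NeuralNet.java
    Main.java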

The MatrixUtils.java class contains all the matrix utility methods required for the calculations. The class looks like this:

package np.com.bsubash.nn;

import java.util.function.Function;

/**
 * @author subash
 * @since Apr 7, 2020
 */
public class MatrixUtils {

	public static double[][] add(double[][] a, double[][] b) {
		if (a.length == 0 || b.length == 0 || a.length != b.length || a[0].length != b[0].length) {
			throw new IllegalArgumentException("Can not add unequal matrices !");
		}
		double[][] result = new double[a.length][a[0].length];
		for (int i = 0; i < a.length; i++) {
			for (int j = 0; j < a[i].length; j++) {
				result[i][j] = a[i][j] + b[i][j];
			}
		}
		return result;
	}

	public static double[][] subtract(double[][] a, double[][] b) {
		if (a.length == 0 || b.length == 0 || a.length != b.length || a[0].length != b[0].length) {
			throw new IllegalArgumentException("Can not subtract unequal matrices !");
		}
		double[][] result = new double[a.length][a[0].length];
		for (int i = 0; i < a.length; i++) {
			for (int j = 0; j < a[i].length; j++) {
				result[i][j] = a[i][j] - b[i][j];
			}
		}
		return result;
	}

	public static double[][] multiply(double[][] a, double[][] b) {
		if (a.length == 0 || b.length == 0 || a[0].length != b.length) {
			throw new IllegalArgumentException("Can not multiply non mxn nxp matrices !");
		}
		double[][] result = new double[a.length][b[0].length];
		for (int i = 0; i < a.length; i++) {
			for (int j = 0; j < b[0].length; j++) {
				double sum = 0;
				for (int k = 0; k < a[0].length; k++) {
					sum += a[i][k] * b[k][j];
				}
				result[i][j] = sum;
			}
		}
		return result;
	}

	// element-wise (Hadamard) multiplication of two matrices of the same dimensions
	public static double[][] scalarMultiply(double[][] v1, double[][] v2) {
		if (v1.length == 0 || v1.length != v2.length || v1[0].length != v2[0].length) {
			throw new IllegalArgumentException("Cannot element-wise multiply matrices of unequal dimensions");
		}
		double result[][] = new double[v1.length][v1[0].length];
		for (int i = 0; i < v1.length; ++i) {
			for (int j = 0; j < v1[i].length; ++j) {
				result[i][j] = v1[i][j] * v2[i][j];
			}
		}
		return result;

	}

	public static double[][] transpose(double[][] matrix) {
		if (matrix.length == 0) {
			throw new IllegalArgumentException("Invalid Matrix !");
		}
		double[][] result = new double[matrix[0].length][matrix.length];
		for (int i = 0; i < matrix.length; i++) {
			for (int j = 0; j < matrix[i].length; j++) {
				result[j][i] = matrix[i][j];
			}
		}
		return result;
	}

	public static double[] normalize(double[] input) {
		double sum = 0.0;
		double[] result = new double[input.length];
		for (double i : input) {
			sum += i;
		}
		for (int i = 0; i < input.length; i++) {
			result[i] = input[i] / sum;
		}
		return result;
	}

	public static double[][] applyActivationFunction(double[][] matrix, Function<Double, Double> activationFunction) {
		if (matrix.length == 0 || matrix[0].length == 0) {
			throw new IllegalArgumentException("Invalid Matrix !");
		}
		double[][] result = new double[matrix.length][matrix[0].length];
		for (int i = 0; i < matrix.length; i++) {
			for (int j = 0; j < matrix[i].length; j++) {
				result[i][j] = activationFunction.apply(matrix[i][j]);
			}
		}
		return result;
	}

	public static String toString(double[][] matrix) {
		StringBuilder result = new StringBuilder("[");
		for (double[] row : matrix) {
			result.append("[");
			for (double col : row) {
				result.append(col).append(" ");
			}
			result.append("]\n");
		}
		result.append("]\n");
		return result.toString();
	}

}
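
If you want a quick sanity check of these utilities before wiring up the network, a small throwaway class like the one below will do. MatrixUtilsDemo is just a name made up for this test and is not part of the project:

package np.com.bsubash.nn;

public class MatrixUtilsDemo {
	public static void main(String[] args) {
		double[][] a = { { 1, 2 }, { 3, 4 } };
		double[][] b = { { 5, 6 }, { 7, 8 } };
		// 2x2 matrix product
		System.out.println(MatrixUtils.toString(MatrixUtils.multiply(a, b)));
		// element-wise (Hadamard) product
		System.out.println(MatrixUtils.toString(MatrixUtils.scalarMultiply(a, b)));
		// transpose
		System.out.println(MatrixUtils.toString(MatrixUtils.transpose(a)));
		// apply a function to every element, e.g. the sigmoid used later on
		System.out.println(MatrixUtils.toString(
				MatrixUtils.applyActivationFunction(a, x -> 1 / (1 + Math.exp(-x)))));
	}
}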

Now we have the utility methods available for the calculations. Let's create a NeuronLayer.java class, which represents a single layer. It is constructed with numberOfNeurons, numberOfInputsPerNeuron, activationFunctionType and initialWeightType. The activationFunctionType tells the layer which activation function and derivative to use, and initialWeightType tells it whether to start from random weights or some fixed value. We will use random initial weights for simplicity and the sigmoid activation function. During training the neural net computes, from the known dataset, an adjustment to apply to the weights, so the layer also needs a method for adjusting its weights. The NeuronLayer.java class looks like this:

package np.com.bsubash.nn;

import java.util.function.Function;

/**
 * @author subash
 * @since Apr 1, 2020
 */
public class NeuronLayer {
	public Function<Double, Double> activationFunction, activationFunctionDerivative;
	double[][] weights;

	public NeuronLayer(int numberOfNeurons, int numberOfInputsPerNeuron,
			NeuralNet.ActivationFunctionType activationFunctionType, NeuralNet.InitialWeightType initialWeightType) {
		// only the sigmoid activation is wired up in this example; the RELU enum value is left unhandled
		if (activationFunctionType == NeuralNet.ActivationFunctionType.SIGMOID) {
			this.activationFunction = NeuralNet::sigmoid;
			this.activationFunctionDerivative = NeuralNet::sigmoidDerivative;
		}

		// set initial weights
		this.weights = new double[numberOfInputsPerNeuron][numberOfNeurons];
		if (initialWeightType == NeuralNet.InitialWeightType.RANDOM) {
			for (int i = 0; i < numberOfInputsPerNeuron; i++) {
				for (int j = 0; j < numberOfNeurons; j++) {
					weights[i][j] = 2 * Math.random() - 1; // random value in the range (-1, 1)
				}
			}
		}
	}

	public void adjustWeight(double[][] adjustment) {
		this.weights = MatrixUtils.add(this.weights, adjustment);
	}

	@Override
	public String toString() {
		return MatrixUtils.toString(this.weights);
	}
}
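
As a quick illustration (not part of the final project), constructing a layer on its own and printing it shows the randomly initialised weight matrix; the snippet below can be dropped into any main method in this package:

// a layer of 4 neurons with 3 inputs per neuron -> a 3x4 weight matrix
NeuronLayer layer = new NeuronLayer(4, 3, NeuralNet.ActivationFunctionType.SIGMOID,
		NeuralNet.InitialWeightType.RANDOM);
System.out.println(layer); // prints the weight matrix via MatrixUtils.toString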

We created a NeuronLayer; now we need a network class. I have named it NeuralNet.java, and this is the class where the actual neural net calculation is done. I am using back propagation to adjust the parameter values. As we are building a simple neural net, it will have only two layers. The constructor takes two neural layers and a learning rate, so its prototype looks like (NeuronLayer, NeuronLayer, double). We also need a method for training and a method for predicting. The train method takes an input matrix, an output matrix and numberOfIterations as arguments; on each iteration it adjusts the weight values of each layer. The predict method takes only an input matrix as an argument and returns a numeric value. The whole NeuralNet.java class looks like this:
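
For reference, the updates computed inside train() can be written compactly. With X the training input matrix, Y the target output, O1 and O2 the outputs of the two layers, W1 and W2 their weight matrices, r the learning rate and ⊙ element-wise multiplication (these symbols are just shorthand for this explanation):

delta2 = (Y - O2) ⊙ O2 ⊙ (1 - O2)
delta1 = (delta2 · W2ᵀ) ⊙ O1 ⊙ (1 - O1)
W2 ← W2 + r · O1ᵀ · delta2
W1 ← W1 + r · Xᵀ · delta1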

package np.com.bsubash.nn;

/**
 * @author subash
 * @since Apr 1, 2020
 */
public class NeuralNet {
	private final NeuronLayer layerOne, layerTwo;
	private final double learningRate;
	private double[][] layerOneOutput, layerTwoOutput;

	public static enum ActivationFunctionType {
		SIGMOID, RELU
	}

	public static enum InitialWeightType {
		RANDOM
	}

	public static double sigmoid(double x) {
		return 1 / (1 + Math.exp(-x));
	}

	// note: x is expected to already be the sigmoid output s, so this returns s * (1 - s)
	public static double sigmoidDerivative(double x) {
		return x * (1 - x);
	}

	public NeuralNet(NeuronLayer layerOne, NeuronLayer layerTwo, double learningRate) {
		this.layerOne = layerOne;
		this.layerTwo = layerTwo;
		this.learningRate = learningRate;
	}

	public void train(double[][] inputs, double[][] outputs, int numberOfIterations) {
		for (int i = 0; i < numberOfIterations; i++) {
			layerOneOutput = MatrixUtils.applyActivationFunction(MatrixUtils.multiply(inputs, layerOne.weights),
					layerOne.activationFunction);
			layerTwoOutput = MatrixUtils.applyActivationFunction(MatrixUtils.multiply(layerOneOutput, layerTwo.weights),
					layerTwo.activationFunction);

			// error for layerTwo
			double[][] errorOfLayerTwo = MatrixUtils.subtract(outputs, layerTwoOutput);
			double[][] gradientOfLayerTwo = MatrixUtils.applyActivationFunction(layerTwoOutput,
					layerTwo.activationFunctionDerivative);
			double[][] deltaLayerTwoError = MatrixUtils.scalarMultiply(errorOfLayerTwo, gradientOfLayerTwo);

			// error for layerOne
			double[][] errorOfLayerOne = MatrixUtils.multiply(deltaLayerTwoError,
					MatrixUtils.transpose(layerTwo.weights));
			double[][] gradientOfLayerOne = MatrixUtils.applyActivationFunction(layerOneOutput,
					layerOne.activationFunctionDerivative);
			double[][] deltaLayerOneError = MatrixUtils.scalarMultiply(errorOfLayerOne, gradientOfLayerOne);

			// adjustment = error.input.gradientCurveDerivative(output)

			double[][] adjustmentLayerTwo = MatrixUtils.multiply(MatrixUtils.transpose(layerOneOutput),
					deltaLayerTwoError);
			double[][] adjustmentLayerOne = MatrixUtils.multiply(MatrixUtils.transpose(inputs), deltaLayerOneError);

			// adjust by learning rate
			adjustmentLayerTwo = MatrixUtils.applyActivationFunction(adjustmentLayerTwo, (x) -> this.learningRate * x);
			adjustmentLayerOne = MatrixUtils.applyActivationFunction(adjustmentLayerOne, (x) -> this.learningRate * x);

			// apply adjustments
			this.layerOne.adjustWeight(adjustmentLayerOne);
			this.layerTwo.adjustWeight(adjustmentLayerTwo);

			// log the iteration count and the error on the first training example
			System.out.println((i + 1) + "/ " + numberOfIterations + " -> " + errorOfLayerTwo[0][0]);
		}
	}

	// forward pass on a single input row; returns the value of the single output neuron
	public double predict(double[][] input) {
		this.layerOneOutput = MatrixUtils.applyActivationFunction(MatrixUtils.multiply(input, this.layerOne.weights),
				this.layerOne.activationFunction);
		this.layerTwoOutput = MatrixUtils.applyActivationFunction(
				MatrixUtils.multiply(this.layerOneOutput, this.layerTwo.weights), this.layerTwo.activationFunction);
		return this.layerTwoOutput[0][0];
	}

	@Override
	public String toString() {
		String result = "Layer One Weights :";
		result += this.layerOne.toString();
		result += "\nLayer Two Weights :";
		result += this.layerTwo.toString();
		result += "-----------------------------\n";
		if (this.layerOneOutput != null) {
			result += "LAYER ONE OUTPUT:->" + MatrixUtils.toString(this.layerOneOutput) + "\n";
		}
		if (this.layerTwoOutput != null) {
			result += "LAYER TWO OUTPUT:->" + MatrixUtils.toString(this.layerTwoOutput) + "\n";
		}
		return result;
	}

}

We are all set to test our neural net. Create a class named Main.java with a main method (public static void main(String[] args) {}) inside it. Inside main, create the input matrix along with the corresponding output matrix, create two neural layers, and finally create the neural net object, passing these to the constructor. Call net.train(trainingInput, trainingOutput, 10000); to train the network and net.predict(input) with a double[][] input to predict on new data. The Main.java class looks like this:

package np.com.bsubash.nn;

/**
 * @author subash
 * @since Apr 9, 2020
 */
public class Main {

	/**
	 * Training data follows the pattern: the output equals the first number of the input.
	 * <p>
	 * 0 0 1 -> 0, 1 1 1 -> 1, 1 0 1 -> 1, 0 1 1 -> 0, 0 0 0 -> 0
	 * <p>
	 * 
	 * @param args
	 */
	public static void main(String[] args) {
		double[][] trainingInput = new double[][] { { 0, 0, 1 }, { 1, 1, 1 }, { 1, 0, 1 }, { 0, 1, 1 }, { 0, 0, 0 } };
		double[][] trainingOutput = new double[][] { { 0 }, { 1 }, { 1 }, { 0 }, { 0 } };

		final NeuronLayer hiddenLayer = new NeuronLayer(4, 3, NeuralNet.ActivationFunctionType.SIGMOID,
				NeuralNet.InitialWeightType.RANDOM); // hidden layer having 4 neurons : 3 inputs per neuron
		final NeuronLayer outputLayer = new NeuronLayer(1, 4, NeuralNet.ActivationFunctionType.SIGMOID,
				NeuralNet.InitialWeightType.RANDOM); // output layer having one neuron : 4 inputs per neuron

		final NeuralNet net = new NeuralNet(hiddenLayer, outputLayer, 0.5);
		net.train(trainingInput, trainingOutput, 10000);

		// prediction on unknown data
		System.out.println("PREDICTION ON { 110  -> 1 } : " + net.predict(new double[][] { { 1, 1, 0 } }));
		System.out.println("PREDICTION ON { 010  -> 0 } : " + net.predict(new double[][] { { 0, 1, 0 } }));
	}

}

If you run Main.java, you will see the training progress and the predictions printed on the console.

