Neural Network from Scratch Pt. 1
Published on June 3, 2024 at 6:09 PM
Hi, everyone! Today I'm starting a small series on creating a neural network from scratch! After that, I'll move on to more advanced work, applying ML libraries like TensorFlow/PyTorch to problems like reinforcement learning and computer vision. Let's dive in!
What is a neural network?
Neural networks are one of the most fundamental algorithms in machine learning. IBM defines a neural network as:
...a machine learning program, or model, that makes decisions in a manner similar to the human brain[1].
How might it "think" like a human brain, you ask? Well, our brain "thinks" through its neurons, which contain long tails called dendrites[2].
Neurons send electric pulses to other neurons through their dendrites. Our brains contain roughly 100 billion of these neurons, so its no surprise we're capable of such complex thoughts and actions.
While modern technology can't simulate a human brain, we can utilize the concept of neurons sending information to other neurons to create an algorithm that can "think" and process complex data.
The image above is a standard feedforward neural network. Each node (represented by the circles) is called a perceptron[3]. Thus, an alternative name for feedforward neural networks is multi-layer perceptrons.
Perceptrons are simply functions that:
- Take multiple inputs
- Multiply each input by a weight
- Sum the weighted inputs, plus a constant "bias"
- Feed that sum through a linear or non-linear "activation" function to produce the output

The full perceptron function is laid out below:

$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

where the $x_i$ are the inputs, the $w_i$ are the weights, $b$ is the bias, and $f$ is the activation function.
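To make that concrete, here's a minimal sketch of a single perceptron in Python with NumPy. The sigmoid activation is just an assumption for illustration; any activation function works here:

```python
import numpy as np

def sigmoid(z):
    # A common non-linear activation that squashes values into (0, 1)
    return 1 / (1 + np.exp(-z))

def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, fed through the activation
    z = np.dot(weights, inputs) + bias
    return sigmoid(z)

# Example: 3 inputs with arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.7])
b = 0.2
print(perceptron(x, w, b))
```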
Neural networks are organized into layers of perceptrons, with the output of each perceptron in one layer fed into every perceptron in the next. For computational efficiency, each layer's outputs and bias constants are represented as column vectors, and the weights between two layers are stored as a 2D matrix. Thus, forward propagation of data through a network is nothing but a sequence of matrix multiplications and vector additions.
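Here's a rough sketch of that idea, assuming a sigmoid activation at every layer and the column-vector convention described above (real networks often mix activations):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, weights, biases):
    # weights[i] is the 2D matrix between layer i and layer i+1;
    # biases[i] is the column vector of biases for layer i+1.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # matrix multiply, add bias, activate
    return a

# A tiny 3 -> 4 -> 2 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [rng.normal(size=(4, 1)), rng.normal(size=(2, 1))]
x = rng.normal(size=(3, 1))  # input as a column vector
print(forward(x, weights, biases))
```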
Once data is fully propagated through every layer, the last layer is extracted as the output. If the input was an image, and our task was image classification, the output layer would represent a probability distribution of what the network "thinks" the image is. The node with the highest value would be its "guess" for the image's class.
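The post hasn't pinned down an output activation, but a common choice (assumed here) for turning the last layer's raw values into a probability distribution is the softmax; the network's "guess" is then just the index of the largest entry:

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize so the outputs sum to 1
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # hypothetical output-layer values
probs = softmax(logits)
print(probs)             # roughly [0.79, 0.18, 0.04] -- a probability distribution
print(np.argmax(probs))  # index 0: the network's "guess" for the class
```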
This deep nesting of functions of weighted sums essentially produces one giant function that, given enough parameters, can approximate any data that can be represented numerically, from simple points on a coordinate plane to images and audio waveforms. However, there's still one big question: how do we find the right values for the weights $w$ and biases $b$? I'll answer this question and delve into the more technical side of NNs in the next post.
Until next time!
Sources:
[1] https://www.ibm.com/topics/neural-networks
[2] https://qbi.uq.edu.au/brain/brain-anatomy/what-neuron
[3] https://www.simplilearn.com/tutorials/deep-learning-tutorial/perceptron