This post starts a series of articles on deep learning for non-machine-learning engineers who want to quickly adopt deep learning in real-world products. These engineers usually lack the background, time, or resources to dig into the wider machine learning world, which is full of mathematics and algorithms from pattern recognition, information theory, and so on. I will therefore avoid deep mathematics and provide just enough information to get going. Of course, if anyone else finds it helpful, I will be very happy. When I started to learn about deep learning, I found myself in a jungle of information from different sources. Most of them start with mathematical details without telling you how deep you need to go, and they do not get to a real-world use case quickly enough. I hope to start in a different way and take a shorter path to building something meaningful.

### Getting Started with Deep Learning

Before starting to develop a product with deep learning, you may have a lot of questions, such as:

- There are so many different kinds of neural networks. Which one should I use?
- There are so many different deep learning frameworks. Which one should I use?
- There are more techniques beyond the feed-forward neural network, such as RNN, SVM, LSTM, reinforcement learning, and GAN. Which matter most to me?
- There are many existing training datasets to use. When should I train on my own data?
- Embedded hardware for deep learning is limited but growing. Where should I start if I want to build on embedded hardware?

I will not try to provide all the answers immediately. I hope you will develop your own answers by following the learning process, which may not be easy. But I will provide a very brief summary of the most used terms to help you pinpoint the topics that interest you sooner.

Before you get your hands dirty with source code, I suggest you have basic knowledge of the following. Of course, you can skim through them and learn the details later. But sooner or later, you will probably have to spend some quality time on them.

- **Python**: most deep learning frameworks provide a high-level API in Python. You can write a few lines of code (fewer than 10) and perform serious DL work that runs for a few hours. It is highly recommended to use virtualenv for your development environment, so you can create a separate virtual environment for each DL framework and prevent dependency conflicts.
- **NumPy**: a Python library that helps construct N-dimensional arrays and perform math operations on them. If you are not familiar with N-dimensional data processing, it is time to get familiar with NumPy, since there will be a lot of matrix manipulation when we deal with data in different formats.
- **TensorFlow**: an open-source software library for machine learning across a range of tasks, developed by Google for building and training neural networks that detect and decipher patterns and correlations. I tried Caffe, but its model building scared me away, and I find the TensorFlow tutorials easy to follow. You can choose another framework such as Caffe, MXNet, or Theano.
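As a small taste of the N-dimensional array handling mentioned for NumPy, here is a minimal sketch. The array values are made up purely for illustration.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Element-wise math is applied to every entry at once.
doubled = a * 2.0

# Matrix product: a (2, 3) matrix times its (3, 2) transpose gives a (2, 2) matrix.
product = a @ a.T

# Reduction along an axis: axis=0 sums down the rows, giving one total per column.
column_sums = a.sum(axis=0)

print(a.shape, product.shape, column_sums)
```

Most deep learning data (images, batches of samples, weight matrices) ends up in arrays like these, so shapes and axes are worth getting comfortable with early.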

In order to fully understand deep learning, we still need to start with the basics.

**What is Deep Learning?** According to Wikipedia, Deep Learning is “the application of **artificial neural networks (ANNs)** to learning tasks that contain **more than one hidden layer**. Deep learning is part of a broader family of **machine learning** methods based on **learning** data representations, as opposed to task-specific algorithms. Learning can be **supervised**, **partially supervised** or **unsupervised**.” As this article is targeted at non-machine-learning engineers, we focus mainly on the simpler **supervised neural network**, especially **classification**, since it is one of the most common applications: determining the output class for given input data by **training** a network with lots of existing input/output pairs. Once the network (or model) is trained, it can be applied repeatedly by the machine itself without further help, which is usually called **inference**.

**Linear regression** is an algorithm that linearly estimates or predicts an output from input data, based on a series of known input/output pairs. Imagine we have some known housing prices (Y) in areas specified by their zip codes (X). Linear regression finds a good estimate of the housing price Y for a given X (an enumeration of the zip code) in the form Y = w*X + b, where w and b are the regression coefficients. Once we have found a good w and b pair, we can estimate the housing price for a given zip code with a single equation. This is basically what training and inference are all about.
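To make the Y = w*X + b idea concrete, here is a minimal NumPy sketch. The numbers are hypothetical (not real housing data), and `np.polyfit` is just one convenient way to get a least-squares line, not necessarily the method used later in this series.

```python
import numpy as np

# Hypothetical data: x is an area index, y is a price in arbitrary units.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated from y = 2*x + 1

# "Training": a least-squares fit of a degree-1 polynomial returns [w, b].
w, b = np.polyfit(x, y, 1)

def predict(x_new):
    """'Inference': apply the fitted line to a new input."""
    return w * x_new + b

print(w, b, predict(5.0))
```

Finding `w` and `b` is the training part, and calling `predict` on an input the model has not seen is the inference part.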

**Training** is the process of finding a good w and b from known labeled data (Y and X) using an algorithm like linear regression. There are other algorithms that can be applied in different places for different purposes, such as **logistic regression** and **soft-max**. We will see how they are applied later.

**Inference** is the prediction part: predicting the output (e.g., housing price) for a new input (zip code) with the trained model (w and b).

If we look at the Python code using TensorFlow, the model construction code looks like this:

```python
import numpy
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.train)

# rng is assumed to be NumPy's random module.
rng = numpy.random
# n_samples (number of training pairs) and learning_rate are assumed
# to be defined earlier in the script.

# tf Graph Input: 1-D array of n_samples
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights: scalars in this example
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model: Y_pred = X * W + b
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error = [summation of all (pred - Y)^2] / (2 * n_samples)
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)

# Gradient descent using TensorFlow's built-in optimizer
# Note: minimize() knows to modify W and b because Variable objects are trainable=True by default
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
```
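To make the optimizer step less of a black box, here is a rough NumPy sketch of what gradient descent does with the same cost formula. The data, the starting values, the learning rate, and the number of steps are made-up assumptions, and this is not how TensorFlow implements it internally.

```python
import numpy as np

# Hypothetical training pairs, generated from y = 2*x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
n_samples = x.shape[0]

w, b = 0.0, 0.0          # arbitrary starting point
learning_rate = 0.05     # assumed value, small enough to be stable here

def cost(w, b):
    # Same formula as the TensorFlow snippet: sum((pred - Y)^2) / (2 * n_samples)
    return np.sum((w * x + b - y) ** 2) / (2 * n_samples)

initial_cost = cost(w, b)
for _ in range(5000):
    error = w * x + b - y
    # Partial derivatives of the cost with respect to w and b.
    grad_w = np.sum(error * x) / n_samples
    grad_b = np.sum(error) / n_samples
    # Step against the gradient to reduce the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
final_cost = cost(w, b)

print(w, b, initial_cost, final_cost)
```

Each pass nudges `w` and `b` in the direction that lowers the cost, which is what `minimize(cost)` asks the TensorFlow optimizer to do on every training step.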

Once we know what training and inference are, we can start looking at the network. To understand what “**more than one hidden layer**” means, we first look at what a layer is and what the simplest form, the single-layer neural network, is all about.

**What is a “Single-Layer Neural Network”?** Sebastian Raschka has a great article titled “Single-Layer Neural Networks and Gradient Descent”. The network basically consists of an algorithm called the **Perceptron**, which is used for supervised learning of a binary classifier that gives you a true or false output. As shown below, the perceptron takes an input vector X, multiplies it by the weights W, sums the results (the net input function), and applies an **activation function** (a unit step function in this case), which outputs either +1 or -1. Intuitively, it provides a binary classifier that tells whether the given input X has a certain feature (determined by the weights) or not. As software engineers, let’s not prove mathematically how this algorithm works. If you are interested, you can of course read Raschka’s article. Using just your instinct, do you think we can use this simple network to tell whether an image contains a dog, a cat, or something else? Probably not! But this is indeed the building block of deep neural networks, which can handle very complex decision-making applications by **adding more layers**, using **different training algorithms**, and so on. Just to give you an idea, Microsoft used 152 layers in its Residual Network to win first place in ILSVRC 2015.
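For readers who prefer code to diagrams, here is a minimal NumPy sketch of a perceptron with a unit step activation, trained with the classic perceptron update rule. The toy task (a logical AND with -1/+1 labels), the learning rate `eta`, and the number of passes are my own assumptions for illustration.

```python
import numpy as np

def predict(X, w, b):
    """Net input (weighted sum plus bias) followed by a unit step: +1 or -1."""
    net_input = X @ w + b
    return np.where(net_input >= 0.0, 1, -1)

# Toy data: output +1 only when both inputs are 1 (logical AND).
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])

w = np.zeros(2)   # one weight per input feature
b = 0.0           # bias term
eta = 0.5         # learning rate (assumed value)

for _ in range(20):                      # a fixed number of passes over the data
    for xi, target in zip(X, y):
        # The update is zero when the prediction is already correct.
        update = eta * (target - predict(xi, w, b))
        w += update * xi
        b += update

print(predict(X, w, b))
```

This only works because the AND data is linearly separable. A single perceptron cannot learn data that a straight line cannot split, which is one reason more layers are needed.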

Let’s take it step by step and look at a slightly more complex network called the Multilayer Perceptron (MLP) next.

**What is a Multilayer Perceptron (MLP)?** According to Wikipedia, “A multilayer perceptron (MLP) is a class of **feedforward artificial neural network**. An MLP consists of at least **three layers of nodes**. Except for the input nodes, each node is a neuron that uses a **nonlinear activation function**. MLP utilizes a **supervised learning technique** called **backpropagation** for training. Its **multiple layers** and **non-linear activation** distinguish MLP from a **linear perceptron**. It can distinguish data that is **not linearly separable**.”

You can see the MLP network shown above. There are a few keywords in the description. A **feedforward neural network** means the arrows always point from input to output, in the forward direction. There is no arrow pointing backward, as there is in **recurrent neural networks (RNN)**, which we will talk about later.

As you can see from the diagram above, the units in the hidden layer are **fully connected** to the input layer, and the units in the output layer are **fully connected** to the hidden layer. In an MLP, the input layer does nothing but take the input data and pass it along to the hidden layer. Each unit in the hidden layer takes the input from every node of the input layer, applies the weighted summation, and then passes the result through an activation function, as in the perceptron. However, per the Wikipedia definition, we need a nonlinear activation function such as the **sigmoid (logistic) activation function**, an S-shaped curve that maps the net input to a value between 0 and 1. Besides the logistic activation function, there are two other popular activation functions, the **hyperbolic tangent function** and the **ReLU (Rectified Linear Unit) function**, which are **more commonly used in the hidden layers** because they suffer less from the squashing effect that the sigmoid function has on backpropagated errors [4].
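The three activation functions are short enough to write out directly. The following NumPy sketch uses sample inputs I picked arbitrarily and also computes the sigmoid and ReLU gradients, to hint at why the sigmoid squashes backpropagated errors.

```python
import numpy as np

def sigmoid(z):
    """S-shaped curve that maps any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return np.maximum(0.0, z)

z = np.array([-10.0, 0.0, 10.0])   # arbitrary sample net inputs

sig = sigmoid(z)
tanh_out = np.tanh(z)              # hyperbolic tangent, range (-1, 1)
relu_out = relu(z)

# Gradients used during backpropagation.
sigmoid_grad = sig * (1.0 - sig)       # never larger than 0.25, tiny for large |z|
relu_grad = (z > 0).astype(float)      # exactly 1 for every positive input

print(sig, tanh_out, relu_out)
print(sigmoid_grad, relu_grad)
```

Because the sigmoid gradient is at most 0.25 and shrinks quickly away from zero, error signals multiplied through several sigmoid layers tend to fade, while ReLU passes them through unchanged for positive inputs.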

When an MLP is used for **classification**, we would like to know how confident the network is that the input data belongs to a certain output class. We can modify the output layer by replacing the unit step activation function of the perceptron with a **shared soft-max function**, so that the output of each neuron corresponds to the **estimated probability** of the corresponding class. That’s exactly what we want for **classification with a confidence level**. Therefore, soft-max is typically used in the **output layer as a classifier** in deep neural networks.
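Here is a minimal NumPy sketch of the soft-max function turning raw output-layer scores into class probabilities. The three scores are hypothetical, and subtracting the maximum is a common numerical-stability trick that I added, not something from the text above.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)   # avoids overflow in exp; result is unchanged
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])      # hypothetical output-layer values for 3 classes
probs = softmax(scores)

predicted_class = int(np.argmax(probs))         # index of the most likely class
confidence = float(probs[predicted_class])      # its estimated probability

print(probs, predicted_class, confidence)
```

The largest score still wins, but now every class comes with a number that can be read as a confidence level.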

The code snippet for constructing a TensorFlow model looks like the following. As you can see, we use ReLU for the hidden layer and softmax for the output layer (the softmax is applied inside the loss function).

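```python
import tensorflow as tf  # TensorFlow 1.x API

# n_input, n_hidden_1, n_classes and learning_rate are assumed to be
# defined earlier in the script.

# Graph inputs (assumed shapes): batches of feature vectors and one-hot labels
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights and biases variables for all layers
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    # The output layer reads from the single hidden layer, so its input size is n_hidden_1
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Define hidden layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer with linear activation; softmax is applied inside the loss below
out_layer = tf.matmul(hidden_layer, weights['out']) + biases['out']

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
```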
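The loss line packs two steps into one call: a softmax over the raw output-layer values (the logits), then the cross-entropy against the one-hot labels. Conceptually, it can be sketched in NumPy as below. The logits and labels are made up, and this is only meant to show the arithmetic, not TensorFlow's actual implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Log-softmax computed in a numerically stable way, row by row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Only the log-probability of the true class survives the multiplication.
    return -(labels * log_probs).sum(axis=1)

# Hypothetical batch of 2 examples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(logits, labels)
cost = per_example.mean()   # plays the role of tf.reduce_mean in the snippet above

print(per_example, cost)
```

The loss is small when the network puts high probability on the correct class and grows as that probability drops, which gives the optimizer a clear direction to improve the weights.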

### What’s Next?

Next, we will talk about more complex neural networks, including convolutional neural networks, deep neural networks, recurrent networks, and others. I will start with the concepts and try to provide some real-world experiments and performance numbers. I hope you like it.

### Reference:

- http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
- http://cs231n.github.io/
- https://github.com/aymericdamien/TensorFlow-Examples
- https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions
- https://gist.github.com/vinhkhuc/e53a70f9e5c3f55852b0
- https://deeplearning4j.org/recurrentnetwork

*Also published on Medium.*
