TensorFlow Template for Deep Learning Beginners
Updated: Jun 3, 2022
How to Build Your First Deep Neural Network
What is Deep Learning?
Deep learning is a subcategory of machine learning that uses neural networks. In a nutshell, a neural network connects multiple layers of nodes (neurons), and each node can be thought of as a mini machine learning model. The output of each node then feeds in as input to the nodes in the subsequent layer.
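To make the idea of nodes feeding into one another concrete, here is a minimal sketch in plain Python. The weights, biases, and input values are made up for illustration, and the helper names `node` and `layer` are hypothetical, not part of TensorFlow.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node(inputs, weights, bias):
    """One node: a weighted sum of its inputs plus a bias, passed through sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(inputs, weight_rows, biases):
    """One layer: every node sees the same inputs but has its own weights and bias."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Made-up numbers, for illustration only
x = [0.5, -1.0]                                               # two input features
hidden = layer(x, [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1])      # hidden layer with 2 nodes
output = layer(hidden, [[0.7, -0.5]], [0.05])                 # output layer with 1 node

print("hidden layer output:", hidden)
print("network output:", output)
```

The output of the hidden layer is passed directly as the input of the output layer, which is all that "connecting layers" means here. A real framework also learns the weights, which this sketch does not do.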
TensorFlow is a Python library that primarily focuses on providing a deep learning framework.
To install and import the TensorFlow library:
pip install tensorflow
import tensorflow as tf
How to Build a TensorFlow Deep Neural Network?
The skeleton of a deep learning model generally follows the structure below, and we can use the Keras API to implement a beginner-friendly deep learning model. There is a lot of variation we can add at each stage to make the model more complex.
Prepare the dataset
Define the model
Compile the model
Fit the model
Evaluation and Prediction
1. Prepare Dataset
Deep learning algorithms are fundamentally machine learning algorithms and cover both supervised and unsupervised learning. Supervised learning requires splitting the dataset into a training set and a test set (and sometimes a validation set as well), as below.
from sklearn.model_selection import train_test_split

# df is assumed to be a pandas DataFrame; 'user-definedlabeln' stands in for the label column
X = df.drop(['user-definedlabeln'], axis=1)
y = df['user-definedlabeln']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
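For the case where a validation set is also needed, the sketch below shows what a three-way split does, using only the standard library. The function name `three_way_split` and the 60/20/20 proportions are assumptions for illustration; in practice one option is simply to call `train_test_split` twice.

```python
import random

def three_way_split(rows, val_size=0.2, test_size=0.2, seed=42):
    """Shuffle the rows once, then cut them into train / validation / test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)      # fixed seed so the split is reproducible
    n_test = int(len(rows) * test_size)
    n_val = int(len(rows) * val_size)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

# 100 dummy row indices standing in for a real dataset
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

The validation set is used to compare model variants during development, so that the test set stays untouched until the final evaluation.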
There is a lot more you can do with the raw dataset, such as preprocessing and feature engineering, but let’s keep it simple in this article.
2. Define the Model
One of the simplest forms of deep neural network is the sequential model, which is composed of a single stack of layers, where each layer has exactly one input tensor and one output tensor. We can create a sequential model by passing a list of Dense layers.
model = tf.keras.models.Sequential([
    # hidden layers
    tf.keras.layers.Dense(units=32, activation="relu"),
    tf.keras.layers.Dense(units=16, activation="relu"),
    # output layer: sigmoid keeps the output between 0 and 1 for a binary label
    tf.keras.layers.Dense(units=1, activation="sigmoid")
])
number of layers and units: a deep learning model must have an input layer and an output layer. The number of hidden layers between input and output can vary. The number of units per layer is also a hyperparameter that we can experiment with. The article “How to Configure the Number of Layers and Nodes in a Neural Network” provides a guide on how to experiment and search for a suitable number of layers and nodes.
activation function: each layer of the model requires an activation function. You can think of it as a mini statistical model that transforms the node input into its output. Activation functions contribute the non-linearity of a neural network. Hidden layers usually share the same activation function, while the output layer can use a different one depending on the prediction type.
Below are common activation functions; each has its pros and cons.
Sigmoid: with outputs ranging from 0 to 1, it is suitable for binary output
ReLU: it preserves linear behavior for positive inputs and mitigates the vanishing gradient problem seen with Sigmoid and Tanh, but it can suffer from other problems such as “dead” units, which output zero whenever their input is negative
Tanh: its stronger gradient makes it more sensitive to small differences; however, it saturates and learns slowly at extreme values
Linear: it is suitable for regression problems with continuous numeric output
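The four activation functions listed above can be written in a few lines of plain Python. This is a sketch of the textbook formulas, not the Keras implementations, and the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive values through unchanged and outputs 0 for negative values."""
    return max(0.0, z)

def tanh(z):
    """Maps any real number into (-1, 1)."""
    return math.tanh(z)

def linear(z):
    """Identity: the output equals the input."""
    return z

# Compare the functions on a few arbitrary inputs
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}  "
          f"tanh={tanh(z):+.3f}  linear={linear(z):+.1f}")
```

Note how ReLU returns 0 for the negative input, which is the situation in which a unit can become "dead".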
Recommended Reading: How to Choose an Activation Function for Deep Learning
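Returning to the number of layers and units: one way to get a feel for how these choices affect model size is to count the trainable parameters. The sketch below assumes fully connected (Dense) layers with biases, and the 10 input features are a made-up figure; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers.

    Each unit has one weight per input plus one bias, so a layer with
    `units` nodes fed by `n_inputs` values has (n_inputs + 1) * units parameters.
    """
    total = 0
    for units in layer_units:
        total += (n_inputs + 1) * units
        n_inputs = units  # this layer's outputs are the next layer's inputs
    return total

# Layer sizes 32, 16, 1 as in the model above; 10 input features is an assumption
total = dense_param_count(10, [32, 16, 1])
print(total)  # 352 + 528 + 17 = 897
```

For a real Keras model that has been built, `model.summary()` reports the same kind of per-layer count.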
3. Compile the Model
Deep learning models use backpropagation to learn. Simply put, the model learns from the prediction error and adjusts the weights allocated to each node in order to minimize that error. When compiling the model, we need to specify the loss function used to measure the error and the optimizer algorithm used to reduce the loss.
from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=['accuracy'])
optimizer: the optimizer defines the optimization algorithm used to refine the model with the aim of reducing error. Some examples of optimizers in a nutshell:
Gradient Descent: it minimizes the loss function by moving the parameters in the direction opposite to the gradient of the function
Stochastic Gradient Descent: a popular variant of gradient descent that updates the parameters after each training example
RMSProp: it computes an adaptive learning rate and is commonly used in recurrent neural networks
Momentum: it borrows an idea from physics, accumulating past gradients so that updates speed up along consistent gradient directions, which results in faster convergence
Adam: it combines the advantages of both RMSProp and Momentum
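The update rules behind these optimizers can be sketched in plain Python on a toy one-parameter loss, (w - 3)^2, whose minimum is at w = 3. The loss, the hyperparameter values, and the function names are all assumptions chosen for illustration. Each rule is written in one common textbook form, and library implementations such as Keras differ in details.

```python
import math

def grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd_step(w, state, lr=0.1):
    # Plain gradient descent: step against the gradient
    return w - lr * grad(w), state

def momentum_step(w, state, lr=0.1, beta=0.9):
    # Accumulate a running "velocity" of past gradients
    v = beta * state.get("v", 0.0) + grad(w)
    return w - lr * v, {"v": v}

def rmsprop_step(w, state, lr=0.1, rho=0.9, eps=1e-8):
    # Scale the step by a running average of squared gradients
    g = grad(w)
    s = rho * state.get("s", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(s) + eps), {"s": s}

def adam_step(w, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum-style average (m) plus RMSProp-style scaling (v), with bias correction
    g = grad(w)
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), {"t": t, "m": m, "v": v}

def run(step, n_steps=200):
    """Apply one update rule repeatedly, starting from w = 0."""
    w, state = 0.0, {}
    for _ in range(n_steps):
        w, state = step(w, state)
    return w

results = {
    "sgd": run(sgd_step),
    "momentum": run(momentum_step),
    "rmsprop": run(rmsprop_step),
    "adam": run(adam_step),
}
for name, w in results.items():
    print(f"{name:>8}: w = {w:.4f}")
```

All four end up near w = 3 on this toy problem. With a fixed learning rate, RMSProp tends to hover around the minimum rather than settle exactly on it, because its normalized step size does not shrink as the gradient shrinks.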
We also need to specify a learning rate for the optimizer, because it determines how quickly the parameters/weights are updated to minimize the loss function. We can picture the loss function as a curve in a 2D space, where the goal of the optimizer is to find the minimum error point. If the learning rate is too large, we may skip over the lowest point and fail to converge. However, if the learning rate is too small, it may take a very long time to reach the minimum loss.
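The effect of the learning rate can be seen on a toy loss curve. The sketch below runs plain gradient descent on the made-up loss (w - 3)^2, whose minimum is at w = 3, with three learning rates picked purely for illustration.

```python
def gradient_descent(lr, n_steps=50, w0=0.0):
    """Minimize the toy loss (w - 3)^2 starting from w0, using a fixed learning rate."""
    w = w0
    for _ in range(n_steps):
        gradient = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w = w - lr * gradient
    return w

too_small = gradient_descent(lr=0.001)   # barely moves away from the start in 50 steps
reasonable = gradient_descent(lr=0.1)    # gets very close to the minimum at w = 3
too_large = gradient_descent(lr=1.1)     # overshoots further on every step and diverges

print("lr=0.001 ->", too_small)
print("lr=0.1   ->", reasonable)
print("lr=1.1   ->", too_large)
```

Which values count as "too small" or "too large" depends on the shape of the loss, so on a real model the learning rate is usually tuned by experiment.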
loss function: the loss function measures the prediction error and provides an evaluation of model performance. Just as we use different evaluation metrics for classification and regression problems in machine learning, deep learning models require different loss functions for each.
loss functions for classification problems: “binary_crossentropy”, “hinge” …
loss functions for regression problems: “mean_squared_error”, “mean_squared_logarithmic_error”, “mean_absolute_error”
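To show what a few of these loss functions actually compute, here is a plain Python sketch of binary cross-entropy, mean squared error, and mean absolute error. It follows the standard formulas rather than the Keras source, the clipping constant `eps` is an assumption, and the sample labels and predictions are made up.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average negative log-likelihood of 0/1 labels given predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification: confident correct predictions give a lower loss than confident wrong ones
print(binary_crossentropy([1, 0], [0.9, 0.1]))
print(binary_crossentropy([1, 0], [0.1, 0.9]))

# Regression: the same predictions scored by two different losses
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))   # 2.0
print(mean_absolute_error([1.0, 2.0], [1.0, 4.0]))  # 1.0
```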
metrics: these are the evaluation metrics reported during training. We can stay with the loss functions above, or use “accuracy” or “auc” for classification and “rmse” or “cosine” for regression.
Recommended Reading: Keras Loss Functions: Everything You Need to Know
4. Fit the Model
The model.fit() function fits the model to the training dataset X_train and the training labels y_train. The training process is also controlled by the number of epochs and the batch size.
model.fit(X_train, y_train, epochs=15, batch_size=10)
epochs: it controls the number of complete passes over the entire training set performed during training.
batch_size: it determines how many training samples are used for each update of the model parameters. If batch_size equals the size of the training set, the model uses the entire training dataset for each update. If batch_size = 1, it updates the model parameters after every single data point.
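The relationship between epochs, batch size, and the number of parameter updates comes down to simple arithmetic. The sketch below uses the epochs and batch_size values from the model.fit() call above; the training set size of 670 rows is a made-up figure, and `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of parameter updates in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

n_samples = 670   # assumed size of the training set, for illustration only
epochs = 15
batch_size = 10

per_epoch = updates_per_epoch(n_samples, batch_size)
total_updates = per_epoch * epochs
print(per_epoch, total_updates)                   # 67 1005

# The two extremes described above
print(updates_per_epoch(n_samples, n_samples))    # 1: the whole training set per update
print(updates_per_epoch(n_samples, 1))            # 670: one update per data point
```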
Recommended Reading: Epoch vs Batch Size vs Iterations
5. Evaluation and Prediction
Remember that we initially split the entire dataset into training and testing sets, and that the test set has been left out of the entire model building process. This is because we need the holdout test dataset to evaluate how the model performs on unseen data. Simply pass the testing dataset to model.evaluate() and it returns the loss along with the evaluation metrics that were specified at the model compilation stage.
Once you are happy with the model performance, you can deploy it for making predictions.
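The evaluation and prediction steps might look like the sketch below. To keep it self-contained, it trains a tiny model on synthetic data generated with NumPy. The data, the layer sizes, and the 0.5 decision threshold are all assumptions for illustration, so with a real dataset you would reuse your own `model`, `X_test`, and `y_test` instead.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 200 rows, 4 features, binary label (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# A small model following the define / compile / fit steps
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=15, batch_size=10, verbose=0)

# Evaluation: returns the loss followed by the metrics given to compile()
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print("test loss:", test_loss, "test accuracy:", test_accuracy)

# Prediction: the sigmoid output is a probability, thresholded here at 0.5
probabilities = model.predict(X_test, verbose=0)
predicted_labels = (probabilities > 0.5).astype("int32").ravel()
print(predicted_labels[:10])
```

The exact loss and accuracy vary from run to run because the weights are initialized randomly.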
The current model can only be considered a baseline, and there are still many improvements that can be made to enhance its accuracy. This article provides a useful guide to improving the performance of a baseline deep learning model: How To Improve Deep Learning Performance.