Review: Keras sails through deep learning

Keras sequential models make deep neural network modeling about as simple as it can be

As I discussed in my review of PyTorch, the foundational deep neural network (DNN) frameworks such as TensorFlow (Google) and CNTK (Microsoft) tend to be hard to use for model building. However, TensorFlow now contains three high-level APIs for creating models, one of which, tf.keras, is a bespoke version of Keras.

Keras proper, a high-level front end for building neural network models, ships with support for three back-end deep learning frameworks: TensorFlow, CNTK, and Theano. Amazon is currently working on an MXNet back end for Keras. It’s also possible to use PlaidML (an independent project) as a back end for Keras to take advantage of PlaidML’s OpenCL support for all GPUs.

As an aside, the name Keras is from the Greek for horn, κέρας, and refers to a passage from the Odyssey. The dream spirits that come through the gate made of horn are the ones that announce a true future; the ones that come through the gate made of ivory, ἐλέφας, deceive men with false visions.

TensorFlow is the default back end for Keras, and the one recommended for many use cases involving GPU acceleration on Nvidia hardware via CUDA and cuDNN, as well as for TPU acceleration in the Google Cloud. I used the TensorFlow back end configured for CPU-only to do my basic Keras testing on a MacBook Pro.

Keras vs. PyTorch

Keras (Google) and PyTorch (Facebook) are often mentioned in the same breath, especially when the subject is easy creation of deep neural networks. Both are designed to make it as simple as possible to build models. PyTorch says it’s designed for “fast, flexible experimentation.” Keras “was developed with a focus on enabling fast experimentation.” Both expose Python APIs.

There are some practical differences between the two. While Keras is a front end for three DNN frameworks, PyTorch provides its own back ends, primarily C/C++ code adapted from Torch, with some production features from Caffe2.

Keras has a high-level environment that reduces adding a layer to a neural network to one line of code in its sequential model, and needs one function call each for compiling and training a model. PyTorch model-building code can look very similar if you add layers using its sequential model, but PyTorch requires you to write your own optimization loop for training, as opposed to making a single call in Keras. Frankly, writing that loop isn’t a big deal.
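
For comparison, here is a minimal sketch of such a PyTorch training loop; the tiny model, the synthetic data, and the hyperparameters are purely illustrative:

import torch
from torch import nn, optim

# A tiny model and random data, just to make the loop concrete
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
x = torch.randn(128, 20)
y = torch.randint(0, 10, (128,))

# The manual optimization loop that Keras wraps in a single fit() call
for epoch in range(5):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass and loss
    loss.backward()              # back-propagation
    optimizer.step()             # weight update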

Both Keras and PyTorch let you work at a lower level if you want. Keras calls that level its model or functional API. Keras also allows you to drop down even farther, to the Python coding level, by subclassing keras.Model, but prefers the functional API when possible.
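
A subclassed model follows roughly the pattern sketched below; the layer sizes here are arbitrary:

import keras

class SimpleMLP(keras.Model):
    # Subclassing keras.Model: define layers in __init__, the forward pass in call()
    def __init__(self):
        super(SimpleMLP, self).__init__()
        self.dense1 = keras.layers.Dense(64, activation='relu')
        self.dense2 = keras.layers.Dense(10, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = SimpleMLP()
model.compile(optimizer='sgd', loss='categorical_crossentropy')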

PyTorch claims two distinctions: the ability to change the model dynamically from step to step during training, and the ability to compute gradients using tape-based back-propagation. Keras lacks dynamic modeling, but it does have tape-based gradients, courtesy of the TensorFlow back end’s GradientTape class.
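
Here is a minimal sketch of tape-based gradients with TensorFlow’s GradientTape; note that eager execution has to be enabled first in TensorFlow 1.x:

import tensorflow as tf

tf.enable_eager_execution()  # GradientTape needs eager mode in TensorFlow 1.x

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)   # record operations on this tensor
    y = x * x
print(tape.gradient(y, x))  # dy/dx = 2x = 6.0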

Keras also has a Scikit-learn API, so that you can use the Scikit-learn grid search to perform hyperparameter optimization in Keras models. In a way, that ability can replace the need for PyTorch-like dynamic models, especially if you’re doing your training on multiple GPUs. Essentially, you’re doing the hyperparameter optimizations in parallel training runs instead of within a single training.
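
A grid search over training hyperparameters might look roughly like the sketch below; the build_model function, the random data, and the parameter grid are all illustrative:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model():
    # Illustrative two-layer classifier for 20-dimensional inputs and 10 classes
    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=20))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd',
                  metrics=['accuracy'])
    return model

x = np.random.random((1000, 20))
y = np.random.randint(10, size=(1000,))

clf = KerasClassifier(build_fn=build_model, verbose=0)
grid = GridSearchCV(clf, param_grid={'batch_size': [32, 128], 'epochs': [5, 10]})
grid.fit(x, y)
print(grid.best_params_)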

Keras simplicity

The 30-second intro to Keras explains that the Keras model, a way to organize layers in a neural network, is the framework’s core data structure. The sequential model is a linear stack of layers, and the layers can be described with one call each. By contrast, describing a layer in TensorFlow takes multiple lines of code.

The code for a simple Keras sequential model might look like this:

import keras
from keras.models import Sequential
from keras.layers import Dense

#Create Sequential model with Dense layers, using the add method
model = Sequential()

#Dense implements the operation:
#        output = activation(dot(input, kernel) + bias)
#Units are the dimensionality of the output space for the layer,
#     which equals the number of hidden units
#Activation and loss functions may be specified by strings or classes
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))

#The compile method configures the model’s learning process
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

#The fit method does the training in batches
# x_train and y_train are NumPy arrays, just like in the Scikit-learn API
model.fit(x_train, y_train, epochs=5, batch_size=32)

#The evaluate method calculates the losses and metrics
#     for the trained model
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

#The predict method applies the trained model to inputs
#     to generate outputs
classes = model.predict(x_test, batch_size=128)

To understand that a little better, let’s dive into the architecture.

Keras architecture

As noted above, the model is the core Keras data structure. There are two main types of models available in Keras: the sequential model, and the Model class used with the functional API.

Both sequential and functional models have methods or attributes for layers, inputs, outputs, summary(), get_config(), from_config(config), get_weights(), set_weights(weights), to_json(), to_yaml(), save_weights(), and load_weights(). I won’t dwell on keras.Model subclassing, which doesn’t have all the listed methods and attributes.
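
For example, a model’s architecture and weights can be serialized and restored separately, as in this minimal sketch (the file name is arbitrary, and save_weights requires h5py):

from keras.models import Sequential, model_from_json
from keras.layers import Dense

model = Sequential([Dense(10, activation='softmax', input_dim=20)])

json_config = model.to_json()       # architecture only, as JSON
model.save_weights('weights.h5')    # weights only, as HDF5

restored = model_from_json(json_config)  # rebuild the architecture
restored.load_weights('weights.h5')      # reload the weights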

Keras sequential models

As I discussed earlier, you can use the model.add() method to add layers to sequential models. You can also list layer instances inside the Sequential() constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

The first layer in a sequential model normally specifies its input shape or dimension. The other layers get their input shapes from the output of the previous layers; in the code above, the relu activation layer has an input dimension of 32, and the softmax activation layer has an input dimension of 10. There is a mechanism for delayed sequential model building that infers the input shape the first time fit() is called if you don’t specify the shape or dimension; it only seems to be mentioned in the sequential.py source code.

Model compilation configures the learning process. It sets the optimizer, the loss function, and a list of metrics. Keras has a full set of all of these predefined, and calls the back end when appropriate. You can pass string identifiers for these, or instances of the appropriate classes.
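
For instance, continuing with the model defined above, the string 'sgd' could be replaced with an optimizer instance when you want to set the learning rate or momentum explicitly (a minimal sketch):

from keras import optimizers

# Equivalent to optimizer='sgd', but with explicit hyperparameters
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True),
              metrics=['accuracy'])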

Training takes NumPy arrays of input data and labels as input. You normally call the fit() method to run the entire training process, but you can also feed in data batch by batch with the train_on_batch() method. If you need even more control, you can train a model on data from a Python generator function, using the fit_generator() method.
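
Here is a minimal sketch of the generator path, assuming the model, x_train, and y_train from the earlier example; the batch_generator helper is hypothetical:

import numpy as np

def batch_generator(x, y, batch_size=32):
    # Yield random mini-batches indefinitely; fit_generator pulls
    # steps_per_epoch batches per epoch
    while True:
        idx = np.random.randint(0, len(x), batch_size)
        yield x[idx], y[idx]

model.fit_generator(batch_generator(x_train, y_train),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=5)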

Keras layers

Keras has numerous layers pre-defined, organized into categories: core, convolutional, pooling, locally connected, recurrent, embedding, merge, advanced activations, normalization, and noise. There are also two layer wrappers, for time series generation and bidirectional RNNs, and an API for writing custom layers.

For example, the core layers include Dense, the regular densely-connected neural network layer that does a dot product with optional bias and activation function; Activation, which applies an activation function; Dropout, which randomly drops input units to 0 to prevent overfitting; and several more. Convolutional layers can be 1D (temporal convolution), 2D (spatial convolution), 3D (spatial convolution over volumes), separable, transposed, cropping, upsampling, and so on. In general, layers pass most of the work to the back end (TensorFlow, etc.) where compute-intensive operations such as convolution of large tensors can be optimized with, for example, GPU or TPU support.
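
For example, a small convolutional stack mixes several of these categories; the layer sizes and input shape here are arbitrary:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # 2D convolution
    MaxPooling2D(pool_size=(2, 2)),                                  # pooling
    Dropout(0.25),                                                   # regularization
    Flatten(),                                                       # core: reshape to a vector
    Dense(10, activation='softmax'),                                 # core: dense classifier
])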

Keras functional API

The Keras functional API is useful for creating complex models, such as multi-input/multi-output models, directed acyclic graphs (DAGs), and models with shared layers. The functional API uses the same layers as the sequential model, but provides more flexibility in putting them together. In the functional API you define the layers first, and then create the model, compile it, and fit (train) it.

The functional model that follows takes an input, runs it through two 64-unit Dense layers with ReLU (rectified linear unit) activation, and finally runs it through a 10-unit Dense layer with softmax (normalized exponential function) activation. It could just as easily have been created with a sequential model. The input could be the MNIST data set of handwritten numerals or something else that has 10 classes of 28x28 (784) pixel images.

from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # starts training

You can do much cooler things with functional models than you can with sequential models, since you can blithely apply models (both the model architecture and the trained weights) to tensors. For example, the code that follows turns the image classification model defined above into a video classification model:

from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# This applies our previous model to every timestep in the input sequences.
# The output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)

Installing Keras

Keras installation is basically a two-step process, meaning you have to install a back end as well as Keras. On my MacBook, I started by upgrading pip; then I upgraded TensorFlow and installed Keras, both with pip. I also freshened the source code for both repositories so that I could use the code for reference in areas where the documentation wasn’t complete enough for me.

sudo pip install --upgrade pip
sudo pip install --upgrade tf-nightly
sudo pip install keras

At this point I tested TensorFlow and discovered that I had a rogue copy of protobuf lying around that kept TensorFlow from importing. It turned out to be a version that I had installed with Homebrew, so I uninstalled it:

brew uninstall protobuf

Finally TensorFlow imported and worked:

Martins-Retina-MacBook:~ martinheller$ python
Python 2.7.10 (default, Oct  6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!

Keras worked fine once TensorFlow had been repaired. I copied the code for a simple Keras sequential model with random input tensors into a Python REPL a few lines at a time. As you can see from the timings after the model.fit() call, this little five-layer classification network ran quite quickly (~12 ms per epoch after the first epoch) even on a CPU:

Martins-Retina-MacBook:~ martinheller$ python
Python 2.7.10 (default, Oct  6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
>>> from keras.models import Sequential
>>> from keras.layers import Dense, Dropout, Activation
>>> from keras.optimizers import SGD
>>>
>>> import numpy as np
>>> x_train = np.random.random((1000, 20))
>>> y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
>>> x_test = np.random.random((100, 20))
>>> y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
>>>
>>> model = Sequential()
>>> model.add(Dense(64, activation='relu', input_dim=20))
>>> model.add(Dropout(0.5))
>>> model.add(Dense(64, activation='relu'))
>>> model.add(Dropout(0.5))
>>> model.add(Dense(10, activation='softmax'))
>>>
>>> sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
>>> model.compile(loss='categorical_crossentropy',
...               optimizer=sgd,
...               metrics=['accuracy'])
>>> model.fit(x_train, y_train,
...           epochs=20,
...           batch_size=128)
Epoch 1/20
1000/1000 [==============================] - 0s 319us/step - loss: 2.3804 - acc: 0.0910
Epoch 2/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.3517 - acc: 0.0860
Epoch 3/20
1000/1000 [==============================] - 0s 13us/step - loss: 2.3573 - acc: 0.1000
Epoch 4/20
1000/1000 [==============================] - 0s 16us/step - loss: 2.3303 - acc: 0.0990
Epoch 5/20
1000/1000 [==============================] - 0s 15us/step - loss: 2.3177 - acc: 0.1090
Epoch 6/20
1000/1000 [==============================] - 0s 11us/step - loss: 2.3180 - acc: 0.1050
Epoch 7/20
1000/1000 [==============================] - 0s 14us/step - loss: 2.3125 - acc: 0.1310
Epoch 8/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.3068 - acc: 0.1330
Epoch 9/20
1000/1000 [==============================] - 0s 11us/step - loss: 2.3066 - acc: 0.0970
Epoch 10/20
1000/1000 [==============================] - 0s 13us/step - loss: 2.2954 - acc: 0.1100
Epoch 11/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.3065 - acc: 0.1110
Epoch 12/20
1000/1000 [==============================] - 0s 11us/step - loss: 2.3057 - acc: 0.1140
Epoch 13/20
1000/1000 [==============================] - 0s 11us/step - loss: 2.2993 - acc: 0.1200
Epoch 14/20
1000/1000 [==============================] - 0s 13us/step - loss: 2.2978 - acc: 0.1240
Epoch 15/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.2989 - acc: 0.1230
Epoch 16/20
1000/1000 [==============================] - 0s 11us/step - loss: 2.3001 - acc: 0.1180
Epoch 17/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.2892 - acc: 0.1210
Epoch 18/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.2993 - acc: 0.1060
Epoch 19/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.2990 - acc: 0.1110
Epoch 20/20
1000/1000 [==============================] - 0s 12us/step - loss: 2.2936 - acc: 0.1200
<keras.callbacks.History object at 0x112847990>
>>> score = model.evaluate(x_test, y_test, batch_size=128)
100/100 [==============================] - 0s 537us/step
>>> print (score)
[2.2815165519714355, 0.1599999964237213]

Here Keras is reporting a categorical cross-entropy loss of 2.28 and an accuracy of 16 percent from the evaluation of the model on the test data. That would be bad, except for the fact that the inputs were random—no model could fit them.

Deploying Keras

Keras models can be deployed across a great range of platforms—perhaps greater than any other deep learning framework. That includes iOS, via CoreML; Android, via the TensorFlow Android runtime; in a browser, via Keras.js and WebDNN; on Google Cloud, via TensorFlow-Serving; in a Python webapp backend; on the JVM, via DL4J model import; and on Raspberry Pi.
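
Whatever the target, the usual starting point is a serialized model. A minimal sketch, assuming the model built earlier and an arbitrary file name:

from keras.models import load_model

# Save the full model: architecture, weights, and optimizer state (requires h5py)
model.save('my_model.h5')
same_model = load_model('my_model.h5')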

Keras applications, data sets, and examples

Keras supplies seven common deep learning sample data sets via the keras.datasets module: the CIFAR-10 and CIFAR-100 small color images, IMDB movie reviews, Reuters newswire topics, MNIST handwritten digits, Fashion-MNIST clothing images, and Boston housing prices.

Keras also supplies 10 well-known models pre-trained against ImageNet: Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, and MobileNetV2. You can use these models to predict the classification of images, extract features from them, and fine-tune the models on a different set of classes.

By the way, fine-tuning existing models is a good way to speed up training. For example, you can add layers as you wish, freeze the base layers to train the new layers, then unfreeze some of the base layers to fine-tune the training. You can freeze a layer by setting layer.trainable = False.
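
Here is a sketch of that workflow using the pre-trained VGG16 base; the new top layers and the five-class output are arbitrary choices:

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Load the convolutional base pre-trained on ImageNet, without its classifier
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base so only the new layers train at first
for layer in base.layers:
    layer.trainable = False

# Add a new classifier head for five classes
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(5, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
# Train on the new data, then optionally unfreeze some base layers,
# recompile, and continue training to fine-tune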

The Keras examples repository contains more than 40 sample models. They cover vision models, text and sequences, and generative models.

If I were starting a new deep learning project today, I would most likely do the research with Keras. Keras is really about as simple as it could be, given that the hard part of building deep neural network models is finding a network topology that fits the data as accurately as possible without overfitting.

Cost: Free open source under the MIT license.

Platform: Linux, MacOS, Windows, or Raspbian; TensorFlow, Theano, or CNTK back end.

At a Glance
  • With sequential models, Keras is about as simple as a deep neural network framework can be. It also supports arbitrary topologies through its functional API.

    Pros

    • Sequential models only require one line of code per layer
    • Functional models can be arbitrarily complex
    • Good support for GPUs and TPUs through TensorFlow and other back ends
    • Good assortment of deployment options

    Cons

    • The separation between Keras and the back-end frameworks can potentially create some issues (though I didn’t encounter any with the default TensorFlow back end)

Copyright © 2018 IDG Communications, Inc.