PyTorch CNN

The Basics and a Quick Tutorial

How Do You Use Convolutional Neural Networks (CNN) in PyTorch?

PyTorch is a Python framework for deep learning that makes it easy to carry out research projects on CPU or GPU hardware. Its basic data structure is the tensor, a multidimensional array. PyTorch records operations on tensors in computational graphs, which it uses to build, train, and run neural network architectures. A distinguishing feature of PyTorch is that these graphs are dynamic: they are defined directly in Python code and can change at runtime.

Convolutional Neural Networks (CNNs) are the basic deep learning architecture for computer vision. The torch.nn module provides built-in classes for all the building blocks of CNN architectures:

  • Convolution layers
  • Pooling layers
  • Padding layers
  • Activation functions
  • Loss functions
  • Fully connected layers
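The sketch below instantiates one torch.nn class for each building block in the list and chains them on a single 28x28 grayscale image, so the effect of each layer on the tensor shape is visible. The layer sizes and the target class are arbitrary choices for illustration, not part of the tutorial that follows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of one 28x28 grayscale image: (batch, channels, height, width)
x = torch.randn(1, 1, 28, 28)

pad = nn.ZeroPad2d(1)                      # padding layer: 28x28 -> 30x30
conv = nn.Conv2d(1, 4, kernel_size=3)      # convolution layer: 30x30 -> 28x28, 4 channels
act = nn.ReLU()                            # activation function
pool = nn.MaxPool2d(kernel_size=2)         # pooling layer: 28x28 -> 14x14
fc = nn.Linear(4 * 14 * 14, 10)            # fully connected layer: 784 features -> 10 scores
loss_fn = nn.CrossEntropyLoss()            # loss function for classification

features = pool(act(conv(pad(x))))
scores = fc(features.flatten(start_dim=1))
loss = loss_fn(scores, torch.tensor([3]))  # hypothetical target class 3

print(features.shape)   # torch.Size([1, 4, 14, 14])
print(scores.shape)     # torch.Size([1, 10])
```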



How Do CNNs Work?

A convolutional neural network (CNN for short) is a type of neural network designed primarily to process 2D image data, although it can also be applied to 1D and 3D data.
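In PyTorch, the 1D, 2D, and 3D cases map to nn.Conv1d, nn.Conv2d, and nn.Conv3d. The minimal sketch below shows the input layout each one expects; the example data types and sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Each layer expects (batch, channels, *spatial_dims); the sizes below are arbitrary examples.
signal = torch.randn(8, 1, 100)          # 1D: e.g. audio or a time series
image = torch.randn(8, 3, 32, 32)        # 2D: e.g. RGB images
volume = torch.randn(8, 1, 16, 32, 32)   # 3D: e.g. video clips or medical scans

conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)
conv2d = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
conv3d = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3)

# Without padding, every spatial dimension shrinks by kernel_size - 1 = 2.
print(conv1d(signal).shape)   # torch.Size([8, 4, 98])
print(conv2d(image).shape)    # torch.Size([8, 4, 30, 30])
print(conv3d(volume).shape)   # torch.Size([8, 4, 14, 30, 30])
```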

At the core of a convolutional neural network are one or more (usually several) convolutional layers, which perform a mathematical operation called a “convolution”. A convolution multiplies a set of weights with the inputs of the network. Unlike in a regular fully connected network, however, the multiplication is performed through a small “window”, called a filter or kernel, that passes over the image. At each position, the filter's weights are multiplied by a different patch of input values.

The operation performed at each filter position is a “dot product”: the weights in the filter are multiplied element-wise with the input values under the filter, and the products are summed, giving a single value for each filter position. This operation is also called a “scalar product”.
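Written out for a single-channel input x and a k x k filter w with bias b (stride 1, no padding, zero-based indices), the value at output position (i, j) is the dot product between the filter and the input patch whose top-left corner is at (i, j). For example, a 2x2 filter applied to one 2x2 patch gives:

$$
y_{i,j} = b + \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} w_{m,n}\, x_{i+m,\, j+n}
$$

$$
w = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},\quad
\text{patch} = \begin{pmatrix} 3 & 5 \\ 2 & 7 \end{pmatrix}
\;\Rightarrow\; 1\cdot 3 + 0\cdot 5 + 0\cdot 2 + (-1)\cdot 7 = -4 \quad (b = 0)
$$

Note that the filter is not flipped, so strictly speaking this is a cross-correlation. This is the quantity PyTorch's Conv2d computes, and deep learning literature still calls it a convolution.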

Because the filter is usually smaller than the input image, the same weights are applied to the input many times. Specifically, the filter is moved from left to right and from top to bottom until it has covered the entire image, so that it can discover important features wherever they appear.
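The sliding-window procedure can be checked directly: the sketch below moves a 3x3 filter over a small random image with explicit loops (stride 1, no padding), takes the dot product at each position, and compares the result with torch.nn.functional.conv2d. The image and filter are random stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

image = torch.randn(6, 6)    # a hypothetical 6x6 single-channel input
kernel = torch.randn(3, 3)   # a hypothetical 3x3 filter
k = kernel.shape[0]
out_size = image.shape[0] - k + 1   # stride 1, no padding -> 4

manual = torch.empty(out_size, out_size)
for i in range(out_size):            # top to bottom
    for j in range(out_size):        # left to right
        patch = image[i:i + k, j:j + k]
        manual[i, j] = (patch * kernel).sum()   # dot product at this position

# F.conv2d expects (batch, channels, H, W) input and (out_ch, in_ch, kH, kW) weights.
reference = F.conv2d(image[None, None], kernel[None, None])[0, 0]
print(torch.allclose(manual, reference, atol=1e-6))   # True
```

Because F.conv2d does not flip the kernel, the plain element-wise product in the loop matches it exactly up to floating-point rounding.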

Applying the same filter across the whole image is a powerful idea: if a filter can identify a certain feature, it looks for that feature everywhere in the image. Strictly speaking, the convolution itself is translation equivariant (when a feature moves in the input, its response moves by the same amount in the feature map), and pooling layers make the network approximately translation invariant. As a result, the CNN architecture is mainly interested in the presence of a feature rather than its specific location.
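The sketch below illustrates both properties with a single random 3x3 filter: placing the same small pattern at two locations shifts the feature map by exactly the same offset (equivariance), and the strongest response, taken over the whole map, is identical in both cases (invariance after global max pooling). The pattern and positions are arbitrary, and they are kept away from the image border, where zero padding breaks exact equivariance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

# Two hypothetical 16x16 images containing the same small pattern at different places.
pattern = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
img_a = torch.zeros(1, 1, 16, 16)
img_b = torch.zeros(1, 1, 16, 16)
img_a[0, 0, 3:5, 3:5] = pattern
img_b[0, 0, 9:11, 8:10] = pattern   # same pattern shifted by (6, 5)

with torch.no_grad():
    fmap_a = conv(img_a)
    fmap_b = conv(img_b)

# Equivariance: shifting feature map A by (6, 5) reproduces feature map B.
shifted_a = torch.roll(fmap_a, shifts=(6, 5), dims=(2, 3))
print(torch.allclose(shifted_a, fmap_b))   # True

# Invariance after global max pooling: the strongest response does not depend on location.
print(torch.allclose(fmap_a.amax(dim=(2, 3)), fmap_b.amax(dim=(2, 3))))   # True
```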

The values obtained at each filter position (one value per position) form a two-dimensional matrix of output values, which represents the features extracted from the underlying image. This output matrix is called a “feature map”.
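The size of a feature map follows directly from the input size, kernel size, stride, and padding. The helper below encodes the standard formula (assuming dilation 1, the Conv2d default) and checks it against nn.Conv2d for a few configurations on a 28x28 input, the image size used in the tutorial below. The function name is illustrative only.

```python
import torch
import torch.nn as nn

def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size of a feature map: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

x = torch.randn(1, 1, 28, 28)
for kernel, stride, padding in [(3, 1, 1), (3, 1, 0), (5, 2, 0)]:
    conv = nn.Conv2d(1, 4, kernel_size=kernel, stride=stride, padding=padding)
    fmap = conv(x)   # one 2D feature map per output channel: (1, 4, H_out, W_out)
    expected = conv_output_size(28, kernel, stride, padding)
    print(tuple(fmap.shape), expected)
```

With kernel_size=3, stride=1, and padding=1 the feature map keeps the 28x28 size, which is why that configuration is common in simple CNNs. The same formula with kernel 2 and stride 2 gives the halving performed by a 2x2 max pooling layer.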

Once the feature map has been computed, a nonlinear activation function such as ReLU is applied to each of its values before they are passed to the next convolutional layer. The output of the last convolutional layer is flattened and passed to one or more fully connected layers, which produce the final prediction, typically a label describing the image.
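A minimal sketch of the last steps of that pipeline, using a hypothetical 2x2 feature map: ReLU zeroes the negative responses, the result is flattened into one vector per sample, and a fully connected layer maps it to class scores. The choice of 10 classes matches the clothing dataset used later; the layer here is untrained.

```python
import torch
import torch.nn as nn

# A hypothetical 2x2 feature map with negative and positive responses.
fmap = torch.tensor([[[[-1.5, 0.5], [2.0, -0.25]]]])   # (batch, channels, H, W)

activated = torch.relu(fmap)   # keeps positive values, zeroes the rest -> 0.0, 0.5, 2.0, 0.0

# Before the fully connected layer, each sample's feature maps are flattened into a vector.
flat = activated.flatten(start_dim=1)       # shape (1, 4)
classifier = nn.Linear(flat.shape[1], 10)   # 10 output classes
scores = classifier(flat)
print(scores.shape)   # torch.Size([1, 10])
```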


Quick Tutorial: Building a Basic CNN with PyTorch

The following is abbreviated from the full tutorial by Pulkit Sharma.


First, import PyTorch and the other required libraries: pandas, the imread image-reading function, NumPy, matplotlib, scikit-learn, and tqdm.

Download the dataset. It contains a total of 70,000 images of clothing items: 60,000 in the training set and the other 10,000 in the test set. All images are grayscale and 28x28 pixels in size. Each folder in the dataset contains a .csv file listing the image IDs and their labels, and a subfolder containing the images themselves.

Load the training, test, and sample submission files from the dataset. The sample submission file shows the format in which the model's predictions must be submitted.

import pandas as pd

train = pd.read_csv('train_LbELtWX/train.csv')
test = pd.read_csv('test_ScVgIM0/test.csv')
sample_submission = pd.read_csv('sample_submission_I5njJSF.csv')

Read the images one at a time and load them into an array. Divide the pixel values by 255 to scale them to the range 0 to 1, which keeps the inputs on a small, consistent scale and generally makes training more stable.
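The scaling step can be sketched as a small helper. The original tutorial reads each file with imread; here a random 8-bit array stands in for an image read from disk, and the helper name normalize_image is hypothetical. Casting to float32 matters because PyTorch layers use float32 weights by default.

```python
import numpy as np

def normalize_image(img: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values (0-255) to float32 values in [0, 1]."""
    return img.astype(np.float32) / 255.0

# Hypothetical stand-in for an image read from disk: a 28x28 uint8 array.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

train_img = [normalize_image(raw)]   # in practice, one entry per image file
print(train_img[0].dtype, train_img[0].min(), train_img[0].max())
```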

Then stack all the images into a NumPy array called train_x, and their corresponding labels into an array called train_y.

# train_img: list of normalized 28x28 arrays built in the previous step
train_x = np.array(train_img)      # shape (60000, 28, 28)
train_y = train['label'].values

Creating a Validation Set

Hold out 10% of the training images as a validation set, which is used to evaluate the model on images it was not trained on, and keep the rest for training.

from sklearn.model_selection import train_test_split

train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size=0.1)
print((train_x.shape, train_y.shape), (val_x.shape, val_y.shape))
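The split can be made reproducible and class-balanced with two optional train_test_split arguments, random_state and stratify, which the snippet above does not use. The sketch below applies them to 600 random stand-in images (60 per class) instead of the real dataset, so it runs without any downloads.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 600 28x28 "images", 60 examples of each of 10 classes.
rng = np.random.default_rng(0)
images = rng.random((600, 28, 28), dtype=np.float32)
labels = np.repeat(np.arange(10), 60)

train_x, val_x, train_y, val_y = train_test_split(
    images, labels,
    test_size=0.1,       # hold out 10% for validation
    random_state=42,     # fixed seed so the split is reproducible
    stratify=labels,     # keep class proportions equal in both sets
)

print(train_x.shape, val_x.shape)   # (540, 28, 28) (60, 28, 28)
print(np.bincount(val_y))           # [6 6 6 6 6 6 6 6 6 6]
```

On the real training set, the same 10% split yields the 54,000 training and 6,000 validation images reshaped in the next step.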

Next, convert the training and validation images into PyTorch tensors and reshape them into the (batch, channels, height, width) layout that the convolution layers expect.

import numpy as np
import torch

train_x = train_x.reshape(-1, 1, 28, 28)                # (N, channels, height, width); N = 54000
train_x = torch.from_numpy(train_x).float()             # model weights are float32
train_y = torch.from_numpy(train_y.astype(np.int64))    # CrossEntropyLoss expects int64 labels
print(train_x.shape, train_y.shape)

val_x = val_x.reshape(-1, 1, 28, 28)                    # N = 6000
val_x = torch.from_numpy(val_x).float()
val_y = torch.from_numpy(val_y.astype(np.int64))
print(val_x.shape, val_y.shape)

Implementing CNNs Using PyTorch

We use a very simple CNN architecture, with only two convolutional layers to extract features from the image. Afterwards we’ll use a fully connected layer to classify the features into labels.

We use the nn.Sequential container to define the layers of the model in order, from input to final prediction.

Here is how the first 2D convolution block is defined, with the convolution layer followed by batch normalization, ReLU activation, and a max pooling layer:

          Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
          BatchNorm2d(4),
          ReLU(inplace=True),
          MaxPool2d(kernel_size=2, stride=2),

The second convolution is defined the same, but with “4” in the first argument, which defines the number of input channels (because it needs to accept the output of the previous convolution):

           Conv2d(4, 4, kernel_size=3, stride=1, padding=1),

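Here is how we define the fully connected layer. Its input size is 4 * 7 * 7 because each of the two max pooling layers halves the image size (28 → 14 → 7), leaving 4 feature maps of 7x7 values; its output size is 10, one score per clothing class: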

          Linear(4 * 7 * 7, 10)
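Putting the pieces together, the model can be sketched as an nn.Module that runs the two convolution blocks in one nn.Sequential and the fully connected layer in another. The class name Net and the attribute names cnn_layers and linear_layers are assumptions for illustration; the layer settings are the ones listed above.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Two convolution blocks followed by one fully connected layer."""

    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # block 1: 1x28x28 -> 4x28x28 -> 4x14x14
            nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 2: 4x14x14 -> 4x14x14 -> 4x7x7
            nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear_layers = nn.Sequential(
            nn.Linear(4 * 7 * 7, 10),   # 196 features -> 10 class scores
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.flatten(start_dim=1)      # (N, 4, 7, 7) -> (N, 196)
        return self.linear_layers(x)

model = Net()
scores = model(torch.randn(2, 1, 28, 28))   # a hypothetical batch of two images
print(scores.shape)   # torch.Size([2, 10])
```

Splitting the model into a convolutional part and a linear part keeps the flattening step in forward(), where the change of shape is explicit.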

We define additional hyperparameters:

  • Adam optimizer with learning rate 0.07
  • Loss function – CrossEntropyLoss
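A minimal sketch of these two hyperparameters in code, using a hypothetical nn.Linear stand-in so it runs on its own (the CNN defined above would take its place). It also shows what CrossEntropyLoss expects: raw, unnormalized scores and int64 class indices. Internally it applies log-softmax followed by negative log-likelihood, which is why the model itself does not end with a Softmax layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in model; the tutorial's CNN would go here.
model = nn.Linear(4, 10)

# Hyperparameters from the tutorial: Adam with lr=0.07 and cross-entropy loss.
optimizer = torch.optim.Adam(model.parameters(), lr=0.07)
criterion = nn.CrossEntropyLoss()

logits = model(torch.randn(8, 4))              # raw scores, shape (8, 10)
targets = torch.randint(0, 10, (8,))           # class indices, dtype torch.int64
loss = criterion(logits, targets)

# Same value computed in two explicit steps: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss.item(), manual.item())
```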

Here are the essential parts of the training function:

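   # one full-batch training step per epoch, as in the tutorial
   def train():
       model.train()

       # obtaining training and validation data
       # (plain tensors work directly; torch.autograd.Variable is deprecated)
       x_train, y_train = train_x, train_y
       x_val, y_val = val_x, val_y

       # clearing gradients left over from the previous step
       optimizer.zero_grad()

       # generating predictions and calculating the training loss
       output_train = model(x_train)
       loss_train = criterion(output_train, y_train)

       # performing back propagation and updating the weights
       loss_train.backward()
       optimizer.step()

       # calculating the validation loss without tracking gradients
       model.eval()
       with torch.no_grad():
           output_val = model(x_val)
           loss_val = criterion(output_val, y_val)

       return loss_train.item(), loss_val.item()

   # training the model for a certain number of epochs (in this case, 25)
   n_epochs = 25
   for epoch in range(n_epochs):
       tr_loss, val_loss = train()
       print(f"epoch {epoch + 1}: train loss {tr_loss:.4f}, val loss {val_loss:.4f}")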

To see the full code for building and training the CNN model, see the full tutorial.

Generating Predictions for the Test Set

Now that the model is trained, here are the general steps for generating predictions from the test set:

  • Load test images
  • Pre-process the test images (similar to what we did for training images above)
  • Generate predictions for the test set, using a Softmax function that turns the model's raw outputs into values between 0 and 1 that sum to 1; these are the probabilities that the image belongs to each class label, and the predicted label is the class with the highest probability
  • Overwrite labels in the sample submissions file with our predictions
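The prediction steps can be sketched as follows. An untrained stand-in model, five random "test images", and a small submission table with assumed columns id and label replace the real trained model and files, so the block runs on its own. Since Softmax preserves the ordering of the scores, taking the argmax of the probabilities gives the same labels as taking it on the raw outputs.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: an untrained model and 5 preprocessed 28x28 test images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
test_x = torch.rand(5, 1, 28, 28)
sample_submission = pd.DataFrame({"id": np.arange(5), "label": 0})   # assumed columns

model.eval()                      # switch BatchNorm/Dropout layers (if any) to inference mode
with torch.no_grad():             # gradients are not needed for prediction
    logits = model(test_x)
    probs = torch.softmax(logits, dim=1)        # each row: values in [0, 1] summing to 1
    predictions = probs.argmax(dim=1).numpy()   # most probable class per image

sample_submission["label"] = predictions   # overwrite the placeholder labels
print(sample_submission)
```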

And that’s it! You’ve just built a simple CNN model in PyTorch and generated predictions for an unseen set of images. Even with only two convolutional layers, this model achieves an accuracy of 71% on the dataset's test images.

PyTorch CNN in Production with Run:AI

Run:AI automates resource management and workload orchestration for deep learning infrastructure. With Run:AI, you can automatically run as many CNN experiments as needed in PyTorch and other deep learning frameworks.

Here are some of the capabilities you gain when using Run:AI:

  • Advanced visibility—create an efficient pipeline of resource sharing by pooling GPU compute resources.
  • No more bottlenecks—you can set up guaranteed quotas of GPU resources, to avoid bottlenecks and optimize billing.
  • A higher level of control—Run:AI enables you to dynamically change resource allocation, ensuring each job gets the resources it needs at any given time.

Run:AI simplifies machine learning infrastructure pipelines, helping data scientists work more productively and improve the quality of their models.
