11 Convolutional Neural Networks

Categories: Neural Networks, CNNs, Image Processing, Deep Learning
This lecture discusses the architecture and functioning of Convolutional Neural Networks (CNNs), including their layers, operations, and applications in image processing and computer vision. It also covers the concept of pooling layers and their role in reducing dimensionality.
Author

Jiuru Lyu

Published

April 1, 2025

Introduction to the MNIST Dataset and CNNs

  • Each image is \(32\times32\) grayscale, with pixel values in \([0, 255]\).

    A flat representation of the image: \[\va x=\mqty[x_1,x_2,\dots,x_{1024}]\]

  • Problems with the flat representation (see the sketch after this list):

    • Ignores spatial structure
    • Susceptible to translation errors
  • Goal: preserve the spatial structure of the image by capturing relationships among neighboring pixels.

  • Ideas:

    • Learn feature representations based on small patches.
    • Apply patch-based feature representations across the entire image.
  • Building blocks of CNN:

    • Convolutional layers
    • Activation layers
    • Pooling layers
    • Fully connected layers
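To make the translation problem concrete, here is a minimal NumPy sketch; the random image and the one-pixel shift are illustrative, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 32x32 grayscale image with pixel values in [0, 255]
image = rng.integers(0, 256, size=(32, 32))

# Flat representation: x = [x_1, x_2, ..., x_1024]
x = image.flatten()                      # shape (1024,)

# Shifting the image by a single pixel changes almost every entry of x,
# which is why the flat representation is susceptible to translation errors.
shifted = np.roll(image, shift=1, axis=1)
print(np.mean(x != shifted.flatten()))   # most entries differ
```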

Convolutional Layer

Figure 1: Convolutional Layer
  • A neuron maps a \(3\times 3\) patch to a scalar value: \[\va x\cdot\va w+b,\] where \(\va x\) is the image patch and \(\va w\) and \(b\) are the filter parameters.
  • Convolution operation:
    • Slide the filter over the image spatially.
    • Compute the dot product with each patch of the image (see the sketch below).
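A minimal NumPy sketch of the convolution operation (stride 1, no padding; the input values are illustrative):

```python
import numpy as np

def conv2d(image, filt, b):
    # Slide the filter over the image and compute x . w + b for each patch.
    H, W = image.shape
    h, w = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + h, j:j + w]       # the current h x w patch
            out[i, j] = np.sum(patch * filt) + b  # dot product plus bias
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
filt = np.ones((3, 3)) / 9.0   # a 3x3 averaging filter
print(conv2d(image, filt, b=0.0).shape)  # (3, 3): the map shrinks without padding
```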

Example 1 (Padding)  

Figure 2: Filter

Oftentimes, it is beneficial to preserve the original image size. This can be done with padding: allow the filter to overlap the boundary, either with zeros (zero padding) or by copying boundary pixels (copy-paste padding), as in the sketch below.

Figure 3: Padding
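A small NumPy sketch of the two padding styles mentioned above (the \(5\times5\) input is illustrative):

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)

# Zero padding: a border of zeros, so that a 3x3 filter (stride 1)
# produces an output of the original 5x5 size.
zero_padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# "Copy-paste" padding: replicate the boundary pixels instead of using zeros.
replicated = np.pad(image, pad_width=1, mode="edge")

print(zero_padded.shape)  # (7, 7) -> a 3x3 filter maps it back to 5x5
```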

Activation Layers

Figure 4: Activation Layer
  • Each filter produces a feature map and, after the nonlinearity, an activation map.
  • Multiple filters \(\longrightarrow\) multiple feature maps and activation maps (or channels); see the sketch below.

Figure: Multiple Filters
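A minimal sketch of the filter \(\to\) feature map \(\to\) activation map pipeline; ReLU is assumed here, since the lecture does not fix a particular nonlinearity:

```python
import numpy as np

def relu(z):
    # Elementwise nonlinearity: max(z, 0)
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Suppose three filters each produced a 5x5 feature map (values illustrative).
feature_maps = rng.standard_normal((3, 5, 5))   # 3 channels

# Applying the nonlinearity elementwise turns each feature map
# into the corresponding activation map; the shape is unchanged.
activation_maps = relu(feature_maps)            # shape (3, 5, 5)
```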

Pooling Layers

Figure 5: Pooling Layer
  • Downsamples the previous layer's activation map.
  • Consolidates features learned at the previous stage.
  • Why?
    • Compress/Smooth
    • Spatial invariance
    • Prevent overfitting
  • Pooling often uses simple functions: max or average.
  • Pooling operates over each activation map independently (see the sketch below).

Figure: Pooling Operation
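A minimal NumPy sketch of \(2\times2\) max pooling with stride \(2\) (the shapes are illustrative):

```python
import numpy as np

def max_pool(a, size=2):
    # 2x2 max pooling with stride 2 over a single activation map.
    H, W = a.shape
    H2, W2 = H // size, W // size
    return a[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
activation_maps = rng.standard_normal((3, 4, 4))

# Pooling operates over each activation map (channel) independently.
pooled = np.stack([max_pool(a) for a in activation_maps])
print(pooled.shape)   # (3, 2, 2): each map is downsampled by a factor of 2
```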

Fully Connected Layers

Figure 6: Fully Connected Layer
  • Flattens the output from the previous layer.
  • Applies a standard dense (fully connected) layer; see the sketch below.
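A minimal sketch of flatten-plus-dense; the \(10\) output neurons (one per MNIST digit class) and the input shape are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pooled = rng.standard_normal((3, 2, 2))   # output of the pooling stage

# Flatten the output from the previous layer ...
x = pooled.flatten()                      # shape (12,)

# ... then apply a standard dense layer: one weight per (input, neuron) pair.
W = rng.standard_normal((10, x.size))     # 10 output neurons (digit classes)
b = rng.standard_normal(10)
scores = W @ x + b                        # shape (10,)
```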

Architecture Details

  • Input to a convolutional layer: \(C_\text{in}\times H\times W\)
    • \(C_\text{in}\): number of input channels
    • \(H\): height of the input
    • \(W\): width of the input
  • The layer has \(C_\text{out}\) filters (one per output channel), each of size \(h\times w\), where \(h<H\) and \(w<W\) (typically \(h=w\)).
  • Output: \(C_\text{out}\times H'\times W'\), where \(H'\) and \(W'\) depend on the filter size, padding, and stride: with padding \(p\) and stride \(s\), \(H'=\left\lfloor\frac{H-h+2p}{s}\right\rfloor+1\), and similarly for \(W'\).
  • Parameter sharing makes convolutional layers efficient (see the sketch after this list):
    • Suppose the input image is \(100\times100\), i.e., \(10,000\) input pixels.
    • Fully-connected layer with \(100\) neurons (no bias): \(10,000\times100=1,000,000\) parameters.
    • Convolutional layer with \(100\) filters of size \(3\times 3\) (no bias): \(3\times3\times100=900\) parameters.
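The two counts can be checked with a short PyTorch sketch (assuming a single-channel input, as in the grayscale example above):

```python
import torch.nn as nn

# Fully connected layer: 10,000 inputs -> 100 neurons, no bias.
fc = nn.Linear(10_000, 100, bias=False)
print(sum(p.numel() for p in fc.parameters()))    # 1,000,000

# Convolutional layer: 100 filters of size 3x3 on a 1-channel input, no bias.
conv = nn.Conv2d(in_channels=1, out_channels=100, kernel_size=3, bias=False)
print(sum(p.numel() for p in conv.parameters()))  # 900
```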