Convolutional Neural Networks

PSTAT197A/CMPSC190DD Fall 2024

Final group assignment

  • Sign up in a group (3-5 members) [here].

  • task: create a method vignette on a data science topic or theme

    • goal: create a reference that you or someone else might use as a starting point next term

    • deliverable: public repository in the Capstone-24-25 workspace

Possible vignette topics

  • clustering methods

  • neural net architecture(s) for … [images, text, time series, spatial data]

  • analysis of network data

  • numerical optimization

  • bootstrapping

  • geospatial data structures

  • anomaly detection

  • functional regression

Outputs

Your repository should contain:

  1. A brief README summarizing the repo contents and listing the best references on your topic, for a user to consult after reviewing your vignette if they wish to learn more
  2. A primary vignette document that explains methods and walks through implementation line-by-line (similar to an in-class or lab activity)
  3. At least one example dataset
  4. A script containing the commented code that appears in the vignette

Timeline

  • Thursday 11/21: no formal lecture; time to prepare the final project

  • Let us know your topic by the end of the day Monday 11/25

  • No class on Tuesday 11/26

  • Present a draft in class Tuesday 12/3 and Thursday 12/5

  • Finalize the repository by Friday 12/13

Expectations

You’ll need to learn about the topic and its implementation yourself by finding reference materials and code examples.

It is okay to borrow closely from other vignettes in creating your own, but you should:

  • cite them

  • use different data

  • do something new

It is not okay to make a collage of reference materials by copying verbatim, or simply rewrite an existing vignette.

  • the best safeguard against this is to find your own data, so you’re forced to translate the code/steps to apply them in your particular case

Neural Networks Recap

Recap: Backpropagation

Why CNN?

  • Fully-connected nets don’t scale well to (interesting) images. Imagine a 426 x 426 image feeding a single fully-connected layer whose output size is the number of classes, say 10 (see the sketch after this list):
    • Parameters = 426 x 426 x 10 ≈ 1.8 million
  • Image as a signal with spatial dependency:
    • Image: a two-dimensional signal, i.e., a set of values related to one another in a systematic way (a stochastic process).
    • Other examples of signals: speech/music are one-dimensional signals.
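
The comparison is easy to verify directly. Below is a minimal sketch, assuming PyTorch; the 426 x 426 grayscale input and the 3 x 3 filter size are illustrative choices, not values fixed by the slides.

    # Compare parameter counts: one fully-connected layer vs. one conv layer.
    import torch.nn as nn

    fc = nn.Linear(426 * 426, 10)            # one weight per pixel per class
    conv = nn.Conv2d(1, 10, kernel_size=3)   # 10 small filters, shared across space

    n_params = lambda m: sum(p.numel() for p in m.parameters())
    print(n_params(fc))    # 1814770 = 426*426*10 weights + 10 biases
    print(n_params(conv))  # 100     = 3*3*1*10 weights + 10 biases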

Why CNN?

  • Visual hierarchy: layering
  • Spatial locality: convolution
  • Translational invariance: pooling

Filters: Convolution

An image filter is a function that takes in a local spatial neighborhood of pixel values and detects the presence of some pattern in that data.

Let \(X\) be the original image, of size \(d\), and let \(F\) be a filter of size three; then pixel \(i\) of the output image is specified by

\[ Y_i = F \cdot (X_{i-1}, X_i, X_{i+1}) \]

This process of applying the filter to the image to create a new image is called convolution.
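
As a concrete illustration (not from the slides), here is the equation above applied to a toy one-dimensional image in NumPy, using a simple edge-detecting filter:

    # Slide a size-three filter F across a 1-D image X:
    # Y_i = F . (X_{i-1}, X_i, X_{i+1}) for each interior pixel i.
    import numpy as np

    X = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float)  # toy signal with two "edges"
    F = np.array([-1, 0, 1], dtype=float)             # edge-detecting filter

    Y = np.array([F @ X[i - 1 : i + 2] for i in range(1, len(X) - 1)])
    print(Y)  # [ 1.  1.  0. -1. -1.] -- responds where the signal changes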

Multiple Filters: Detect different features in one layer

  • If \(m\) filters are applied to the original image, the output is \(m\) images (\(m\) channels), as the sketch below confirms.
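
A minimal check in PyTorch (the sizes are arbitrary choices for illustration):

    # m = 8 filters applied to a 3-channel image yield 8 output channels.
    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)  # one RGB image: 3 input channels
    conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
    print(conv(x).shape)           # torch.Size([1, 8, 32, 32])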

Example: Layering convolutions

Convolutional Layer Parameters

  1. Number of filters: \(m_l\)

  2. Size of one filter: \(k_l \times k_l \times m_{l-1} + 1\) (the \(+1\) is the bias value for this one filter).

  3. Stride \(s_l\): The stride determines the spacing at which the filter is applied to the image.

  4. Input tensor size: \(n_{l-1} \times n_{l-1} \times m_{l-1}\)

  5. Padding \(p_l\): the number of extra pixels (typically with value \(0\)) added around the edges of the input. Together these parameters determine the output size; see the formula and sketch below.
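
With these definitions, the spatial size of the output follows the standard formula (implied by, though not stated in, the list above):

\[ n_l = \left\lfloor \frac{n_{l-1} + 2p_l - k_l}{s_l} \right\rfloor + 1 \]

A quick numerical check, assuming PyTorch; the specific values are arbitrary:

    # Verify the output-size formula against an actual Conv2d layer.
    import torch
    import torch.nn as nn

    n_prev, k, s, p = 28, 5, 2, 2
    conv = nn.Conv2d(1, 4, kernel_size=k, stride=s, padding=p)
    out = conv(torch.randn(1, 1, n_prev, n_prev))
    print(out.shape[-1])                  # 14
    print((n_prev + 2 * p - k) // s + 1)  # 14 -- matches the formula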

Max-Pooling

Max-pooling is a simple yet powerful operation in CNNs:

  • Reduces computational complexity.
  • Enhances translational invariance.
  • Emphasizes prominent features.
  • Improves generalization by reducing overfitting.

Max-Pooling

  • Set stride \(s_l\).
  • Set size \(k_l \times k_l\) (\(k_l \geq s_l\)).
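
A small worked example, assuming PyTorch, with \(k_l = s_l = 2\):

    # Max-pooling a 4x4 image with a 2x2 window and stride 2: each output
    # pixel is the maximum of one non-overlapping 2x2 block.
    import torch
    import torch.nn as nn

    x = torch.tensor([[1., 2., 5., 6.],
                      [3., 4., 7., 8.],
                      [0., 1., 2., 1.],
                      [1., 0., 1., 3.]]).reshape(1, 1, 4, 4)

    pool = nn.MaxPool2d(kernel_size=2, stride=2)
    print(pool(x).reshape(2, 2))
    # tensor([[4., 8.],
    #         [1., 3.]])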

Typical Architecture

Here is the form of a typical convolutional network (a code sketch follows the list below):

  • Initial layers: Feature extraction.
    • After each filter layer there is generally a ReLU layer; there may be multiple filter/ReLU layers and max-pooling layers.
  • Final layers: Classification/Regression
    • Once the output is down to a relatively small size, there is typically a last fully connected layer, leading into an activation function such as softmax that produces the final output.
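
A minimal PyTorch sketch of this pattern; the layer sizes are arbitrary choices, not values from the slides, and it assumes 28 x 28 single-channel inputs:

    # Typical CNN: (conv -> ReLU -> max-pool) blocks, then one fully-connected layer.
    import torch.nn as nn

    model = nn.Sequential(
        # feature extraction
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                 # 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                 # 14x14 -> 7x7
        # classification
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),       # class scores; softmax is applied in the loss
    )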

Back-propagation: A simple example
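
The worked example itself appears on the slides as a figure. As a stand-in, here is a minimal sketch of back-propagation on a tiny scalar network \(\hat{y} = w_2 \, \mathrm{ReLU}(w_1 x)\) with squared-error loss, checking the chain-rule gradients against PyTorch's autograd; all values are illustrative.

    # Forward pass, then backward pass (back-propagation) on a two-weight network.
    import torch

    x, y = 2.0, 1.0
    w1 = torch.tensor(0.5, requires_grad=True)
    w2 = torch.tensor(-1.0, requires_grad=True)

    h = torch.relu(w1 * x)          # hidden unit: ReLU(0.5 * 2) = 1
    loss = (w2 * h - y) ** 2        # (-1 - 1)^2 = 4
    loss.backward()

    # Chain rule by hand (w1 * x > 0, so the ReLU is active):
    # dL/dw2 = 2 (w2 h - y) h    = 2 * (-2) * 1        = -4
    # dL/dw1 = 2 (w2 h - y) w2 x = 2 * (-2) * (-1) * 2 =  8
    print(w2.grad, w1.grad)         # tensor(-4.) tensor(8.)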