CNN Model Activity

Unit 9

The following project was provided as part of Unit 9 and focuses on object recognition tasks using Convolutional Neural Networks (CNNs). The primary objective was to implement the code, analyze the results, and reflect on its components. The methods demonstrated here will be foundational for tasks in later units. You can refer to Unit 11 here for the personalized implementation of this code for the final project.

Find below the relevant code snippets, along with my reasoning and understanding of each component.


Data Exploration

Viewing the Dataset

Exploring the dataset visually was essential for understanding its structure. Each image is 32×32×3, where the dimensions represent height, width, and the three RGB color channels. This was probably the most complex data shape we have worked with so far.

# Imports assumed from the Keras/Matplotlib setup used in this activity
from tensorflow.keras.preprocessing.image import array_to_img
from IPython.display import display
import matplotlib.pyplot as plt

# Displaying the first image using IPython display
pic = array_to_img(x_train_all[0])
display(pic)

# Displaying the first image using Matplotlib
plt.imshow(x_train_all[0])
plt.show()

Output


Data Preprocessing

In order for the model to learn from the data provided, we need to ensure that the information is not only in the correct format but also transformed or simplified to allow the model to use it optimally.

Scaling the Input Data

Scaling pixel values to the range [0, 1] ensures numerical stability during training and helps the model converge faster. Raw pixel values range from 0 to 255, so dividing by 255 standardizes them into a more manageable magnitude.

x_train_all = x_train_all / 255.0
x_test = x_test / 255.0
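To see what this scaling does numerically, here is a minimal standalone sketch using a hypothetical 2×2 patch of raw 8-bit pixel values:

```python
import numpy as np

# Hypothetical 2x2 patch of raw 8-bit pixel values
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by 255.0 maps the full [0, 255] range onto [0.0, 1.0]
scaled = raw / 255.0
```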

Categorical Encoding of Labels

Since we have 10 classes, converting the integer labels to one-hot (categorical) format enables the model to compute a probability for each class during classification. Note the difference between representing the label as a single integer from 0 to 9 versus a 10-element vector with a 1 in the position of the correct class.

from tensorflow.keras.utils import to_categorical

y_cat_train_all = to_categorical(y_train_all, 10)
y_cat_test = to_categorical(y_test, 10)
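For illustration, the one-hot vector produced for a single label can be sketched in plain NumPy; this is what to_categorical does for every label at once:

```python
import numpy as np

# A label stored as a single integer (hypothetical example: class 3 of 10)
label = 3

# One-hot encoding: a 10-element vector with a 1 at the class index
one_hot = np.zeros(10)
one_hot[label] = 1.0
```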

Creating Validation Dataset

Splitting the training data into training and validation subsets ensures that the model can be evaluated on unseen data during training. This approach helps detect overfitting early. We will dive into this concept again during Unit 11.

VALIDATION_SIZE = 10000
x_val = x_train_all[:VALIDATION_SIZE]
y_val_cat = y_cat_train_all[:VALIDATION_SIZE]

x_train = x_train_all[VALIDATION_SIZE:]
y_cat_train = y_cat_train_all[VALIDATION_SIZE:]
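The same slicing can be sketched on a small stand-in array (12 samples instead of 50,000, and a split size of 4 instead of 10,000, purely for illustration):

```python
import numpy as np

VALIDATION_SIZE_DEMO = 4  # stand-in for the 10,000 used above

# Hypothetical stand-in for the full training set
x_train_all_demo = np.arange(12)

# The first slice becomes validation data; the remainder stays for training
x_val_demo = x_train_all_demo[:VALIDATION_SIZE_DEMO]
x_train_demo = x_train_all_demo[VALIDATION_SIZE_DEMO:]
```

The two slices are disjoint and together cover the original array, which is exactly the property the real split relies on.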

Model Building

Creating the CNN Model

The proposed architecture consists of two convolutional layers, each followed by max-pooling, to capture spatial hierarchies. A dense layer with 256 neurons is added for representation learning, followed by a softmax layer for multi-class classification.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential()

# First Convolutional Layer
model.add(Conv2D(filters=32, kernel_size=(4,4), input_shape=(32, 32, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))

# Second Convolutional Layer
model.add(Conv2D(filters=32, kernel_size=(4,4), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))

# Flattening and Dense Layers
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compiling the Model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

For this particular activity I left the model as is and focused on the dimensionality changes across layers, as seen below.

Model Summary

Layer (type)                    Output Shape         Param #
conv2d (Conv2D)                 (None, 29, 29, 32)   1,568
max_pooling2d (MaxPooling2D)    (None, 14, 14, 32)   0
conv2d_1 (Conv2D)               (None, 11, 11, 32)   16,416
max_pooling2d_1 (MaxPooling2D)  (None, 5, 5, 32)     0
flatten (Flatten)               (None, 800)          0
dense (Dense)                   (None, 256)          205,056
dense_1 (Dense)                 (None, 10)           2,570

As we can see, each convolutional layer reduces the dimensionality, since we are not forcing it to keep the original size (no padding is used). The max-pooling layers do the same, downsampling the convolutional output to provide a higher-level summary of the learned features. The dense layers then converge all the information into a one-dimensional vector, which is eventually reduced to size 10, with a probability for each of the classes.
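The parameter counts in the summary can be checked by hand: a Conv2D layer has (kernel height × kernel width × input channels + 1 bias) × filters parameters, and a Dense layer has (inputs + 1 bias) × units:

```python
# Conv2D: (kernel_h * kernel_w * in_channels + 1 bias) * filters
conv1_params = (4 * 4 * 3 + 1) * 32    # 1,568
conv2_params = (4 * 4 * 32 + 1) * 32   # 16,416

# Dense: (inputs + 1 bias) * units; Flatten outputs 5 * 5 * 32 = 800 values
dense1_params = (800 + 1) * 256        # 205,056
dense2_params = (256 + 1) * 10         # 2,570
```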


Training the Model

Early Stopping

During training, I would like to highlight the Early Stopping mechanism. It monitors the validation loss during training and stops the process if no improvement is observed for a specified number of epochs. This prevents overfitting and saves computational resources. In this particular case, if the validation loss does not improve for two consecutive epochs, the training is stopped.

from tensorflow.keras.callbacks import EarlyStopping

# Setting up Early Stopping
early_stop = EarlyStopping(monitor='val_loss', patience=2)

# Training the Model
history = model.fit(x_train, y_cat_train, epochs=25, validation_data=(x_val, y_val_cat), callbacks=[early_stop])
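The patience mechanism itself is simple enough to sketch in plain Python. This is a simplified version of what the Keras callback tracks internally, run on hypothetical loss values:

```python
def epochs_run(val_losses, patience=2):
    """Return the epoch at which early stopping would halt training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:       # validation loss improved
            best = loss
            wait = 0
        else:                 # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch  # patience exhausted: stop here
    return len(val_losses)    # ran all epochs without triggering

# Improvement stalls after epoch 3, so training stops at epoch 5
epochs_run([0.95, 0.80, 0.74, 0.78, 0.76, 0.75])
```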

Training and Validation Metrics

The main takeaway from training was visualizing how the losses decreased for both datasets and when the model decided to stop. The following plot shows the results for a higher patience value, where the validation loss stopped improving and the model began overfitting.

Output


Model Evaluation

Evaluating on the test set provides a realistic measure of how the model performs on unseen data.

Classification Report and Confusion Matrix

The classification report includes precision, recall, and F1-score, providing a detailed view of the model’s performance for each class.

Class          Precision   Recall   F1-score   Support
0              0.79        0.77     0.78       1000
1              0.84        0.89     0.87       1000
2              0.72        0.66     0.69       1000
3              0.55        0.59     0.57       1000
4              0.75        0.70     0.73       1000
5              0.64        0.67     0.66       1000
6              0.82        0.84     0.83       1000
7              0.86        0.80     0.83       1000
8              0.81        0.89     0.85       1000
9              0.83        0.81     0.82       1000
Accuracy                            0.76       10000
Macro avg      0.76        0.76     0.76       10000
Weighted avg   0.76        0.76     0.76       10000
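Each of these figures reduces to simple ratios of prediction counts. A small sketch with hypothetical counts for a single class:

```python
# Hypothetical counts for one class
tp, fp, fn = 80, 20, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of images predicted as this class, how many were right
recall = tp / (tp + fn)     # of images truly in this class, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```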

The confusion matrix visualizes correct and incorrect predictions.

(10×10 confusion matrix of counts, with rows as true classes and columns as predicted classes; the dominant diagonal entries correspond to the correctly classified images in each class.)

Predicting on Single Image

Visualizing individual predictions allows us to verify the model's behavior on specific examples, and is an intuitive way to sanity-check the results.

from random import randint

# Assumed to be defined earlier in the notebook:
#   predictions       - predicted class index per test image (argmax of model output)
#   y_test_multiclass - integer labels for the test set
#   CLASS_NAMES       - list of the 10 class names

idx = randint(0, len(x_test) - 1)

test_image = x_test[idx]

plt.imshow(test_image)
plt.show()

print(f"Real Label: {CLASS_NAMES[y_test_multiclass[idx]]}")
print(f"Predicted Label: {CLASS_NAMES[predictions[idx]]}")

Output

Real Label: Frog
Predicted Label: Frog

Overall, this was an excellent activity to really grasp the concepts of convolutional neural networks and visualize the results of a simple use case.