
How To Calculate Dimensions Of First Linear Layer Of A CNN

Currently, I am working with a CNN that has a fully connected layer attached to it, and I am working with a 3-channel image of size 32x32. I am wondering if there is a cons

Solution 1:

Given an input spatial dimension w, a 2D convolution layer with kernel size k, stride s, padding p, and dilation d will output a tensor with the following size on that dimension:

int((w + 2*p - d*(k - 1) - 1)/s + 1)

The exact same formula applies to nn.MaxPool2d. For reference, you can look it up in the PyTorch documentation for nn.Conv2d and nn.MaxPool2d.
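As a quick sanity check (a minimal sketch; the layer parameters simply mirror the first block of your model), you can compare the formula against the shapes PyTorch actually produces:

>>> import torch, torch.nn as nn
>>> x = torch.empty(1, 3, 32, 32)
>>> nn.Conv2d(3, 16, kernel_size=3, padding=1)(x).shape
torch.Size([1, 16, 32, 32])
>>> int((32 + 2*1 - 1*(3 - 1) - 1)/1 + 1)
32
>>> nn.MaxPool2d(2, 2)(x).shape
torch.Size([1, 3, 16, 16])
>>> int((32 + 2*0 - 1*(2 - 1) - 1)/2 + 1)
16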

The convolution part of your model is made up of three (Conv2d + MaxPool2d) blocks. You can easily infer the spatial dimension size of the output with this helper function:

def conv_shape(x, k=1, p=0, s=1, d=1):
    return int((x + 2*p - d*(k - 1) - 1)/s + 1)

Nesting the calls, you get the resulting spatial dimension:

>>> w = conv_shape(conv_shape(32, k=3, p=1), k=2, s=2)
>>> w = conv_shape(conv_shape(w, k=3), k=2, s=2)
>>> w = conv_shape(conv_shape(w, k=3), k=2, s=2)

>>> w
2

Since your convolutions and poolings use square kernels, with identical horizontal and vertical strides and paddings, the above calculation holds for both the width and the height of the tensor. Lastly, looking at the last convolution layer, conv3, which has 64 filters, the resulting number of elements per batch element before your fully connected layer is w*w*64, i.e. 2*2*64 = 256.
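Continuing the same interpreter session, you can confirm that figure directly:

>>> 64 * w * w
256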


However, nothing stops you from calling your layers to find out the output shape!

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Flatten())

        # Infer the flattened feature size by passing a dummy input
        # through the feature extractor (batch size 1 is arbitrary).
        n_channels = self.feature_extractor(torch.empty(1, 3, 32, 32)).size(-1)

        self.classifier = nn.Sequential(
            nn.Linear(n_channels, 200),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(200, 100),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(100, 10))

    def forward(self, x):
        features = self.feature_extractor(x)
        out = self.classifier(features)
        return out

model = Net()
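
As a final sanity check (a minimal sketch; the batch size of 8 is arbitrary), a dummy batch confirms the classifier receives the right number of features and produces one logit per class:

>>> model(torch.randn(8, 3, 32, 32)).shape
torch.Size([8, 10])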
