Dashed shortcuts
From Fig. 3, the 34-layer residual network, the dashed shortcuts indicate that the dimensions of the skip connection do not match the output it is added to: the input entering the shortcut has half the channels of the block's output (and twice its spatial size). Looking closely, this happens from the second group of repeating blocks onward, not in the first one. In terms of Table 1, this means the groups conv3_x, conv4_x, and conv5_x have a dashed shortcut in the first residual block of their repeating blocks.
Let's first modify the residual block to account for the mismatch between the dimensions of the skip connection and the output of the stacked layers. This is done by replacing the skip connection with what is known as a projection shortcut: a 1x1 conv with a stride of 2, followed by a batch norm.
Also, when the shortcut is dashed, the first layer of the residual block uses a stride of 2 instead of 1, so the main path is downsampled to match.
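To see why the projection matches the main path, here is a minimal sketch, assuming an example 1x64x56x56 feature map (the output shape of conv2_x in the 18/34-layer ResNets): a 1x1 conv with stride 2 halves the spatial size and lets us double the channels, so the projected skip can be added to the block's output.
import torch
import torch.nn as nn

# assumed example input: batch of 1, 64 channels, 56x56 spatial size
x = torch.randn(1, 64, 56, 56)

# projection shortcut: 1x1 conv with stride 2 (no padding), followed by batch norm
projection = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128)
)

print(projection(x).shape)
# torch.Size([1, 128, 28, 28])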
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, 
                kernel_size=3, stride=1, padding=1,
                dashed_shortcut=False):
        super(ResidualBlock, self).__init__()
        scaled = 2 if dashed_shortcut else 1 # added
        self.layer_one = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride*scaled, padding, bias=False),
            nn.BatchNorm2d(out_channels)
        )
        self.relu = nn.ReLU()
        # modified the in_channels to out_channels
        self.layer_two = nn.Sequential(
            nn.Conv2d(out_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels)
        )
        # added this part
        self.skip_layer = None
        if dashed_shortcut:
            self.skip_layer = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, 2, 0, bias=False), # 1x1 conv, stride 2, no padding
                nn.BatchNorm2d(out_channels))
        else:
            self.skip_layer = nn.Identity() # this layer just spits out its input
    def forward(self, x):
        skip_output = self.skip_layer(x) 
        x = self.layer_one(x)
        x = self.relu(x)
        x = self.layer_two(x)
        x += skip_output
        return self.relu(x)
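Before wiring up a whole group, let's sanity-check the modified block. This is a minimal sketch assuming an input feature map of shape 1x64x56x56, standing in for the conv2_x output used in the earlier tests.
in_feature_map = torch.randn(1, 64, 56, 56) # assumed stand-in for the conv2_x output
block = ResidualBlock(64, 128, dashed_shortcut=True)

out = block(in_feature_map)
print(out.shape)
# torch.Size([1, 128, 28, 28]) -- spatial size halved, channels doubled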
Now, let's naively implement the conv3_x group of the 18-layer ResNet. Note that I use the same input I tested the residual block with earlier, since I know its dimensions are preserved given the parameters used for conv2_x.
conv3_x = nn.Sequential(
    ResidualBlock(64, 128, dashed_shortcut=True),
    ResidualBlock(128, 128, dashed_shortcut=False)
)
out = conv3_x(in_feature_map)
print(out.shape)
# torch.Size([1, 128, 28, 28])
And rightly so: the output of the programmatic implementation matches the output dimensions of the conv3_x group of the 18-layer ResNet in Table 1.
Before we go further into what is most probably the last section of the course, let's do one more thing. Residual blocks can have either two or three layers, depending on the architecture. For instance, from ResNet-50 upward, the residual block has three layers.
This calls for modifying the custom residual block class to support a variable number of layers. The first step is to pass in lists for the filters and the kernel sizes.
class ResidualBlock(nn.Module):
    def __init__(self, in_block: int, out_channels: list[int], 
               kernel_sizes: list[int], dashed_shortcut=False):
        super(ResidualBlock, self).__init__()
        stride = 2 if dashed_shortcut else 1
        # remember the block's input channels for the skip connection
        in_skip = in_block 
        self.layers = nn.ModuleList()
        for f, k in zip(out_channels, kernel_sizes):
            self.layers.extend([
                nn.Conv2d(in_block, f, k, stride, k // 2, bias=False),
                nn.BatchNorm2d(f),
                nn.ReLU()
            ])
            in_block, stride = f, 1 # output of the previous layer feeds the next; stride resets to 1
        self.layers = self.layers[:-1] # last ReLU not needed, only done after merge
        self.layers = nn.Sequential(*self.layers)
        self.skip_layer = None
        if dashed_shortcut:
            self.skip_layer = nn.Sequential(
                nn.Conv2d(in_skip, f, 1, 2, 0, bias=False),
                nn.BatchNorm2d(f))
        else:
            self.skip_layer = nn.Identity()

    def forward(self, x):
        skip_output = self.skip_layer(x) 
        x = self.layers(x)
        x += skip_output
        return nn.ReLU()(x) # ReLU has no learnable parameters, so it can be created and applied inline
Now let's understand the changes made to the ResidualBlock class:
~ in_block defines the number of input channels to the residual block.
~ stride is set according to the boolean dashed_shortcut parameter and is then reset to 1 for the remaining layers of the block.
~ the padding value is inferred from the kernel size as k // 2, i.e. for kernel size 3 the padding is 1, and for kernel size 1 the padding is 0. This preserves the spatial dimensions (see the quick check after this list).
~ for convenience, we build every layer as Conv2d, BatchNorm2d, and ReLU, but remove the last ReLU so that it is applied only after the merge. Note that in forward, there is a ReLU after the merge.
~ the skip connection stays the same; its parameters are identical, only obtained differently (the output channels now come from the last entry of out_channels).
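As a quick check of the padding rule, here is a minimal sketch with an assumed 1x64x56x56 input: with stride 1 and padding k // 2, both a 1x1 and a 3x3 convolution preserve the 56x56 spatial size.
x = torch.randn(1, 64, 56, 56) # assumed example input
for k in (1, 3):
    conv = nn.Conv2d(64, 64, kernel_size=k, stride=1, padding=k // 2, bias=False)
    print(k, conv(x).shape)
# 1 torch.Size([1, 64, 56, 56])
# 3 torch.Size([1, 64, 56, 56])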
conv3_x = nn.Sequential(
  ResidualBlock(64, [128, 128], [3, 3], dashed_shortcut=True),
  ResidualBlock(128, [128, 128], [3, 3], dashed_shortcut=False)
)

out = conv3_x(in_feature_map)
print(out.shape)
# torch.Size([1, 128, 28, 28])
This change makes it explicit that each block of the programmatically defined conv3_x has two layers, as opposed to the previous version, which abstracted away the inner details of the residual block.
Furthermore, the new block can be expanded to the three-layer residual blocks of, say, the 50-layer or even the 152-layer ResNet. Let's see this in action. First, let's define conv3_x for the 50-layer ResNet as specified in Table 1.
conv3_x_resnet_50 = nn.Sequential(
    ResidualBlock(256, [128, 128, 512], [1, 3, 1], dashed_shortcut=True),
    ResidualBlock(512, [128, 128, 512], [1, 3, 1], dashed_shortcut=False),
    ResidualBlock(512, [128, 128, 512], [1, 3, 1], dashed_shortcut=False),
    ResidualBlock(512, [128, 128, 512], [1, 3, 1], dashed_shortcut=False),
)

in_feature_map = torch.randn(1, 256, 56, 56)
out = conv3_x_resnet_50(in_feature_map)
print(out.shape)
# torch.Size([1, 512, 28, 28])
The first argument of ResidualBlock is the number of output channels from the previous group of residual blocks, conv2_x, which, as can be seen, is 256. The conv3_x group then has four residual blocks, indicated by the x4 next to the block's parameters in Table 1.
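As an aside, since the four blocks differ only in their first argument and shortcut type, the x4 repetition can also be expressed with a small loop; this is just a sketch of my own, not the repeating-block class the course builds later.
# hypothetical shorthand for the x4 repetition in Table 1
blocks = [ResidualBlock(256, [128, 128, 512], [1, 3, 1], dashed_shortcut=True)]
blocks += [ResidualBlock(512, [128, 128, 512], [1, 3, 1]) for _ in range(3)]
conv3_x_resnet_50 = nn.Sequential(*blocks)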
Next, let's define conv5_x of the 152-layer ResNet.
conv5_x_resnet_152 = nn.Sequential(
    ResidualBlock(1024, [512, 512, 2048], [1, 3, 1], dashed_shortcut=True),
    ResidualBlock(2048, [512, 512, 2048], [1, 3, 1], dashed_shortcut=False),
    ResidualBlock(2048, [512, 512, 2048], [1, 3, 1], dashed_shortcut=False),
)

in_feature_map = torch.randn(1, 1024, 14, 14)
out5_x = conv5_x_resnet_152(in_feature_map)
print(out5_x.shape)
# torch.Size([1, 2048, 7, 7])
Awesome learning! In the second-to-last section, we'll implement the repeating blocks as a class and then define all the layers of the ResNets, fulfilling the very objective of this course: gaining knowledge and understanding of residual learning and implementing it programmatically.