Types of Convolutions
We could very well compare unstructured data (images, text, audio…) to dark matter. Its sheer volume far exceeds that of all structured data, but we have always had a hard time computing on these datasets to produce meaningful information.
That was until 2012. The disaster movie aptly named “2012” (released in 2009) had imagined the Earth’s destruction in that year; what 2012 actually brought was AlexNet, which marked the end of traditional, hand-engineered computer vision as the default tool for image-related problems.
Convolutional Neural Networks (CNNs) have taken over and are now the basis of almost all image processing solutions.
Now that we know CNNs are an extremely important tool for teaching computers to figure out whether the image in front of them shows a cat or a dog, we would very much like to know what convolution actually is. Let’s find out.
What is Convolution?
For dealing with images, this definition of convolution is the easiest to understand.
Convolution is the process of extracting features from input data using kernels (also called filters). A filter moves in discrete steps across a data plane (channel); at each position it multiplies its weights element-wise with the input values inside its receptive field and sums the results, producing one value of a new feature map.
The figure above shows a 3×3 filter applied to a 4×4 image to produce a 2×2 output. The output is smaller than the input because no padding was added to the original image.
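To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a stride-1 convolution without padding. The function name `conv2d_valid` and the sample values are illustrative assumptions, not taken from any library. Like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each kernel weight by the pixel under it, then sum.
            total = sum(
                kernel[i][j] * image[row + i][col + j]
                for i in range(k_h)
                for j in range(k_w)
            )
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 image and a 3x3 vertical-edge style kernel.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-6, -6], [-6, -6]]
```

The 4×4 input shrinks to 2×2 because the 3×3 window only fits in two positions along each axis.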
Before we move any further, let’s explain the concept of channels, which is essential in deep learning.
Concept of Channels
Channels, or feature maps, can be thought of as arrays of pixel values stacked on top of one another to produce the final image. In layman’s terms, a basic coloured image has three channels: red, green and blue. A channel may, however, contain any specific information, not necessarily one of the three standard colours. For example, a channel may contain infrared information captured by the camera.
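As a small illustration, the sketch below stores a tiny colour image pixel by pixel and then separates it into one 2D plane per channel. The helper `split_channels` and the pixel values are made up for this example.

```python
# A 2x2 RGB image stored pixel by pixel: each pixel is (red, green, blue).
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def split_channels(image):
    """Return one 2D plane per channel of a pixel-major image."""
    num_channels = len(image[0][0])
    return [
        [[pixel[c] for pixel in row] for row in image]
        for c in range(num_channels)
    ]


red, green, blue = split_channels(rgb_image)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

A fourth plane, such as an infrared reading, would simply be one more entry in the returned list.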
Many different types of convolutions exist; some of them are described below.
Dilated Convolution
A dilated convolution is one in which the filter convolves over the input, but with pre-defined gaps between its elements.
These gaps are specified by a parameter called the dilation rate. Under the usual convention, a dilation rate of 1 is an ordinary convolution. With a 3×3 filter and a dilation rate of 2, the first element of the filter multiplies with the first element of the feature map, but the second element of the filter skips the next pixel on the feature map and multiplies with the one after it.
A representation of dilated convolution:
The most important advantage of dilated convolutions is an increased receptive field without an increase in the number of weights or computations. The 3×3 dilated filter is able to see more of the image at a single instant, which makes it better suited for:
- Gradient detection
- Edge detection
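The sketch below illustrates how a dilated filter covers a wider area with the same nine weights. It assumes the common convention in which a dilation rate of 1 is an ordinary convolution and a rate of 2 skips one pixel between filter elements; the function names are hypothetical and the kernel is assumed to be square.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """Stride-1, no-padding convolution with a square, dilated kernel."""
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    return [
        [
            # Filter element (i, j) reads the pixel `dilation` steps apart.
            sum(
                kernel[i][j] * image[row + i * dilation][col + j * dilation]
                for i in range(k)
                for j in range(k)
            )
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5, values 0..24
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # nine weights in both cases

span_plain = effective_kernel_size(3, 1)
span_dilated = effective_kernel_size(3, 2)
plain = dilated_conv2d(image, kernel, dilation=1)
dilated = dilated_conv2d(image, kernel, dilation=2)

print(span_plain, span_dilated)  # 3 5
print(len(plain), len(dilated))  # 3 1
print(dilated)                   # [[108]]
```

With a rate of 2, the 3×3 filter spans a 5×5 window, so on this 5×5 input it fits in only one position.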
Popular networks that employ dilated convolutions to help them see or hear better are:
- WaveNet: uses a convolutional neural network to perform text-to-speech synthesis
- ByteNet: performs text translation in linear time
Depthwise Separable Convolution
Depthwise separable convolution is used when we need to save resources while training our model. With a fractional loss in accuracy, we are able to save a lot of computation power.
A depthwise separable convolution works in two steps. First, it splits the channels of a feature map into separate single-channel images and convolves each one with its own filter (the filters are also split channel-wise). Then the resulting feature maps are stacked back together and convolved with 1×1 filters, one for each channel we want in the final output.
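The two steps can be sketched in pure Python as follows. This is an illustrative, unoptimised version that assumes channel-first nested lists, square kernels, stride 1 and no padding; all names and values are hypothetical.

```python
def conv2d_valid(plane, kernel):
    """Convolve one 2D plane with one square kernel (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(plane) - k + 1
    out_w = len(plane[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * plane[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def depthwise_separable_conv(channels, depthwise_kernels, pointwise_weights):
    # Depthwise step: each input channel is filtered by its own 2D kernel.
    filtered = [conv2d_valid(p, k) for p, k in zip(channels, depthwise_kernels)]
    out_h, out_w = len(filtered[0]), len(filtered[0][0])
    # Pointwise step: every output channel is a weighted sum of the
    # filtered channels at each pixel, i.e. a 1x1 convolution.
    return [
        [
            [
                sum(w * filtered[ch][r][c] for ch, w in enumerate(weights))
                for c in range(out_w)
            ]
            for r in range(out_h)
        ]
        for weights in pointwise_weights
    ]


# Two 3x3 input channels, one 3x3 kernel per channel, three output channels.
channels = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 2, 2], [2, 2, 2]],
]
depthwise_kernels = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 1, 1], [0, 0, 0], [0, 0, 0]],
]
pointwise_weights = [[1, 0], [0, 1], [1, 1]]

output = depthwise_separable_conv(channels, depthwise_kernels, pointwise_weights)
print(output)  # [[[3]], [[6]], [[9]]]
```

Note that the number of output channels is decided entirely by the pointwise step; the depthwise step always keeps one output plane per input channel.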
Let us now take an example to demonstrate how it works and why it requires far fewer parameters.
Take a 20x20x8 image and convolve it with 32 filters of size 3x3x8. This results in an output shape of 18x18x32, so the total number of parameters required is:
$$3 \times 3 \times 8 \times 32 = 2304$$
Now let us consider how this scenario changes for depthwise separable convolution. We first convolve the 20x20x8 image with 8 kernels of size 3x3x1, one per channel, which gives 18x18x8. We then use 32 kernels of size 1x1x8 to produce a final output of 18x18x32, the same shape as above. The total number of parameters required is:
$$8 \times 3 \times 3 + 32 \times 1 \times 1 \times 8 = 72 + 256 = 328$$
That is roughly a sevenfold reduction in parameters from using this technique.
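The arithmetic can be checked with a few lines of Python. The helper names below are hypothetical, and bias terms are ignored, as they are in the counts above.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in the depthwise step plus the 1x1 pointwise step."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = 1 * 1 * in_channels * out_channels
    return depthwise + pointwise


standard = standard_conv_params(3, 8, 32)
separable = depthwise_separable_params(3, 8, 32)
ratio = round(standard / separable, 2)

print(standard)   # 2304
print(separable)  # 328
print(ratio)      # 7.02
```

The saving grows with the number of output channels, because the 3×3 spatial weights are paid for only once per input channel.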
References: Towards Data Science: Types of Convolutions
1×1 Convolution
A 1×1 convolution is a feature merger and a dimensionality reducer.
What the above statement means is that a 1×1 kernel lets us merge the features extracted by 3×3 filters. Suppose that by the sixth layer our network has extracted the left eyeball as one feature and the right eyeball as another, and the network as a whole has grown to exorbitant proportions. What we now want is to reduce the number of learned features so that similar features are merged together (for example, into a single node that detects eyeballs), and also to reduce the overall number of channels in the feature maps. We can do both with 1×1 convolutions.
A 1×1 kernel also offers us a way to build deeper networks that understand the inputs better without much overhead. Interspersing our models with 1×1 convolutions is a very inexpensive way of adding depth and learnable parameters.
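As a rough sketch of the merging idea, a 1×1 convolution can be written as a per-pixel weighted sum across channels. The function `conv1x1`, the feature maps and the weights below are all made up for illustration, and the layout is assumed to be channel-first nested lists.

```python
def conv1x1(channels, weights):
    """Mix channels pixel by pixel.

    channels: list of C_in 2D planes of equal size
    weights:  list of C_out rows, each holding C_in weights
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [
                sum(w * channels[ch][r][c] for ch, w in enumerate(row_weights))
                for c in range(width)
            ]
            for r in range(height)
        ]
        for row_weights in weights
    ]


# Four 2x2 feature maps merged into two.
feature_maps = [
    [[1, 2], [3, 4]],
    [[1, 0], [0, 1]],
    [[5, 5], [5, 5]],
    [[0, 1], [1, 0]],
]
weights = [
    [1, 1, 0, 0],  # output 0 merges maps 0 and 1
    [0, 0, 1, 1],  # output 1 merges maps 2 and 3
]

reduced = conv1x1(feature_maps, weights)
num_weights = sum(len(row) for row in weights)

print(reduced)      # [[[2, 2], [3, 5]], [[5, 6], [6, 5]]]
print(num_weights)  # 8
```

The spatial size stays 2×2 while the channel count drops from four to two, at a cost of only eight weights.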
Author : Piyush Daga
A data science fanatic, interested in AI, and sci-fi.