EECS 442 Computer Vision: Homework 2
Instructions
• The submission includes two parts:
1. To Canvas: submit a zip file of all of your code.
We have indicated questions where you have to do something in code in red.
Your zip file should contain a single directory which has the same name as your uniqname. If I
(David, uniqname fouhey) were submitting my code, the zip file should contain a single folder
fouhey/ containing all required files.
What should I submit? At the end of the homework, there is a canvas submission checklist
provided. We provide a script that validates the submission format here. If we don’t ask you for
it, you don’t need to submit it; while you should clean up the directory, don’t panic about having
an extra file or two.
2. To Gradescope: submit a pdf file as your write-up, including your answers to all the questions
and key choices you made.
We have indicated questions where you have to do something in the report in blue.
You might like to combine several files to make a submission. Here is an example online link
for combining multiple PDF files: https://combinepdf.com/.
The write-up must be an electronic version. No handwriting, including plotting questions.
LATEX is recommended but not mandatory.
Python Environment We are using Python 3.7 for this course. You can find references for the Python
standard library here: https://docs.python.org/3.7/library/index.html. To make your life easier, we recommend installing Anaconda for Python 3.7.x (https://www.anaconda.com/download/). This is a Python
package manager that includes most of the modules you need for this course.
We will make use of the following packages extensively in this course:
• Numpy (https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
• SciPy (https://scipy.org/)
• Matplotlib (http://matplotlib.org/users/pyplot_tutorial.html)
1 Patches [8 pts]
Task 1: Image Patches (8 pts) A patch is a small piece of an image. Sometimes we will focus on the
patches of an image instead of operating on the entire image itself.
(a) Complete the function image_patches in filters.py. This should divide a grayscale image
into a set of non-overlapping 16 by 16 pixel image patches. Normalize each patch to have zero mean
and unit variance.
Plot and put in your report three 16x16 image patches from grace_hopper.png loaded in grayscale.
(3 pts)
(b) Discuss in your report why you think it is good for the patches to have zero mean. (2 pts)
Hint: Suppose you want to measure the similarity between patches by computing the dot products
between different patches to find a match. Think about how the patch values and the resulting similarity
obtained by taking dot products would be affected under different lighting/illumination conditions. Say
in one case a value of dark corresponds to 0 whereas bright corresponds to 1. In another scenario a value
of dark corresponds to -1 whereas bright corresponds to 1. Which one would be more appropriate to
measure similarity using dot products?
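If you want to see the effect of the two scenarios concretely, here is a tiny optional toy example (the numbers are made up and this is not required for the report):

import numpy as np

# Toy example: the same pattern seen twice, once in a 0-to-1 encoding and
# once after a global brightness shift.
p1 = np.array([0.1, 0.9, 0.1, 0.9])
p2 = p1 + 0.5                                # same pattern, globally brighter

print(np.dot(p1, p1), np.dot(p1, p2))        # raw dot products disagree

# After subtracting the mean (and dividing by the std), the brightness
# offset disappears and the dot product sees the two patches as identical.
n1 = (p1 - p1.mean()) / p1.std()
n2 = (p2 - p2.mean()) / p2.std()
print(np.dot(n1, n1), np.dot(n1, n2))        # now equal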
(c) Early work in computer vision used patches as descriptions of local image content for applications
ranging from image alignment and stitching to object classification and detection.
Discuss in your report, in 2-3 sentences, why the patches from the previous question would be good or
bad for things like matching or recognizing an object. Consider how those patches would look if
we changed the object’s pose, scale, illumination, etc. (3 pts)
2 Image Filtering [43 pts]
Foreword: There’s a difference between convolution and cross-correlation: in cross-correlation, you compute the dot product (i.e., np.sum(F*I[y1:y2,x1:x2])) between the kernel/filter and each window/patch in the image; in convolution, you compute the dot product between the flipped kernel/filter and
each window/patch in the image. We’d like to insulate you from this annoying distinction, but we also don’t
want to teach you the wrong stuff. So we’ll split the difference by pointing out where you have to pay attention.
We’ll make this more precise in 1D: assume the input/signal f has N elements (i.e., is indexed by i for
0 ≤ i < N) and the filter/kernel g has M elements (i.e., is indexed by j for 0 ≤ j < M). In all cases below,
you can assume zero-padding of the input/signal f.
1D Cross-correlation/Filtering: The examples given in class and what most people think of when it comes
to filtering. Specifically, 1D cross-correlation/filtering takes the form:
$$h[i] = \sum_{j=0}^{M-1} g[j]\, f[i + j], \qquad (1)$$
or each entry i is the sum of all the products between the filter at j and the input at i + j for all valid j. If
you want to think of doing this in terms of matrix products, you can think of this as $h_i = g^T f_{i:i+M-1}$. Of
the two options, this tends to be more intuitive to most people.
1D Convolution: When we do 1D convolution, on the other hand, we re-order the filter last-to-first, and
then do filtering. In signal processing, this is usually reasoned about by index trickery. By definition, 1D
convolution takes the form:
$$(f * g)[i] = \sum_{j=0}^{M-1} g[M - j - 1]\, f[i + j], \qquad (2)$$
which is uglier since we start at 0 rather than 1. You can verify that as j goes 0 → (M − 1), the new index
(M − j − 1) goes (M − 1) → 0. Rather than deal with annoying indices, if you’re given a filter to apply
and asked to do convolution, you can simply do the following: (1) at the start of the function and only once,
compute g = g[::-1] if it’s 1D or G = G[::-1,::-1] if it’s 2D; (2) do filtering with this flipped filter.
The reason for the fuss is that convolution is commutative (f ∗ g = g ∗ f) and associative (f ∗ (g ∗ h) =
(f ∗ g) ∗ h). As you chain filters together, it is nice to know that (a ∗ b) ∗ c = (c ∗ a) ∗ b for all
a, b, c. Cross-correlation/filtering does not satisfy these properties.
You should watch for this in three crucial places.
• When implementing convolution in Task 2(b) in the function convolve() in filters.py.
• When dealing with non-symmetric filters (like directional derivatives [−1, 0, 1]). A symmetric filter
like the Gaussian is unaffected by the distinction because if you flip it horizontally/vertically, it’s
the same. But for asymmetric filters, you can get different results. In the case of the directional
derivatives, this flips the sign. This can produce outputs that have flipped signs or give you answers
to questions that are nearly right but need to be multiplied by −1.
Bottom-line: if you have something that’s right except for a -1, and you’re using a directional derivative, then you’ve done it with cross-correlation.
• In a later homework where you implement a convolutional neural network. Despite their name, these
networks actually do cross-correlation. Argh! It’s annoying.
Here’s my key: if you’re trying to produce a picture that looks clean and noise-free or you’re talking
to someone who talks about sampling rates, “convolution” is the kind of convolution where you reverse the filter order. If you’re trying to recognize puppies or you’re talking to someone who doesn’t
frequently say signal, then “convolution” is almost certainly filtering/cross-correlation.
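If it helps, here is a small, optional numerical illustration of the “flip once, then filter” advice, using scipy functions mentioned later in this homework; the random image and the particular filter are arbitrary choices:

import numpy as np
from scipy.ndimage import convolve, correlate

I = np.random.rand(8, 8)
G = np.array([[-1.0, 0.0, 1.0]])     # an asymmetric (directional) filter

# Convolution is just cross-correlation with the flipped kernel, so flipping
# once up front and then filtering matches scipy's convolve exactly.
G_flipped = G[::-1, ::-1]
assert np.allclose(correlate(I, G_flipped, mode='constant'),
                   convolve(I, G, mode='constant'))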
Task 2: Convolution and Gaussian Filter (24 pts)
(a) A Gaussian filter has filter values that follow the Gaussian probability distribution. Specifically, the
values of the filter are
$$\text{1D kernel:}\quad G(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
$$\text{2D kernel:}\quad G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
where 0 is the center of the filter (in both 1D and 2D) and σ is a free parameter that controls how much
blurring takes place. One thing that makes lots of operations fast is that applying a 2D Gaussian filter to
an image can be done by applying two 1D Gaussian filters, one vertical and the other horizontal.
Show in your report that a convolution by a 2D Gaussian filter is equivalent to sequentially applying a
vertical and horizontal Gaussian filter. Specify in your report the relationship between the 2D and 1D
filter, especially the relationship between their variances. (6 pts)
Advice: Pick a particular filter size k. From there, define a 2D Gaussian filter G ∈ R^{k×k} and two
1D Gaussian filters Gy ∈ R^{k×1} and Gx ∈ R^{1×k}. A useful fact that you can use is that for any k, any
vertical filter X ∈ R^{k×1} and any horizontal filter Y ∈ R^{1×k}, the convolution X ∗ Y is equal to the
matrix product XY. You may find it particularly useful to use the fact that Y^T ∗ Y = Y^T Y and that convolution is
commutative. Look at individual elements.
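As an optional numerical companion to the algebraic argument (this is not a substitute for the derivation in your report), you can check that the 2D kernel built from the formula above is exactly the outer product of the corresponding 1D kernels; the kernel size and σ below are arbitrary choices:

import numpy as np

k, sigma = 5, 1.0
r = np.arange(k) - k // 2          # offsets ..., -1, 0, 1, ...

# 1D and 2D Gaussian kernels straight from the formulas above.
g1 = np.exp(-r**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
G2 = np.exp(-(r[:, None]**2 + r[None, :]**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

# The 2D kernel equals the outer product of a vertical and a horizontal 1D kernel.
assert np.allclose(G2, np.outer(g1, g1))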
(b) Complete the function convolve() in filters.py. Be sure to implement convolution and not
cross-correlation/filtering (i.e., flip the kernel as soon as you get it). For consistency purposes, please
use zero-padding when implementing convolution. (4 pts)
Advice: You can use scipy.ndimage.convolve() to check your implementation. For zero-padding, use mode='constant'. Refer to the documentation for details. For Part 3 Feature Extraction
and Part 4 Blob Detection, directly use scipy’s convolution function with the same settings, ensuring
zero-padding.
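For instance, a sanity check of your implementation might look like the sketch below; it assumes your convolve takes (image, kernel) in that order, which you should adapt to whatever signature filters.py actually specifies:

import numpy as np
from scipy import ndimage
from filters import convolve          # your implementation (signature assumed)

image = np.random.rand(16, 16)
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0, 1.0, 0.0]])

# Zero-padded reference result from scipy.
reference = ndimage.convolve(image, kernel, mode='constant', cval=0.0)
assert np.allclose(convolve(image, kernel), reference)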
(c) Plot the following output and put it in your report and then describe what Gaussian filtering does to
the image in one sentence. Load the image grace_hopper.png as the input and apply a Gaussian
filter that is 3 × 3 with a standard deviation of σ = 0.572. (2 pts)
(d) Discuss in your report why it is a good idea for a smoothing filter to sum up to 1. (3 pts)
Advice: As an experiment to help deduce why, observe that if you sum all the values of the
Gaussian filter in (c), you should get a sum close to 1. If you are very particular about this, you
can make it sum to exactly 1 by dividing all filter values by their sum. When this filter is applied to
’grace_hopper.png’, what are the output intensities (min, max, range)? Now consider a Gaussian
filter of size 3 × 3 and standard deviation σ = 2 (but do not force it to sum to 1 – just use the values). Calculate the sum of all filter values in this case. What happens to the output image intensities in
this case? If you are trying to plot the resulting images using matplotlib.pyplot to compare the
difference, set vmin = 0 and vmax = 255 to observe the difference.
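A rough version of that experiment is sketched below; gaussian_kernel is a hypothetical helper (you will build your own filter however you like), and the random image is a stand-in for grace_hopper.png:

import numpy as np
from scipy import ndimage

def gaussian_kernel(sigma, radius=1):
    # Hypothetical helper: a (2*radius+1)^2 Gaussian from the formula above,
    # deliberately NOT re-normalized to sum to 1.
    r = np.arange(-radius, radius + 1)
    g = np.exp(-(r[:, None]**2 + r[None, :]**2) / (2.0 * sigma**2))
    return g / (2.0 * np.pi * sigma**2)

img = np.random.rand(64, 64) * 255.0   # stand-in for the grayscale image

for sigma in (0.572, 2.0):
    k = gaussian_kernel(sigma)
    out = ndimage.convolve(img, k, mode='constant')
    print(f"sigma={sigma}: filter sum={k.sum():.3f}, "
          f"output range=[{out.min():.1f}, {out.max():.1f}]")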
(e) Consider the image as a function I(x, y) with I : R^2 → R. When working on edge detection, we often
pay a lot of attention to the derivatives. Denote the “derivatives”:
$$I_x(x, y) = I(x + 1, y) - I(x - 1, y) \approx 2\frac{\partial I}{\partial x}(x, y)$$
$$I_y(x, y) = I(x, y + 1) - I(x, y - 1) \approx 2\frac{\partial I}{\partial y}(x, y)$$
where Ix is twice the derivative and thus off by a factor of 2. This scaling factor is not a concern,
since the units of the image are arbitrary and scale the derivative anyway; as long as you are consistent, things are fine.
Derive in your report the convolution kernels for the derivatives: (i) kx ∈ R^{1×3} such that Ix = I ∗ kx; (ii)
ky ∈ R^{3×1} such that Iy = I ∗ ky. (3 pts)
(f) Follow the detailed instructions in filters.py and complete the function edge_detection()
in filters.py, whose output is the gradient magnitude. (3 pts)
(g) Use the original image and the Gaussian-filtered image as inputs respectively and use edge_detection()
to get their gradient magnitudes. Plot both outputs and put them in your report. Discuss in your
report the difference between the two images in no more than three sentences. (3 pts)
Task 3: Sobel Operator (9 pts) The Sobel operator is often used in image processing and computer
vision.
(a) The Sobel filters Sx and Sy are given below and are related to a particular Gaussian kernel GS:
$$S_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \qquad
S_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \qquad
G_S = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}.$$
Show in your report the following result: if the input image is I and we use GS as our Gaussian filter,
then taking the horizontal derivative (i.e., ∂/∂x I(x, y)) of the Gaussian-filtered image can be approximated by
applying the Sobel filter (i.e., computing I ∗ Sx). (5 pts)
Advice: You should use the horizontal filter kx that you derive previously – in particular, the horizontal
derivative of the Gaussian-filtered image is (I ∗ GS) ∗ kx. You can take advantage of properties of
convolution so that you only need to show that two filters are the same. If you do this right, you can
completely ignore the image.
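If you want a numerical hint before writing the derivation (the written argument is still required), you can convolve the two filters directly; the kx used here assumes your Task 2(e) answer is the [1, 0, −1]-style kernel:

import numpy as np
from scipy import signal

G_S = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]])
k_x = np.array([[1, 0, -1]])

# The single filter that (I * G_S) * k_x applies to the image.
print(signal.convolve2d(G_S, k_x))
# [[ 1  2  0 -2 -1]
#  [ 2  4  0 -4 -2]
#  [ 1  2  0 -2 -1]]
# The central 3x3 block of this is 2 * S_x; the factor of 2 is the same
# harmless scaling discussed for the derivative kernels in Task 2(e).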
(b) Complete the function sobel_operator() in filters.py with the kernels/filters given previously. (2 pts)
(c) Plot the following and put them in your report: I ∗ Sx, I ∗ Sy, and the gradient magnitude, with the
image ’grace_hopper.png’ as the input image I. (2 pts)
Task 4: LoG Filter (10 pts) The Laplacian of Gaussian (LoG) operation is important in computer vision.
(a) In filters.py, you are given two LoG filters. You are not required to show that they are LoG, but
you are encouraged to know what an LoG filter looks like. Include in your report the following: the
outputs of these two LoG filters and the reasons for their difference. Discuss in your report whether
these filters can detect edges. Can they detect anything else? (4 pts)
Advice: By detected regions we mean pixels where the filter has a high absolute response.
(b) Instead of calculating a LoG, we can often approximate it with a simple Difference of Gaussians (DoG).
Specifically, many systems in practice compute their “Laplacian of Gaussians” by computing (I ∗
G_{kσ}) − (I ∗ G_σ), where G_a denotes a Gaussian filter with a standard deviation of a and k > 1 (but is
usually close to 1). If we want to compute the LoG for many scales, this can be far faster – rather than
apply a large filter to get the LoG, one can get it for free by repeatedly blurring the image with little
kernels.
Discuss in your report why computing (I ∗ G_{kσ}) − (I ∗ G_σ) might successfully roughly approximate
convolution by the Laplacian of Gaussian. You should include a plot or two showing the filters in
your report. To help understand this, we provide three 1D filters with 501 entries in log1d.npz.
Load these with data = np.load(’log1d.npz’). There is a LoG filter with σ = 50 in variable data[’log50’], and Gaussian filters with σ = 50 and σ = 53 in data[’gauss50’] and
data[’gauss53’] respectively. You should assume these are representative samples of filters and
that things generalize to 2D. (6 pts)
Advice: Try visualizing the following functions: two Gaussian functions with different variances, the
difference between the two functions, the Laplacian of a Gaussian function. To explain why it works,
remember that convolution is linear: A ∗ C + B ∗ C = (A + B) ∗ C.
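A sketch of that visualization is below; it assumes log1d.npz sits in your working directory and that the DoG may differ from the LoG by a constant scale factor:

import numpy as np
import matplotlib.pyplot as plt

data = np.load('log1d.npz')
dog = data['gauss53'] - data['gauss50']   # Difference of Gaussians
log = data['log50']                       # Laplacian of Gaussian, sigma = 50

# The two curves should have very similar shapes (possibly up to a scale).
plt.plot(log, label='LoG, sigma = 50')
plt.plot(dog, label='gauss53 - gauss50 (DoG)')
plt.legend()
plt.show()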
3 Feature Extraction [24 pts]
This question looks long, but that is only because there is a fairly large amount of walk-through and formalizing stuff. The resulting solution, if done properly, is certainly under 10 lines. If you use filtering, please
use scipy.ndimage.convolve() to perform convolution whenever you need it. Please use
zero padding for consistency purposes (set mode='constant').
Foreword: While edges can be useful, corners are often more informative features as they are less common.
In this section, we implement a Harris Corner Detector (see: https://en.wikipedia.org/wiki/Harris_Corner_Detector)
to detect corners. Corners are defined as locations (x, y) in the image where a small change in
any direction results in a large change in intensity if one considers a small window centered on (x, y) (or,
intuitively, one can imagine looking at the image through a tiny hole that’s centered at (x, y)). This can
be contrasted with edges, where a large intensity change occurs in only one direction, or flat regions, where
moving in any direction will result in small or no intensity changes. Hence, the Harris Corner Detector
considers small windows (or patches) where a small change in location leads to large variation in multiple
directions (hence corner detector).
Let’s consider a grayscale image where I(x, y) is the intensity value at image location (x, y). We can
calculate the corner score for every pixel (i, j) in the image by comparing a window W centered on (i, j)
with that same window centered at (i + u, j + v). To be specific: a window of size 2d + 1 centered on (i, j)
is the set of pixels from i − d to i + d and j − d to j + d. Specifically, we will compute the sum of
squared differences between the two,
$$E(u, v) = \sum_{(x, y) \in W} \left[ I(x + u, y + v) - I(x, y) \right]^2$$
or, for every pixel (x, y) in the window W centered at i, j, how different is it from the same window, shifted
over (u, v). This formalizes the intuitions above:
• If moving (u, v) leads to no change for all (u, v), then (x, y) is probably flat.
• If moving (u, v) in one direction leads to a big change and adding (u, v) in another direction leads to
a small change in the appearance of the window, then (x, y) is probably on an edge.
• If moving any (u, v) leads to a big change in appearance of the window, then (x, y) is a corner.
You can compute this E(u, v) for all (u, v) and at all (i, j).
Task 5: Corner Score (9 pts) Your first task is to write a function that calculates this function for all pixels
(i, j) with a fixed offset (u, v) and window size W. In other words, if we calculate S = corner_score(u, v),
S is an image such that Sij is the sum-of-squared differences between the window centered on (i, j) in
I and the window centered on (i + u, j + v) in I. The function will need to calculate this function to
every location in the image. This is doable via a quadruple for-loop (for every pixel (i, j), for every pixel
(x, y) in the window centered at (i, j), compare the two). However, you can also imagine doing this by (a)
offsetting the image by (u, v); (b) taking the squared difference with the original image; (c) summing up
the values within a window using convolution. Note: If you do this by convolution, use zero padding for
offset-window values that lie outside of the image.
(a) Complete the function corner_score() in corners.py, which takes as input an image, offset
values (u, v), and window size W. The function computes the response E(u, v) for every pixel. We can
look at, for instance, the image of E(0, y) to see how moving down y pixels would change things and
the image of E(x, 0) to see how moving right x pixels would change things. (3 pts)
Advice: You can use np.roll for offsetting by u and v.
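A minimal sketch of the roll / square / window-sum recipe is below; the function name, the default window, and the sign convention of the shift are illustrative choices, and np.roll wraps around at the border, which you may want to reconcile with the zero-padding note above:

import numpy as np
from scipy import ndimage

def corner_score_sketch(image, u, v, window_size=(5, 5)):
    # image is assumed to be a float array.
    shifted = np.roll(image, shift=(v, u), axis=(0, 1))   # offset by (u, v)
    diff_sq = (shifted - image) ** 2                      # squared differences
    # Summing over the window == convolving with an all-ones kernel.
    return ndimage.convolve(diff_sq, np.ones(window_size), mode='constant')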
(b) Plot and put in your report your output for grace_hopper.png for (u, v) = {(0, 5), (0, −5), (5, 0), (−5, 0)}
and window size (5, 5). (3 pts)
(c) Early work by Moravec [1980] used this function to find corners by computing E(u, v) for a range of
offsets and then selecting the pixels where the corner score is high for all offsets.
Discuss in your report, in a few sentences, why checking all the u’s and v’s might be impractical. (3 pts)
Foreword: For every single pixel (i, j), you now have a way of computing how much changing by (u, v)
changes the appearance of a window (i.e., E(u, v) at (i, j)). But in the end, we really want a single number
of “cornerness” per pixel and don’t want to handle checking all the (u, v) values at every single pixel (i, j).
You’ll implement the cornerness score invented by Harris and Stephens [1988].
Harris and Stephens recognized that if you do a Taylor series of the image, you can build an approximation
of E(u, v) at a pixel (i, j). Specifically, if Ix and Iy denote the image of the partial derivatives of I with
respect to x and y (computable via kx and ky from above), then
$$E(u, v) \approx \sum_{W} \left( I_x^2 u^2 + 2 I_x I_y uv + I_y^2 v^2 \right)
= [u, v] \begin{bmatrix} \sum_W I_x^2 & \sum_W I_x I_y \\ \sum_W I_x I_y & \sum_W I_y^2 \end{bmatrix} [u, v]^T
= [u, v]\, M \,[u, v]^T.$$
This matrix M has all the information needed to approximate how rapidly the image content changes within
a window near each pixel and you can compute M at every single pixel (i, j) in the image. To avoid extreme
notation clutter, we assume we are always talking about some fixed pixel i, j; the sums are over x, y in a
2d + 1 window W centered at i, j; and any image (e.g., Ix) is assumed to be indexed by x, y. But in the
interest of making this explicit, we want to compute the matrix M at i, j. The top-left and bottom-right
elements of the matrix M for pixel i, j are:
$$M[0, 0] = \sum_{\substack{i-d \le x \le i+d \\ j-d \le y \le j+d}} I_x(x, y)^2 \qquad\qquad
M[1, 1] = \sum_{\substack{i-d \le x \le i+d \\ j-d \le y \le j+d}} I_y(x, y)^2.$$
If you look carefully, you may be able to see that you can do this by convolution – with a filter that sums
things up.
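Concretely, the “filter that sums things up” is just a box of ones; the arrays below are random stand-ins for the actual derivative images:

import numpy as np
from scipy import ndimage

d = 2
Ix = np.random.rand(32, 32)   # stand-in for the horizontal derivative image
Iy = np.random.rand(32, 32)   # stand-in for the vertical derivative image
window = np.ones((2 * d + 1, 2 * d + 1))

Sxx = ndimage.convolve(Ix * Ix, window, mode='constant')   # sum of Ix^2 over W
Syy = ndimage.convolve(Iy * Iy, window, mode='constant')   # sum of Iy^2 over W
Sxy = ndimage.convolve(Ix * Iy, window, mode='constant')   # sum of Ix*Iy over W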
What does this do for our lives? We can decompose the M we compute at each pixel into a rotation matrix
R and a diagonal matrix diag([λ1, λ2]) such that (specifically, an eigendecomposition):
$$M = R^{-1}\,\mathrm{diag}([\lambda_1, \lambda_2])\, R$$
where the columns of R tell us the directions that E(u, v) most and least rapidly changes, and λ1, λ2 tell us
the maximum and minimum amount it changes. In other words, if both λ1 and λ2 are big, then we have a
corner; if only one is big, then we have an edge; if neither are big, then we are on a flat part of the image.
Unfortunately, finding eigenvectors can be slow, and Harris and Stephens were doing this over 30 years ago.
Harris and Stephens had two other tricks up their sleeve. First, rather than calculate the eigenvalues directly, for a 2x2 matrix, one can compute the following score, which is a reasonable measure of what the
eigenvalues are like:
$$R = \lambda_1 \lambda_2 - \alpha(\lambda_1 + \lambda_2)^2 = \det(M) - \alpha\,\mathrm{trace}(M)^2$$
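Given the three window-summed images from the sketch above, this last step can be computed at every pixel at once. This is only a sketch, not the required corners.py implementation, and the value of α is an illustrative choice (values around 0.04-0.06 are common):

import numpy as np

def harris_response(Sxx, Sxy, Syy, alpha=0.05):
    det_M = Sxx * Syy - Sxy ** 2      # det(M) at every pixel
    trace_M = Sxx + Syy               # trace(M) at every pixel
    return det_M - alpha * trace_M ** 2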