CPSC 340 Assignment 4
Instructions
Rubric: {mechanics:3}
The above points are allocated for following the general homework instructions. In addition to the usual
instructions, we have a NEW REQUIREMENT for this assignment (and future assignments, unless
it’s a disaster): if you’re embedding your answers in a document that also contains the questions, your
answers should be in blue text. This should hopefully make it much easier for the grader to find your
answers. To make something blue, you can use the LaTeX macro \blu{my text}.
1 Convex Functions
Rubric: {reasoning:5}
Recall that convex loss functions are typically easier to minimize than non-convex functions, so it’s important
to be able to identify whether a function is convex.
Show that the following functions are convex:
1. $f(w) = \alpha w^2 - \beta w + \gamma$ with $w \in \mathbb{R}$, $\alpha \ge 0$, $\beta \in \mathbb{R}$, $\gamma \in \mathbb{R}$ (1D quadratic).
2. $f(w) = w\log(w)$ with $w > 0$ (“neg-entropy”).
3. $f(w) = \|Xw - y\|^2 + \lambda\|w\|_1$ with $w \in \mathbb{R}^d$, $\lambda \ge 0$ (L1-regularized least squares).
4. $f(w) = \sum_{i=1}^n \log(1 + \exp(-y_i w^T x_i))$ with $w \in \mathbb{R}^d$ (logistic regression).
5. $f(w, w_0) = \sum_{i=1}^N \left[\max\{0, w_0 - w^T x_i\} - w_0\right] + \frac{\lambda}{2}\|w\|_2^2$ with $w \in \mathbb{R}^d$, $w_0 \in \mathbb{R}$, $\lambda \ge 0$ (“1-class” SVM).
General hint: for the first two you can check that the second derivative is non-negative, since they are one-dimensional. For the last three you’ll have to use some of the results regarding how combining convex functions can yield convex functions; these “notes on convexity” are posted on the course homepage as readings for Lecture 10.
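As a reminder, the composition results you are most likely to need are summarized informally below (see the notes for precise statements):

\begin{itemize}
  \item Non-negative weighted sums: if $f_1, \dots, f_k$ are convex and $a_1, \dots, a_k \ge 0$, then $\sum_j a_j f_j$ is convex.
  \item Composition with an affine map: if $f$ is convex, then $g(w) = f(Aw + b)$ is convex.
  \item Pointwise maximum: if $f_1, \dots, f_k$ are convex, then $\max_j f_j(w)$ is convex.
  \item Norms such as $\|w\|_1$ and $\|w\|_2$ are convex.
\end{itemize}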
Hint for part 4 (logistic regression): this function may seem non-convex since it contains $\log(z)$ and $\log$ is concave, but there is a flaw in that reasoning: for example, $\log(\exp(z)) = z$ is convex despite containing a log. To show convexity, you can reduce the problem to showing that $\log(1 + \exp(z))$ is convex, which can be done by computing the second derivative. It may simplify matters to note that $\frac{\exp(z)}{1+\exp(z)} = \frac{1}{1+\exp(-z)}$.
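To make that calculation concrete, here is a brief sketch; the shorthand $\sigma$ below is our own notation, not something defined in the assignment.

% Second-derivative check for h(z) = log(1 + exp(z)),
% writing \sigma(z) = 1/(1 + exp(-z)) as shorthand:
\begin{align*}
  h'(z)  &= \frac{\exp(z)}{1 + \exp(z)} = \sigma(z), \\
  h''(z) &= \sigma(z)\bigl(1 - \sigma(z)\bigr) \ge 0,
\end{align*}
% so h is convex; each term \log(1 + \exp(-y_i w^T x_i)) = h(-y_i w^T x_i) is a convex
% function composed with a linear map, and the sum over i preserves convexity.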
2 Logistic Regression with Sparse Regularization
If you run python main.py -q 2, it will:
1. Load a binary classification dataset containing a training and a validation set.
2. ‘Standardize’ the columns of X and add a bias variable (in utils.load_dataset); a sketch of what this standardization step typically looks like appears after this list.
3. Apply the same transformation to Xvalidate (in utils.load_dataset).
4. Fit a logistic regression model.
5. Report the number of features selected by the model (number of non-zero regression weights).
6. Report the error on the validation set.
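For concreteness, here is a hedged sketch of what “standardize the columns and add a bias variable” typically means; the helper name below is illustrative, and the assignment’s own utils.load_dataset is the authoritative version.

import numpy as np

def standardize_and_add_bias(X, Xvalidate):
    # Illustrative helper (not the assignment's utils.load_dataset):
    # standardize each column using the *training* mean and standard deviation,
    # then prepend a column of ones as the bias feature.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns

    X_std = (X - mu) / sigma
    Xval_std = (Xvalidate - mu) / sigma  # same transformation as the training set

    X_std = np.column_stack([np.ones(X_std.shape[0]), X_std])
    Xval_std = np.column_stack([np.ones(Xval_std.shape[0]), Xval_std])
    return X_std, Xval_std

The important detail is that the validation set is transformed with the training set’s mean and standard deviation, not its own.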
Logistic regression does ok on this dataset, but it uses all the features (even though only the prime-numbered
features are relevant) and the validation error is above the minimum achievable for this model (which is 1
percent, if you have enough data and know which features are relevant). In this question, you will modify
this demo to use different forms of regularization to improve on these aspects.
Note: your results may vary a bit depending on versions of Python and its libraries.
2.1 L2-Regularization
Rubric: {code:2}
Make a new class, logRegL2, that takes an input parameter λ and fits a logistic regression model with
L2-regularization. Specifically, while logReg computes w by minimizing
\[
f(w) = \sum_{i=1}^n \log\bigl(1 + \exp(-y_i w^T x_i)\bigr),
\]
your new function logRegL2 should compute w by minimizing
\[
f(w) = \sum_{i=1}^n \log\bigl(1 + \exp(-y_i w^T x_i)\bigr) + \frac{\lambda}{2}\|w\|^2.
\]
Hand in your updated code. Using this new code with λ = 1, report how the following quantities change:
the training error, the validation error, the number of features used, and the number of gradient descent
iterations.
Note: as you may have noticed, lambda is a special keyword in Python and therefore we can’t use it as a variable name. As an alternative I humbly suggest lammy, which is what my niece calls her stuffed animal toy lamb. However, you are free to deviate from this suggestion. In fact, as of Python 3 one can now use actual Greek letters as variable names, like the λ symbol. But, depending on your text editor, it may be annoying to input this symbol.
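If it helps to see the overall shape, here is a hedged sketch of what such a class could look like. It assumes y contains labels in {−1, +1} and X is the standardized, bias-augmented training matrix from the demo, and it uses scipy’s L-BFGS as a stand-in for the course’s own minimizer, so its iteration count will not match the gradient-descent iterations reported by the provided code.

import numpy as np
from scipy.optimize import minimize

class logRegL2:
    # Hedged sketch of an L2-regularized logistic regression class.
    def __init__(self, lammy=1.0, maxEvals=400):
        self.lammy = lammy
        self.maxEvals = maxEvals

    def funObj(self, w, X, y):
        yXw = y * (X @ w)
        # L2-regularized logistic loss and its gradient
        f = np.sum(np.log(1.0 + np.exp(-yXw))) + 0.5 * self.lammy * np.dot(w, w)
        g = -X.T @ (y / (1.0 + np.exp(yXw))) + self.lammy * w
        return f, g

    def fit(self, X, y):
        n, d = X.shape
        # L-BFGS stands in here for the course's gradient-descent minimizer.
        res = minimize(self.funObj, np.zeros(d), args=(X, y), jac=True,
                       method="L-BFGS-B", options={"maxiter": self.maxEvals})
        self.w = res.x

    def predict(self, X):
        return np.sign(X @ self.w)

Relative to the unregularized objective, the only changes are the $\frac{\lambda}{2}\|w\|^2$ term in the function value and the corresponding $\lambda w$ term in the gradient.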
2.2 L1-Regularization
Rubric: {code:3}
Make a new class, logRegL1, that takes an input parameter λ and fits a logistic regression model with
L1-regularization,
\[
f(w) = \sum_{i=1}^n \log\bigl(1 + \exp(-y_i w^T x_i)\bigr) + \lambda\|w\|_1.
\]
Hand in your updated code. Using this new code with λ = 1, report how the following quantities change:
the training error, the validation error, the number of features used, and the number of gradient descent
iterations.
You should use the function minimizers.findMinL1, which implements a proximal-gradient method to minimize the sum of a differentiable function $g$ and $\lambda\|w\|_1$,
\[
f(w) = g(w) + \lambda\|w\|_1.
\]
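To illustrate the idea behind that method (not as a substitute for calling minimizers.findMinL1, whose exact signature follows the provided course code), here is a hedged sketch of proximal-gradient descent with soft-thresholding; note that funObj returns only the smooth part $g(w)$, with the L1 term handled by the proximal step.

import numpy as np

def soft_threshold(w, tau):
    # Proximal operator of tau * ||w||_1: shrink each entry toward zero by tau.
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

class logRegL1:
    # Illustrative proximal-gradient version with a fixed step size
    # (the course's findMinL1 is more careful about step-size selection).
    def __init__(self, L1_lambda=1.0, maxEvals=400, stepsize=1e-3):
        self.lammy = L1_lambda
        self.maxEvals = maxEvals
        self.stepsize = stepsize

    def funObj(self, w, X, y):
        # Smooth part g(w): the plain (unregularized) logistic loss and its gradient.
        yXw = y * (X @ w)
        f = np.sum(np.log(1.0 + np.exp(-yXw)))
        g = -X.T @ (y / (1.0 + np.exp(yXw)))
        return f, g

    def fit(self, X, y):
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(self.maxEvals):
            _, grad = self.funObj(w, X, y)
            # Gradient step on g, then soft-threshold to handle lambda * ||w||_1.
            w = soft_threshold(w - self.stepsize * grad, self.stepsize * self.lammy)
        self.w = w

    def predict(self, X):
        return np.sign(X @ self.w)

The soft-thresholding step is what sets many weights exactly to zero, which is why L1-regularization reduces the number of features used.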