EECS 445 Discussion 7: Adaboost

1 Adaboost
Recall that boosting is a technique we use to increase the complexity of our model in order to reduce structural error, while keeping estimation error under control. All boosting algorithms combine weak learners to build a strong learner. Adaboost is one of the most popular boosting algorithms, and it uses decision stumps as its weak learners. Last week, we defined the decision stump. Here, we will go more in depth into the algorithm, derive the update rules, and step through an example.
The Adaboost algorithm is as follows:
Initialization: set $\widetilde{W}_0(i) = \frac{1}{n}$ for $i = 1, \dots, n$
for $m = 1$ to $M$ do:
    find $\bar{\theta}_m$ that approximately minimizes $\epsilon_m = \sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,[[y^{(i)} \neq h(\bar{x}^{(i)}; \bar{\theta}_m)]]$  (*)
    // compute weighted error:
    $\hat{\epsilon}_m = \sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,[[y^{(i)} \neq h(\bar{x}^{(i)}; \bar{\theta}_m)]]$
    // compute weight of current decision stump:
    $\hat{\alpha}_m = \frac{1}{2}\log_e\!\left(\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}\right)$
    // update weights on all training examples:
    for $i = 1, \dots, n$ do:
        $\widetilde{W}_m(i) = c_m\,\widetilde{W}_{m-1}(i)\exp\!\left(-y^{(i)}\hat{\alpha}_m h(\bar{x}^{(i)}; \bar{\theta}_m)\right)$
    end for
end for
Final classifier: $h(\bar{x}) = \sum_{m=1}^{M} \hat{\alpha}_m h(\bar{x}; \bar{\theta}_m)$
(*) Notice that we are not just trying to minimize training error here; rather, we are trying to find a $\bar{\theta}_m$ that approximately minimizes the weighted training error. The reason is that we want the current weak classifier to focus more on the points that were misclassified by the previous ensemble classifier, so that these mistakes can potentially be corrected in the updated classifier.
The main goal of the Adaboost algorithm is to find $M$ $\bar{\theta}$'s and $\hat{\alpha}$'s, in other words $M$ decision stumps and a weight for each stump. The final classifier is then a linear combination of the decision stumps, each scaled by its weight. Intuitively, to classify a new point, we let each decision stump $\bar{\theta}_m$ vote for what it thinks the label of the new point is, and weight each vote by $\hat{\alpha}_m$. We sum up all of the votes and take the sign of the result as our final classification decision.
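To make the procedure concrete, here is a minimal NumPy sketch of the loop above, assuming axis-aligned threshold decision stumps as the weak learners. The helper names (stump_predict, find_best_stump, adaboost_fit) and the exhaustive stump search are illustrative choices, not part of these notes.

import numpy as np

def stump_predict(X, feature, threshold, sign):
    # Decision stump h(x; theta): predict `sign` if x[feature] > threshold, else -sign.
    return np.where(X[:, feature] > threshold, sign, -sign)

def find_best_stump(X, y, weights):
    # Exhaustive search for the stump that (approximately) minimizes weighted error.
    best_theta, best_err = None, np.inf
    for feature in range(X.shape[1]):
        for threshold in X[:, feature]:
            for sign in (+1, -1):
                preds = stump_predict(X, feature, threshold, sign)
                err = np.sum(weights * (preds != y))
                if err < best_err:
                    best_theta, best_err = (feature, threshold, sign), err
    return best_theta, best_err

def adaboost_fit(X, y, M):
    n = len(y)
    W = np.full(n, 1.0 / n)                       # uniform initial weights
    stumps, alphas = [], []
    for m in range(M):
        theta, eps = find_best_stump(X, y, W)     # weighted error of best stump
        eps = np.clip(eps, 1e-10, 1 - 1e-10)      # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)     # weight of this stump
        W = W * np.exp(-y * alpha * stump_predict(X, *theta))  # reweight examples
        W /= W.sum()                              # normalization constant c_m
        stumps.append(theta)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted vote of the stumps; the sign of the sum is the predicted label.
    votes = sum(a * stump_predict(X, *theta) for theta, a in zip(stumps, alphas))
    return np.sign(votes)

For example, adaboost_predict(X_new, *adaboost_fit(X_train, y_train, M=10)) returns the $\pm 1$ predictions for new points under this sketch.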
Now let us see the derivation of the update rules:
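A brief sketch of the standard argument, assuming each $\hat{\alpha}_m$ is chosen greedily to minimize the exponential loss of the ensemble on the training data:
\[
\sum_{i=1}^{n} \exp\!\Big(-y^{(i)}\Big[\textstyle\sum_{j=1}^{m-1}\hat{\alpha}_j h(\bar{x}^{(i)};\bar{\theta}_j) + \alpha_m h(\bar{x}^{(i)};\bar{\theta}_m)\Big]\Big)
\;\propto\; \sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,\exp\!\big(-y^{(i)}\alpha_m h(\bar{x}^{(i)};\bar{\theta}_m)\big).
\]
Splitting the sum into correctly and incorrectly classified points (and using that the weights sum to one),
\[
\sum_{i=1}^{n} \widetilde{W}_{m-1}(i)\,e^{-y^{(i)}\alpha_m h(\bar{x}^{(i)};\bar{\theta}_m)}
= e^{-\alpha_m}(1-\hat{\epsilon}_m) + e^{\alpha_m}\hat{\epsilon}_m .
\]
Setting the derivative with respect to $\alpha_m$ to zero gives $e^{\alpha_m}\hat{\epsilon}_m = e^{-\alpha_m}(1-\hat{\epsilon}_m)$, i.e.
\[
\hat{\alpha}_m = \frac{1}{2}\log_e\!\left(\frac{1-\hat{\epsilon}_m}{\hat{\epsilon}_m}\right).
\]
Carrying the new term into the loss for the next round yields the weight update $\widetilde{W}_m(i) = c_m\,\widetilde{W}_{m-1}(i)\exp(-y^{(i)}\hat{\alpha}_m h(\bar{x}^{(i)};\bar{\theta}_m))$, with $c_m$ chosen so that the new weights sum to one.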
Next, let's manually work through an iteration of the Adaboost algorithm on an example.
Figure 1: Blue circle = positive and red diamond = negative
Discussion Question 1. Consider the data points in Figure 1. Blue circles are positive examples and red diamonds are negative examples ($n = 7$). What are $\widetilde{W}_0$, $\hat{\epsilon}_1$, $\hat{\alpha}_1$, and $\widetilde{W}_1$ after the first iteration of the Adaboost algorithm on these data points? The first decision stump is given to you, as shown in the figure. Note that this decision stump minimizes the weighted error, since we initialize all weights to be equal.
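Since the figure is not reproduced here, write $k \geq 1$ for the number of the seven points that the given stump misclassifies (read $k$ off the figure); the computation then takes the general form
\[
\widetilde{W}_0(i) = \tfrac{1}{7} \;\text{for all } i, \qquad
\hat{\epsilon}_1 = \frac{k}{7}, \qquad
\hat{\alpha}_1 = \frac{1}{2}\log_e\!\left(\frac{7-k}{k}\right),
\]
and after normalization each misclassified point receives weight $\widetilde{W}_1(i) = \frac{1}{2k}$ while each correctly classified point receives weight $\widetilde{W}_1(i) = \frac{1}{2(7-k)}$.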
Discussion Question 2. What is the weighted error relative to the updated weights after the first iteration? Specifically, compute $\hat{\epsilon}_1$ again, but use $\widetilde{W}_1$ instead of $\widetilde{W}_0$.
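As a check on your computation, note what this quantity is in general: using the closed forms above, the weighted error of the first stump measured with the updated weights is
\[
\sum_{i:\,y^{(i)} \neq h(\bar{x}^{(i)};\bar{\theta}_1)} \widetilde{W}_1(i) = k \cdot \frac{1}{2k} = \frac{1}{2},
\]
regardless of $k$: after reweighting, the stump we just added does no better than chance on the new weights, which is what pushes the next iteration toward a different stump.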
Note that $M$, the number of decision stumps, is a parameter that we set beforehand (i.e., a hyperparameter). It can be chosen with the same procedures we used for determining $\lambda$ or $C$, such as cross-validation; there is no single correct value for $M$.
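For example, a cross-validation sweep over candidate values of $M$ might look like the sketch below (using scikit-learn's AdaBoostClassifier, whose default base learner is a depth-1 decision tree, i.e. a stump; the dataset here is a random placeholder standing in for the real training set).

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the real training set.
X = np.random.randn(200, 2)
y = np.random.choice([-1, 1], size=200)

for M in [1, 5, 10, 50, 100]:
    clf = AdaBoostClassifier(n_estimators=M)   # M weak learners (decision stumps)
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation accuracy
    print(f"M = {M:3d}: mean CV accuracy = {scores.mean():.3f}")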
Figure 2: Blue circle = positive and red diamond = negative
Discussion Question 3. Let's say that we've gone through another iteration of the Adaboost algorithm, and our new decision stump (green line) is as shown in Figure 2 ($M = 2$). Assume that $\hat{\alpha}_1 = 0.9$ and $\hat{\alpha}_2 = 0.6$. What is our final classifier, $h(\bar{x})$? Consider the pink circle in the figure. What is our classification result for this new point?
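Without reproducing the figure, the final classifier has the general form
\[
h(\bar{x}) = 0.9\,h(\bar{x};\bar{\theta}_1) + 0.6\,h(\bar{x};\bar{\theta}_2),
\]
and the predicted label of the new point is $\operatorname{sign}(h(\bar{x}))$. Note that because $0.9 > 0.6$, the first stump's vote decides the label whenever the two stumps disagree.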