Engineering Applications of Machine Learning and Data Analytics Homework #1

Spring 2022, University of Arizona (arizona.edu)

1 Probability and Discriminant Classifiers [20pts]

PART I: Maximum Posterior vs. Probability of Chance

Let $\omega_{\max}$ be the state of nature for which $P(\omega_{\max} \mid x) \ge P(\omega_i \mid x)$ for $i = 1, \ldots, c$, where $c$ is the number of classes.

1. Show/explain that $P(\omega_{\max} \mid x) \ge \frac{1}{c}$ when we are using the Bayes decision rule.
2. Derive an expression for $p(\mathrm{err})$.
3. Show that $p(\mathrm{err}) \le (c-1)/c$ when we use the Bayes rule to make a decision. Hint: use the results from the previous questions.

PART II: Bayes Decision Rule Classifier

Let the elements of a vector $x = [x_1, \ldots, x_d]^T$ be binary valued. Let $P(\omega_j)$ be the prior probability of class $\omega_j$ ($j \in [c]$), and let $p_{ij} = P(x_i = 1 \mid \omega_j)$, with all elements of $x$ independent. If $P(\omega_1) = P(\omega_2) = \frac{1}{2}$, $p_{i1} = p > \frac{1}{2}$, and $p_{i2} = 1 - p$, show that the minimum-error decision rule is

  choose $\omega_1$ if $\sum_{i=1}^{d} x_i > \frac{d}{2}$.

Hint: think back to ECE503 and the types of random variables, then start out with

  choose $\omega_1$ if $P(\omega_1)\, p(x \mid \omega_1) > P(\omega_2)\, p(x \mid \omega_2)$.

PART III: The Ditzler Household Growing Up

My parents have two kids, now grown into adults. Obviously there is me, Greg; I was born on a Wednesday. What is the probability that I have a brother? You can assume that $P(\text{boy}) = P(\text{girl}) = \frac{1}{2}$.

PART IV: Bayes Classifier

Consider a Bayes classifier with $p(x \mid \omega_i)$ distributed as a multivariate Gaussian with mean $\mu_i$ and covariance $\Sigma_i = \sigma^2 I$ (note that all classes share the same covariance). We choose the class that has the largest

  $g_i(x) = \log(p(x \mid \omega_i)\, P(\omega_i)) \propto w_i^T x + w_{0i}$.

Find $w_i$ and $w_{0i}$. Fact:

  $p(x \mid \omega_i) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\dfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)$

2 Linear and Quadratic Classifiers – Code [20pts]

• Write a general function to generate random samples from $\mathcal{N}(\mu, \Sigma)$ in $d$ dimensions (i.e., $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$).
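As a starting point for the sampling function above, here is a minimal sketch using a Cholesky factorization; the function name `sample_gaussian` and its interface are my own choices, not a required design:

```python
import numpy as np

def sample_gaussian(mu, Sigma, n, rng=None):
    """Draw n samples from N(mu, Sigma) in d dimensions.

    mu: (d,) mean vector; Sigma: (d, d) positive-definite covariance.
    Uses the Cholesky factor L (Sigma = L L^T): if z ~ N(0, I),
    then mu + L z ~ N(mu, Sigma).
    """
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    z = rng.standard_normal((n, mu.size))  # n i.i.d. standard-normal vectors
    return mu + z @ L.T                    # shape (n, d)

# Example: 2-D samples with correlated components
X = sample_gaussian([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], n=5000, rng=0)
```

A quick sanity check is to verify that the sample mean and sample covariance of `X` are close to the requested $\mu$ and $\Sigma$.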
• Write a procedure for the discriminant of the following form:

  $g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \log(2\pi) - \frac{1}{2} \log |\Sigma_i| + \log P(\omega_i)$   (1)

• Generate a 2D dataset with three classes and use the quadratic classifier above to learn the parameters and make predictions. As an example, you should generate training data such as that shown below to estimate the parameters of the classifier in (1), and you should test the classifier on another randomly generated dataset. It is also sufficient to show the dataset used to train your classifier and the decision boundary it produces.

  [Figure: example 2D training data for classes $\omega_1$, $\omega_2$, and $\omega_3$, plotted on $x_1$ and $x_2$ axes spanning $-4$ to $4$.]

• Write a procedure for computing the Mahalanobis distance between a point $x$ and some mean vector $\mu$, given a covariance matrix $\Sigma$.

• Implement the naïve Bayes classifier from scratch, then compare your results to those of Python's built-in implementation. Use different means, covariance matrices, and prior probabilities (indicated by the relative data size for each class) to demonstrate that your implementation is correct.

3 Misc Code [20pts] – Choose One

Problem I: Comparing Classifiers

A text file, hw1-scores.txt, containing classifier error measurements has been uploaded to D2L. Each column represents a classifier, and each row represents a data set that was evaluated. Are all of the classifiers performing equally? Are one or more classifiers performing better than the others? Your response should be backed by statistics. Suggested reading:

• Janez Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006. Read the abstract to get an idea of the theme of the comparisons. Sections 3.1.3 and 3.2.2 can be used to answer the question posed here.

Problem II: Sampling from a Distribution

Let $\mathcal{N} = \{1, \ldots, n, \ldots, N\}$ be a set of integers and let $p = [p_1, \ldots, p_n, \ldots, p_N]$ be a probability distribution such that $p_k$ is the probability of observing $k \in \mathcal{N}$. Note that since $p$ is a distribution, $\mathbf{1}^T p = 1$ and $0 \le p_k \le 1$ for all $k$. Write a function sample(M, p) that returns $M$ indices sampled from the distribution $p$. Provide evidence that your function is working as desired. Note that all sampling is assumed to be i.i.d. You must include a couple of paragraphs and documented code that discuss how you accomplished this task.
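One common way to implement sample(M, p) is inverse-CDF sampling. The sketch below illustrates that approach under my own interface choices (the `rng` argument and the specific test distribution are illustrative additions), and is not the only acceptable answer:

```python
import numpy as np

def sample(M, p, rng=None):
    """Return M indices in {1, ..., N} drawn i.i.d. from distribution p.

    Inverse-CDF method: draw u ~ Uniform(0, 1) and return the first
    index whose cumulative probability exceeds u.
    """
    rng = np.random.default_rng(rng)
    p = np.asarray(p, dtype=float)
    assert np.isclose(p.sum(), 1.0) and np.all((p >= 0) & (p <= 1))
    cdf = np.cumsum(p)             # monotone cumulative distribution
    u = rng.random(M)              # M uniform draws in [0, 1)
    return np.searchsorted(cdf, u) + 1  # +1 for 1-based indices

# Evidence the sampler works: empirical frequencies should approach p
p = [0.1, 0.2, 0.3, 0.4]
draws = sample(100_000, p, rng=0)
freq = np.bincount(draws, minlength=5)[1:] / draws.size
```

Comparing `freq` to `p` (e.g., visually or with a goodness-of-fit test) is one way to provide the requested evidence that the sampler behaves as desired.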