## Cs 229 Homework Solutions

CS229 Problem Set #2 Solutions1CS 229, Spring 2016Problem Set #2: Naive Bayes, SVMs, and TheoryDue Wednesday, May 4 at 11:00 pm on Gradescope.Notes:(1) These questions require thought, but do not require long answers. Please be asconcise as possible. (2) If you have a question about this homework, we encourage you to postyour question on our Piazza forum, athttps://piazza.com/stanford/spring2016/cs229. (3)If you missed the Frst lecture or are unfamiliar with the collaboration or honor code policy, pleaseread the policy on Handout #1 (available from the course website) before starting work. (4) ±orproblems that require programming, please include in your submission a printout of your code(with comments) and any Fgures that you are asked to plot.If you are scanning your document by cellphone, please check the Piazza forum for recommendedcellphone scanning apps and best practices.1.[15 points] Constructing kernelsIn class, we saw that by choosing a kernelK(x, z) =φ(x)Tφ(z), we can implicitly mapdata to a high dimensional space, and have the SVM algorithm work in that space. Oneway to generate kernels is to explicitly deFne the mappingφto a higher dimensional space,and then work out the correspondingK.However in this question we are interested in direct construction of kernels. I.e., supposewe have a functionK(x, z) that we think gives an appropriate similarity measure for ourlearning problem, and we are considering pluggingKinto the SVM as the kernel function.However forK(x, z) to be a valid kernel, it must correspond to an inner product in somehigher dimensional space resulting from some feature mappingφ. Mercer’s theorem tellsus thatK(x, z) is a (Mercer) kernel if and only if for any Fnite set{x(1), . . . , x(m)}, thematrixKis symmetric and positive semideFnite, where the square matrixK∈Rm×misgiven byKij=K(x(i), x(j)).Now here comes the question: LetK1,K2be kernels overRn×Rn, leta∈R+be a positivereal number, letf:Rnm→Rbe a real-valued function, letφ:Rn→Rdbe a functionmapping fromRntoRd, letK3be a kernel overRd×Rd, and letp(x) a polynomial overxwithpositivecoe²cients.±or each of the functionsKbelow, state whether it is necessarily a kernel. If you think itis, prove it; if you think it isn’t, give a counter-example.(a) [1 points]K(x, z) =K1(x, z) +K2(x, z)(b) [1 points]K(x, z) =K1(x, z)−K2(x, z)(c) [1 points]K(x, z) =aK1(x, z)(d) [1 points]K(x, z) =−aK1(x, z)(e) [5 points]K(x, z) =K1(x, z)K2(x, z)(f) [2 points]K(x, z) =f(x)f(z)(g) [2 points]K(x, z) =K3(φ(x), φ(z))(h) [2 points]K(x, z) =p(K1(x, z))

CS229 Problem Set #1 Solutions 2 The − λ 2 θ T θ here is what is known as a regularization parameter, which will be discussed in a future lecture, but which we include here because it is needed for Newton’s method to perform well on this task. For the entirety of this problem you can use the value λ = 0 . 0001. Using this de±nition, the gradient of ℓ ( θ ) is given by ∇ θ ℓ ( θ ) = X T z − λθ where z ∈ R m is de±ned by z i = w ( i ) ( y ( i ) − h θ ( x ( i ) )) and the Hessian is given by H = X T DX − λI where D ∈ R m × m is a diagonal matrix with D ii = − w ( i ) h θ ( x ( i ) )(1 − h θ ( x ( i ) )) For the sake of this problem you can just use the above formulas, but you should try to derive these results for yourself as well. Given a query point x , we choose compute the weights w ( i ) = exp p − || x − x ( i ) || 2 2 τ 2 P . Much like the locally weighted linear regression that was discussed in class, this weighting scheme gives more when the “nearby” points when predicting the class of a new example. (a) Implement the Newton-Raphson algorithm for optimizing ℓ ( θ ) for a new query point x , and use this to predict the class of x . The q2/ directory contains data and code for this problem. You should implement the y = lwlr(X train, y train, x, tau) function in the lwlr.m ±le. This func-tion takes as input the training set (the X train and y train matrices, in the form described in the class notes), a new query point x and the weight bandwitdh tau .

## Leave a Comment

(0 Comments)