8 The Hessian
8.1 Introduction
8.2 Several variables
Explain how the chain rule is applied to get (8.3).
Suppose that is given by
Then the gradient and the Hessian of are computed in the Sage window below.
See the documentation for Calculus functions in Sage for further details.
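The Sage window itself is not reproduced here. As a plain Python analogue of such a computation (using SymPy; the function f below is a stand-in chosen purely for illustration, not the function from the text), the gradient and Hessian can be computed symbolically:

```python
import sympy as sp

x, y = sp.symbols('x y')
# Stand-in example function (the function from the text is not reproduced here).
f = x**3 + y**3 - 3*x*y

grad = [sp.diff(f, v) for v in (x, y)]   # gradient as a list of partial derivatives
H = sp.hessian(f, (x, y))                # Hessian matrix of second order partials

print(grad)   # [3*x**2 - 3*y, 3*y**2 - 3*x]
print(H)      # Matrix([[6*x, -3], [-3, 6*y]])
```

Any other function of x and y can be substituted for f to experiment, as the exercise below suggests.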
Verify (just this once) by hand the computations done by Sage in Example 8.3.
Also, experiment with a few other functions in the Sage window and compute their
Hessians.
Give a detailed explanation as to why (8.5) holds.
8.3 Newton's method for finding critical points
We will return to the logistic regression in Example 7.34 about the Challenger disaster. Here we sought to maximize the function
In order to employ Newton's method we compute the gradient and the Hessian of (8.8), where is the sigmoid function.
Notice the potential problem in using Newton's method here: the formula for the second order derivatives in (8.9) shows that if the are just mildly big, say , then the Hessian is extremely close to the zero matrix; Sage therefore considers it non-invertible and (8.7) fails.
In the code below we have nudged the initial vector so that it works, but you can easily set other values and watch it fail. Optimization is not just mathematics; it also calls for some good (engineering) implementation skills (see for example the details on quasi-Newton algorithms). In the instance below we do, however, get a gradient that is practically zero.
Code for Newton's method
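The Sage code is not reproduced above. The following is a minimal NumPy/SymPy sketch of the iteration it implements, x ← x − Hf(x)⁻¹∇f(x); the objective f below is a simple stand-in quadratic, not the Challenger log-likelihood from the text, and the function name newton is my own:

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
# Stand-in objective (the actual log-likelihood from the text is not reproduced).
f = (x - 1)**2 + (y + 2)**2 + x*y

grad = sp.Matrix([sp.diff(f, v) for v in (x, y)])
H = sp.hessian(f, (x, y))
grad_fn = sp.lambdify((x, y), grad, 'numpy')
H_fn = sp.lambdify((x, y), H, 'numpy')

def newton(x0, noofits=10):
    """Newton iteration x <- x - H(x)^{-1} grad(x) for finding a critical point."""
    v = np.array(x0, dtype=float)
    for _ in range(noofits):
        g = np.array(grad_fn(*v), dtype=float).flatten()
        Hv = np.array(H_fn(*v), dtype=float)
        v = v - np.linalg.solve(Hv, g)   # fails if the Hessian is (near) singular
    return v

sol = newton([0.0, 0.0])
print(sol)
```

For a quadratic objective the iteration converges in a single step; for the log-likelihood in the text, the near-singular Hessian discussed above is exactly what makes np.linalg.solve (or Sage's matrix inversion) break down.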
8.3.1 Transforming data for better numerical performance
Why is the claim/trick alluded to in (8.11) true?
Below is a snippet of Sage code implementing the trick in (8.11).
Transformed code
The function test takes as input x0 (an initial vector like [0,0]) and noofits (the number of iterations of Newton's method). You execute this in the Sage window by adding for example
test([0,0], 10)
and then pressing Compute. Experiment and compare with the official output from Example 7.34. Also, compute the gradient of the output below for the original non-transformed problem.
8.4 The Hessian and critical points
A point is called a saddle point for if there exist two vectors , such that
as illustrated in the graphics below.
A symmetric matrix is called indefinite if there exists
with
Let be a symmetric matrix and an
invertible matrix. Then is
indefinite (positive definite, negative definite) if and only if
is indefinite (positive definite, negative definite).
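This invariance under congruence (a special case of Sylvester's law of inertia) is easy to probe numerically. Here is a small sketch checking that a congruence BᵀAB preserves the signs of the eigenvalues of a symmetric matrix A; the matrices below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([2.0, 1.0, -3.0])        # an indefinite symmetric matrix
B = rng.standard_normal((3, 3))      # a random matrix, generically invertible
C = B.T @ A @ B                      # congruent to A

def signs(M):
    """Signs of the eigenvalues of a symmetric matrix, in ascending order."""
    return tuple(np.sign(np.linalg.eigvalsh(M)).astype(int))

print(signs(A), signs(C))   # same number of positive and negative eigenvalues
```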
Let be a critical point for . Then
- is a local minimum if is positive definite.
- is a local maximum if is negative definite.
- is a saddle point if is indefinite.
Consider, with our new technology in Theorem 8.12, Exercise 7.22 once again.
Here we analyzed the point for the function
and showed (by a trick) that is neither a local maximum nor a local minimum for . The Hessian
matrix for at is
Now Theorem 8.12 (iii) shows that is a saddle point, since
and
Try plotting the graph for different values of a in the Sage window in Example 8.13 (a=4 shows the saddle point clearly). What do you observe for the point with respect to the function? Does a have to be a number? Could it be a symbolic expression in the variables x and y like a = -10*cos(x)*sin(y)?
Check the computation of the Hessian matrix in Example 8.13 by showing that the Hessian matrix for at the point is
Give an example of a function having a local minimum at
, where is not positive definite.
Consider the function
Compute its critical points and decide on their types according to Theorem 8.12.
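For an exercise of this kind the computation can be checked mechanically: solve ∇f = 0, then classify each critical point by the eigenvalues of the Hessian as in Theorem 8.12. A SymPy sketch, with a stand-in function since the function from the exercise is not reproduced here:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2   # stand-in function; substitute the one from the exercise

grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, (x, y))

results = {}
for pt in sp.solve(grad, [x, y], dict=True):
    eigs = list(H.subs(pt).eigenvals())
    if all(e > 0 for e in eigs):
        kind = 'local minimum'
    elif all(e < 0 for e in eigs):
        kind = 'local maximum'
    elif any(e > 0 for e in eigs) and any(e < 0 for e in eigs):
        kind = 'saddle point'
    else:
        kind = 'inconclusive (semidefinite Hessian)'
    results[(pt[x], pt[y])] = kind
print(results)
```

Note that the semidefinite case is genuinely inconclusive; Theorem 8.12 says nothing there, which is the point of the exercise above about a local minimum with a non-positive-definite Hessian.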
Try to convince yourself that
for every .
Look at the minimization problem
subject to
where is a big number.
Give an example of a function that has a local maximum, but where
there exists with for any given (large) number .
8.5 Differentiable convex functions of several variables
Let be a differentiable function, where
is an open convex subset. Then is convex if
and only if
for every .
Suppose that (8.14) holds and let with , where . To prove that is convex we must verify the inequality
we must verify the inequality
Let . Then by (8.14). If you multiply the first inequality by , the second by and then add the two, you get (8.15).
Suppose on the other hand that is a convex function. Let . Since is an open subset, it follows that for , where is
sufficiently small. Now define the function by
Being the composition of two differentiable functions, is
differentiable. Suppose that and . Then
showing that is a convex function. By Theorem 6.58,
which translates into
by using the chain rule in computing .
Prove that a bounded convex differentiable function is
constant.
Let be a differentiable function with continuous
second order partial derivatives, where is a
convex open subset. Then is convex if and only if the Hessian
is positive semidefinite for every . If
is positive definite for every , then is
strictly convex.
We have done all the work for a convenient reduction to the one
variable case. Suppose that is convex. Then the same reasoning
as in the proof of Theorem 8.21 shows that
is a convex function for every and every from
an open interval to for suitable
. Therefore by Theorem 6.51. This proves that the matrix is
positive semidefinite for every . Suppose on the other hand
that is positive semidefinite for every .
Then Theorem 6.51 shows that is a convex
function from to for small
and , since
for . Therefore is a convex function, since
The same argument (using the last part of Theorem 6.51 on
strict convexity), shows that is strictly convex if
is positive definite. It follows that is strictly
convex if is positive definite for every .
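The criterion in this theorem lends itself to symbolic checking: compute the Hessian and decide its definiteness for all points at once. A SymPy sketch, using Sylvester's criterion (positive leading principal minors) for the 2×2 case; the function below is an arbitrary example, not one from the text:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = sp.exp(x) + x**2 + y**2 - x*y   # example function, my own choice

H = sp.hessian(f, (x, y))
# Leading principal minors of the symmetric 2x2 matrix H; if both are
# positive for every (x, y), then H is positive definite everywhere,
# and f is strictly convex by the theorem.
m1 = sp.simplify(H[0, 0])
m2 = sp.simplify(H.det())
print(m1, m2)   # exp(x) + 2 and 2*exp(x) + 3, both positive for all x
```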
Prove that
is a strictly convex function from to . Also, prove that
is a convex subset of .
Is strictly convex on some non-empty
open convex subset of the plane?
Show that given by
is a convex function. Is strictly convex?
Let be given by
where .
- Show that is a strictly convex function if and only if
and .This is a hint for the only if part. If is the Hessian for , then where - this is seen by a matrix multiplication computation. We know that is positive semidefinite. If was not positive definite, there would exist with . Now use to complete the proof that is positive definite by looking at .
- Suppose now that and . Show that has a unique global minimum and give a formula for this minimum in terms of and .
8.6 How to decide the definiteness of a matrix
Show that a diagonal matrix
is positive definite if and only if , positive semidefinite
if and only if and
indefinite if and only if there exists with and .
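The sign test this exercise describes is easy to state in code; a minimal sketch (the function name and example inputs are my own):

```python
import numpy as np

def diagonal_definiteness(d):
    """Classify diag(d) from the signs of the diagonal entries d."""
    d = np.asarray(d, dtype=float)
    if np.all(d > 0):
        return 'positive definite'
    if np.all(d >= 0):
        return 'positive semidefinite'
    if np.all(d < 0):
        return 'negative definite'
    if np.all(d <= 0):
        return 'negative semidefinite'
    return 'indefinite'   # both a positive and a negative diagonal entry

print(diagonal_definiteness([1, 2, 3]))    # positive definite
print(diagonal_definiteness([1, 0, 2]))    # positive semidefinite
print(diagonal_definiteness([1, -1, 0]))   # indefinite
```

Combined with Theorem 8.29 below, this settles definiteness for any real symmetric matrix: diagonalize by congruence, then read off the signs.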
Let be a real symmetric matrix. Then there exists an
invertible matrix , such that is a diagonal matrix.
Suppose that . If has a non-zero entry in the upper left hand corner, i.e. , then
where is a real symmetric matrix and is the
invertible matrix
By induction on we may find an invertible matrix such that
Putting
it follows that
We now treat the case of a zero entry in the upper left hand corner, i.e. . Suppose first that for
some . Let denote the identity matrix with the first and
-th rows interchanged. The operation amounts to
interchanging the first and -th columns in . Similarly
is interchanging the first and -th rows in .
The matrix is invertible and is a symmetric matrix
with and we have reduced to the case
of a non-zero entry in the upper left hand corner.
If for every we may assume that for some . Let denote the identity matrix where
the entry in the first column and -th row is . The operation
amounts to adding the -th column to the first
column in . Similarly is adding the -th row
to the first row in . All in all we get , where we have used that for . Again we have reduced to the case of a non-zero entry in the
upper left hand corner.
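The three steps in this proof translate directly into an algorithm. Below is a NumPy sketch (my own implementation with an arbitrary test matrix, not code from the text) computing an invertible B such that B^T A B is diagonal by the column/row operations described above:

```python
import numpy as np

def congruence_diagonalize(A, tol=1e-12):
    """Return invertible B with B.T @ A @ B diagonal, following the proof:
    (1) non-zero upper left entry: clear the first row/column, recurse;
    (2) zero corner but a non-zero diagonal entry: swap rows/columns;
    (3) all-zero diagonal but some non-zero first-row entry: add that
        column to the first column (and the row to the first row)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    B = np.eye(n)
    for k in range(n):
        S = A[k:, k:]
        if abs(S[0, 0]) <= tol:
            j = next((i for i in range(1, n - k) if abs(S[i, i]) > tol), None)
            E = np.eye(n)
            if j is not None:
                E[[k, k + j]] = E[[k + j, k]]    # step (2): permutation
            else:
                j = next((i for i in range(1, n - k) if abs(S[0, i]) > tol), None)
                if j is None:
                    continue                      # first row/column already zero
                E[k + j, k] = 1.0                 # step (3): add column k+j to column k
            A = E.T @ A @ E
            B = B @ E
        E = np.eye(n)
        E[k, k + 1:] = -A[k, k + 1:] / A[k, k]    # step (1): clear off-diagonal entries
        A = E.T @ A @ E
        B = B @ E
    return B

# Arbitrary symmetric test matrix with zero diagonal (the hardest case).
A = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])
B = congruence_diagonalize(A)
D = B.T @ A @ B
print(np.round(D, 10))
```

The diagonal entries of D give the definiteness of A by the diagonal-matrix exercise above; here A is indefinite.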
Consider the real symmetric matrix.
Here . Therefore the fundamental step in the proof of Theorem 8.29 applies and
and again
Summing up we get
You are invited to check that
Let
Here , but the diagonal element . So we are in the second step of the proof of Theorem 8.29. Using the matrix
we get
As argued in the proof, this corresponds to interchanging the first
and third columns and then interchanging the first and third
rows. In total this moves the non-zero entry to the upper left corner of the matrix.
Consider the symmetric matrix
We have zero entries in the diagonal. As in the third step in the
proof of Theorem 8.29 we must find an invertible matrix
, such that the upper left corner in is
non-zero. In the proof it is used that every diagonal element is
zero: if we locate a non-zero element in the -th column in the
first row, we can add the -th column to the first column and then
the -th row to the first row obtaining a non-zero element in the
upper left corner. For above we choose and the matrix
becomes
so that
Let be any matrix. Show that
is positive semidefinite.
Find inequalities defining the set
Same question with positive semidefinite. Sketch and compare the two
subsets of the plane .
Let denote the function given by
where . Let denote the Hessian of
in a point .
- Compute .
- Show that for and .
- Compute a non-zero vector , such that in the case, where . Is invertible in this case?
- Show that is strictly convex if .
- Is strictly convex if ? Consider the line segment between and a suitable vector , where .
Why is the subset given by the inequalities
a convex subset of ?
8.7 A schematic procedure for transforming symmetric matrices
Here is the schematic procedure applied to Example 8.30:
Here is the schematic procedure applied to Example 8.31:
Here is the schematic procedure applied to Example 8.32: