3 Matrices

Handling linear equations and keeping track of the unknowns can be a pain. At a certain point one needs to simplify the notation. This is done by introducing matrices.
For example, the system of equations
$$\begin{aligned} x + 2y + 3z &= 4\\ 5x + 6y + 7z &= 8 \end{aligned} \tag{3.1}$$
can be represented by the rectangular array (matrix)
$$\begin{pmatrix}1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\end{pmatrix} \tag{3.2}$$
of numbers. Many of the operations we do to solve linear equations might as well be done on this array, forgetting about the unknowns.

3.1 Matrices

3.1.1 Definitions

A rectangular array of numbers is called a matrix. A matrix with $m$ rows and $n$ columns is called an $m \times n$ ($m$ by $n$) matrix. The notation for an $m \times n$ matrix is
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$
where $a_{ij}$ denotes the number or entry in the $i$-th row and $j$-th column. If the matrix in (3.2) is denoted $A$, then it has $2$ rows and $4$ columns with, for example, $a_{23} = 7$.
Two matrices are equal if they have the same number of rows and columns and their entries are identical.
A very useful (and famous) open source library in Python (with 1000+ contributors) for handling matrices is NumPy. Here is how the matrix in (3.2) is entered in NumPy.
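A minimal sketch, assuming only that the numpy package is installed:

```python
import numpy as np

# The matrix in (3.2): 2 rows and 4 columns.
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

print(A)
print(A.shape)  # (2, 4): number of rows and columns
```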
  1. A matrix whose entries are all $0$ is called a zero matrix. It is denoted simply by $0$ when it is clear from the context what its numbers of rows and columns are.
  2. A matrix is called quadratic if it has an equal number of rows and columns. The first two matrices below are quadratic, whereas the third is not.
$$\begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix},\qquad \begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix},\qquad \begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\end{pmatrix}$$
  3. The diagonal in a matrix is defined as the entries in the matrix with the same row and column indices. Below we have a matrix with the diagonal elements marked:
$$\begin{pmatrix}\mathbf{1} & 2 & 3\\ 4 & \mathbf{5} & 6\\ 7 & 8 & \mathbf{9}\end{pmatrix}$$
    A matrix is called a diagonal matrix if all its entries outside the diagonal are $0$. Below is an example of a square diagonal matrix:
$$\begin{pmatrix}2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 4\end{pmatrix}$$
  4. A matrix is called a row vector if it has only one row. For example,
$$\begin{pmatrix}1 & 2 & 3\end{pmatrix}$$
    is a row vector with three columns.
  5. A matrix is called a column vector if it has only one column. For example,
$$\begin{pmatrix}1\\ 2\\ 3\end{pmatrix}$$
    is a column vector with three rows.
  6. The rows in a matrix are called the row vectors of the matrix. The $i$-th row in a matrix $A$ is denoted $A_i$. The matrix in (3.2) contains the row vectors
$$A_1 = \begin{pmatrix}1 & 2 & 3 & 4\end{pmatrix} \qquad\text{and}\qquad A_2 = \begin{pmatrix}5 & 6 & 7 & 8\end{pmatrix}.$$
  7. The columns in a matrix are called the column vectors of the matrix. The $j$-th column in a matrix $A$ is denoted $A^j$ (not to be confused with powers of the matrix introduced later). The matrix in (3.2) contains the column vectors
$$A^1 = \begin{pmatrix}1\\ 5\end{pmatrix},\qquad A^2 = \begin{pmatrix}2\\ 6\end{pmatrix},\qquad A^3 = \begin{pmatrix}3\\ 7\end{pmatrix},\qquad A^4 = \begin{pmatrix}4\\ 8\end{pmatrix}.$$
  8. A row- or column vector is referred to as a vector.
  9. Even though we have used the notation $\mathbb{R}^n$ for the $n$-th cartesian product of $\mathbb{R}$, we will henceforth use $\mathbb{R}^n$ to denote the set of column vectors with $n$ rows (entries). This definition is almost identical to the previous one, except that the tuple is formatted as a column vector. Illustrated by an example,
$$(1, 2, 3) \in \mathbb{R}^3 \quad\text{corresponds to the column vector}\quad \begin{pmatrix}1\\ 2\\ 3\end{pmatrix}.$$

3.2 Linear maps

In the first chapter we encountered a miniature version of a neural network. Neural networks are generally incredibly complicated functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. The function $f : \mathbb{R}^2 \to \mathbb{R}$ given by
$$f(x_1, x_2) = \max(0,\, 2\max(0,\, x_1 + 2x_2) - 3\max(0,\, 2x_1 - x_2) + 1),$$
even though it looks complicated, is simple in comparison.
You probably agree that the function $g : \mathbb{R}^2 \to \mathbb{R}$ given by
$$g(x_1, x_2) = x_1 + 2x_2$$
is even simpler. This function (or map) is an example of a linear map. In general, a linear map $f : \mathbb{R}^n \to \mathbb{R}^m$ has the form
$$f\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix}a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix},$$
where $a_{ij}$ are real numbers.
Using matrices we will use the notation
$$\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix}a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix}.$$
In this way, we can write the map as
$$f(v) = Av,$$
where $A$ is the matrix
$$\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}$$
and $v$ is the vector
$$\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$$
in $\mathbb{R}^n$.
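In NumPy this matrix-vector product is the @ operator. A small sketch with illustrative numbers:

```python
import numpy as np

# The matrix A of a linear map and a vector v.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])

print(A @ v)  # f(v) = Av = [3. 7.]
```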
Basically a linear map is a system of linear equations without the right hand side (including the equality signs). In fact, we may write the system of linear equations in (3.1) as $f(v) = b$, where $f$ is the linear map given by the coefficients on the left hand side and $b$ is the column vector of right hand sides.
Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear map given by the matrix
$$\begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}.$$
Does there exist $v \in \mathbb{R}^2$, such that
$$f(v) = \begin{pmatrix}1\\ 0\end{pmatrix}?$$
Quite generally, can we find $v$, such that
$$f(v) = w$$
for arbitrary $w \in \mathbb{R}^2$?
Suppose you know that $f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map and that you have a black box giving you the output $f(v)$ if you supply the input $v$. How would you find the matrix defining $f$?
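A sketch of how such a black box could be probed numerically; the function black_box below is a made-up stand-in for the unknown linear map:

```python
import numpy as np

def black_box(v):
    # A hypothetical linear map; in the exercise you only see its outputs.
    A_hidden = np.array([[1.0, 2.0, 0.0],
                         [0.0, 1.0, 3.0]])
    return A_hidden @ v

n = 3  # dimension of the input space
# Supplying the standard basis vectors e_1, ..., e_n as inputs
# returns the columns of the matrix defining the map.
columns = [black_box(np.eye(n)[:, j]) for j in range(n)]
print(np.column_stack(columns))
```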

3.3 Matrix multiplication

Suppose we are given two linear maps and . Then it turns out that the composition is also a linear map. A word of advice: the computations below look large and intimidating. They are not. It is important that you carry them out on your own. Do not look and copy or tell yourself that it looks okay. Do the computations yourself and ask me or fellow students if you get stuck.
Let us look at an example. Suppose that
$$f\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}x_1 + 2x_2\\ 3x_1 + 4x_2\end{pmatrix} \qquad\text{and}\qquad g\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}5x_1 + 6x_2\\ 7x_1 + 8x_2\end{pmatrix}.$$
Then
$$(f\circ g)\begin{pmatrix}x_1\\ x_2\end{pmatrix} = f\begin{pmatrix}5x_1 + 6x_2\\ 7x_1 + 8x_2\end{pmatrix} = \begin{pmatrix}(5x_1 + 6x_2) + 2(7x_1 + 8x_2)\\ 3(5x_1 + 6x_2) + 4(7x_1 + 8x_2)\end{pmatrix} = \begin{pmatrix}19x_1 + 22x_2\\ 43x_1 + 50x_2\end{pmatrix}.$$
In terms of the matrices of the linear maps, we write this as
$$\begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}\begin{pmatrix}5 & 6\\ 7 & 8\end{pmatrix} = \begin{pmatrix}19 & 22\\ 43 & 50\end{pmatrix}.$$
There is nothing special about the numbers in this example. We might as well do the computation in general: suppose that
$$f\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}a_{11}x_1 + a_{12}x_2\\ a_{21}x_1 + a_{22}x_2\end{pmatrix} \qquad\text{and}\qquad g\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}b_{11}x_1 + b_{12}x_2\\ b_{21}x_1 + b_{22}x_2\end{pmatrix}.$$
Then
$$(f\circ g)\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}a_{11}(b_{11}x_1 + b_{12}x_2) + a_{12}(b_{21}x_1 + b_{22}x_2)\\ a_{21}(b_{11}x_1 + b_{12}x_2) + a_{22}(b_{21}x_1 + b_{22}x_2)\end{pmatrix} = \begin{pmatrix}(a_{11}b_{11} + a_{12}b_{21})x_1 + (a_{11}b_{12} + a_{12}b_{22})x_2\\ (a_{21}b_{11} + a_{22}b_{21})x_1 + (a_{21}b_{12} + a_{22}b_{22})x_2\end{pmatrix}.$$
Again, in terms of the matrices of the linear maps, we write this as
$$\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix}b_{11} & b_{12}\\ b_{21} & b_{22}\end{pmatrix} = \begin{pmatrix}a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22}\end{pmatrix}. \tag{3.5}$$
The equation above is the formula for matrix multiplication for two $2 \times 2$ matrices, precisely as it was introduced by Cayley around 1858.
Upon closer inspection of (3.5), you will see that the number in the $i$-th row and $j$-th column in the product matrix is the row-column multiplication between the $i$-th row in the first matrix and the $j$-th column in the second matrix:
$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j}.$$
The row-column multiplication between a row vector
$$\begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix}$$
and a column vector
$$\begin{pmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{pmatrix}$$
with the same number of entries is defined as
$$a_1b_1 + a_2b_2 + \cdots + a_nb_n.$$
Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix. Then the matrix product $AB$ is defined as the $m \times p$ matrix given by the row-column multiplication
$$(AB)_{ij} = A_i B^j = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$
for $i = 1, \ldots, m$ and $j = 1, \ldots, p$.
If $A$ is an $m \times n$ matrix and $B$ is an $r \times p$ matrix, then the matrix product $AB$ only makes sense if $n = r$: the number of columns in $A$ must equal the number of rows in $B$.
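The definition translates directly into code. A sketch (in practice you would use NumPy's built-in product instead of explicit loops):

```python
import numpy as np

def matmul(A, B):
    # Matrix product computed entry by entry from the definition.
    m, n = A.shape
    r, p = B.shape
    assert n == r, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            # row-column multiplication of the i-th row of A
            # and the j-th column of B
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
print(matmul(A, B))  # [[19. 22.], [43. 50.]]
```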
Suppose that $A$ is a $2 \times 3$ matrix, $B$ a $3 \times 3$ matrix and $C$ a $3 \times 2$ matrix. Which of the matrix products $AB$, $BA$, $AC$, $CA$, $BC$ and $CB$ make sense?
I have been told that my pronunciation of "column" in the video below is wrong. In the area of the US where I got my PhD, people for some reason had this rare (Irish?) pronunciation.
Using matrix product notation, the system of linear equations in (3.1) can now be written as
$$\begin{pmatrix}1 & 2 & 3\\ 5 & 6 & 7\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}4\\ 8\end{pmatrix}.$$
Here we multiply a $2 \times 3$ with a $3 \times 1$ matrix. The row-column multiplication gives the $2 \times 1$ matrix
$$\begin{pmatrix}x + 2y + 3z\\ 5x + 6y + 7z\end{pmatrix}.$$
This matrix must equal the matrix on the right hand side for (3.1) to be true. This is in agreement with our convention for writing linear maps in section 3.2.
Suppose that $A$, $B$ and $C$ are $2 \times 2$ matrices. Which of the statements below are true?
If $AB = 0$, then $A = 0$ or $B = 0$.
If $AB = AC$, then $B = C$.

3.3.1 Matrix multiplication in NumPy

Matrix multiplication in NumPy is represented by the function dot (or, equivalently, the @ operator):
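For example, with the matrices from the composition example above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.dot(A, B))  # [[19 22], [43 50]]
print(A @ B)         # the @ operator gives the same result
```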

3.3.2 The identity matrix

The identity matrix of order $n$ is the $n \times n$ diagonal matrix with $1$'s in the diagonal. Below is the identity matrix of order $3$:
$$I_3 = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
The identity matrix has the crucial property that
$$I_m A = A \qquad\text{and}\qquad A I_n = A \tag{3.6}$$
for all $m \times n$ matrices $A$.
Prove that the two identities in (3.6) are true for arbitrary matrices $A$.
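A quick numerical sanity check of (3.6) in NumPy (not a proof, of course):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # a 2 x 3 matrix

I2 = np.eye(2)  # identity matrix of order 2
I3 = np.eye(3)  # identity matrix of order 3

print(np.array_equal(I2 @ A, A))  # True
print(np.array_equal(A @ I3, A))  # True
```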

3.3.3 Examples of matrix multiplication

Matrix multiplication is omnipresent in mathematics. Below we give an example, which is a baby version of Google's famous PageRank algorithm.
Suppose that a fraction $p$ of the people living in the suburbs move to the big city and that a fraction $q$ of the people living in the big city move to the suburbs per year.
Aiming for a model using probabilities, let us be a bit more precise.
  1. If you live in the suburbs, the probability that you move to the big city is $p$.
  2. If you live in the suburbs, the probability that you do not move is $1 - p$.
  3. If you live in the big city, the probability that you move to the suburbs is $q$.
  4. If you live in the big city, the probability that you do not move is $1 - q$.
All of the above probabilities are per year and can be illustrated in the diagram below
We are interested in predicting, using this model, how many people live in the big city and the suburbs after a number of years, given that we know how many people, $C_0$, live in the big city and how many, $S_0$, live in the suburbs to begin with, i.e., at time $t = 0$ (years).
How many people $C_1$ and $S_1$ live in the two places after the first year ($t = 1$)?
The population of the big city will decrease by $qC_0$, but there are newcomers amounting to $pS_0$, the fraction $p$ of the population in the suburbs. Therefore
$$C_1 = (1 - q)C_0 + pS_0.$$
In the same way,
$$S_1 = qC_0 + (1 - p)S_0.$$
Using matrix multiplication, these two equations can be written
$$\begin{pmatrix}C_1\\ S_1\end{pmatrix} = \begin{pmatrix}1 - q & p\\ q & 1 - p\end{pmatrix}\begin{pmatrix}C_0\\ S_0\end{pmatrix}.$$
For $t = 2$ years, we can repeat the procedure and the result becomes
$$\begin{pmatrix}C_2\\ S_2\end{pmatrix} = P\begin{pmatrix}C_1\\ S_1\end{pmatrix} = P^2\begin{pmatrix}C_0\\ S_0\end{pmatrix},$$
where
$$P = \begin{pmatrix}1 - q & p\\ q & 1 - p\end{pmatrix}.$$
In general we have the formula
$$\begin{pmatrix}C_t\\ S_t\end{pmatrix} = P^t\begin{pmatrix}C_0\\ S_0\end{pmatrix},$$
giving the distribution of the populations after $t$ years. Let us experiment a little:
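A sketch of such an experiment; the probabilities p and q below are illustrative choices, not the ones from the original example:

```python
import numpy as np

p = 0.10  # probability of moving suburbs -> big city (illustrative)
q = 0.02  # probability of moving big city -> suburbs (illustrative)

P = np.array([[1 - q, p],
              [q, 1 - p]])

v0 = np.array([500_000, 500_000])  # initial populations (big city, suburbs)

for t in (1, 2, 5, 10, 50, 100):
    print(t, np.linalg.matrix_power(P, t) @ v0)
```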
It seems that the distribution stabilizes: in the long run, fixed fractions of the original total population live in the big city and the suburbs, independent of the starting distribution.
The matrix $P$ is an example of a stochastic matrix. In general, a square matrix is called a stochastic matrix if its entries are $\geq 0$ and the sum of the entries in each of its column vectors is $1$.
A simple example of the PageRank algorithm is given in Example 2.15. There you encountered a system of linear equations in the ranks of the pages. In terms of matrix multiplication, the first four equations can be rewritten to
$$v = Gv,$$
where $v$ is the column vector of the four ranks. Putting the coefficients into the matrix $G$, we get a stochastic matrix and may again iterate and compute $G^n v$.
Is there a connection between the entries of $G^n v$, where $n$ is very big, and the solutions to the linear equations?
At the end of Example 3.8 (above), a stochastic matrix is defined. Show that the matrix product of two stochastic matrices is a stochastic matrix.
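A numerical spot check (not a proof) with two illustrative stochastic matrices:

```python
import numpy as np

P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
Q = np.array([[0.5, 0.3],
              [0.5, 0.7]])

PQ = P @ Q
print(PQ)
print(PQ.sum(axis=0))  # column sums are again [1. 1.]
```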
Below is an example where matrix multiplication occurs in networks.
Suppose we have five cities connected with roads as shown below
This network has a so-called incidence matrix $A$, where city $i$ is associated with the $i$-th row and the $i$-th column. A $1$ in entry $(i, j)$ of the matrix means that there is a road from city $i$ to city $j$, whereas a $0$ means that city $i$ and city $j$ are not connected by a road.
What is the interpretation of $A^2$ and of $A^n$ in general? It turns out that the entry in row $i$ and column $j$ of the matrix $A^n$ is exactly the number of paths of length $n$ from city $i$ to city $j$.
For example, a path of length $2$ from city $i$ to city $j$ consists of a road from city $i$ to some city $k$ followed by a road from city $k$ to city $j$, and the entry of $A^2$ in row $i$ and column $j$ counts all such paths.
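A sketch with a made-up network of five cities (the road network from the figure is not reproduced here):

```python
import numpy as np

# Hypothetical incidence matrix: A[i, j] == 1 means a road
# between city i+1 and city j+1 (roads go both ways).
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]])

print(np.linalg.matrix_power(A, 2))  # numbers of paths of length 2
print(np.linalg.matrix_power(A, 3))  # numbers of paths of length 3
```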
A deeper explanation
Suppose that we have a network with $n$ cities and incidence matrix $A$.
The general proof of the observations above in our special example builds on the fact that a path of length $m$ from city $i$ to city $j$ has to end with a road from a neighboring city $k$ to city $j$. For every one of these neighboring cities, we may count the number of paths of length $m - 1$ from city $i$. If $(A^{m-1})_{ik}$ denotes the number of paths of length $m - 1$ from city $i$ to city $k$, then matrix multiplication tells us that
$$(A^m)_{ij} = (A^{m-1})_{i1}a_{1j} + (A^{m-1})_{i2}a_{2j} + \cdots + (A^{m-1})_{in}a_{nj}.$$
This number is exactly the number of paths of length $m$ from city $i$ to city $j$, since $a_{kj} = 1$ only when $k$ is a neighboring city to city $j$ (and $a_{kj} = 0$ otherwise).

3.4 Matrix arithmetic

Matrix multiplication is very different from ordinary multiplication of numbers: it is not commutative. Consider the matrices
$$A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} \qquad\text{and}\qquad B = \begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix}.$$
Then
$$AB = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} \qquad\text{and}\qquad BA = \begin{pmatrix}0 & 0\\ 0 & 1\end{pmatrix},$$
i.e., $AB \neq BA$.
Addition of matrices is like ordinary addition, except that you add the corresponding entries of the involved matrices.

3.4.1 Matrix addition

Addition of two matrices with the same number of rows and columns is defined below:
$$\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix} + \begin{pmatrix}b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{m1} & \cdots & b_{mn}\end{pmatrix} = \begin{pmatrix}a_{11} + b_{11} & \cdots & a_{1n} + b_{1n}\\ \vdots & & \vdots\\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn}\end{pmatrix}$$
The zero matrix is the $m \times n$ matrix containing zero in all its entries. When its number of rows and columns are clear from the context, it is simply denoted by $0$. For $2 \times 2$ matrices, for example, we write
$$0 = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix} \qquad\text{and}\qquad A + 0 = 0 + A = A.$$
Give an example of a non-zero matrix $A$, such that
$$AA = 0.$$

3.4.2 Multiplication of a number and a matrix

A matrix may be multiplied by a number by multiplying each entry by the number:
$$c\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix} = \begin{pmatrix}ca_{11} & \cdots & ca_{1n}\\ \vdots & & \vdots\\ ca_{m1} & \cdots & ca_{mn}\end{pmatrix}$$
Does there exist a number $c$, such that
$$c\begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix} = \begin{pmatrix}3 & 6\\ 9 & 12\end{pmatrix}?$$
Let $D$ be an $n \times n$ matrix, such that
$$DA = AD$$
for every other $n \times n$ matrix $A$. Show that $D$ is a diagonal matrix of the form
$$D = \begin{pmatrix}\lambda & & \\ & \ddots & \\ & & \lambda\end{pmatrix},$$
where $\lambda \in \mathbb{R}$, i.e., $D = \lambda I_n$.

3.4.3 The distributive law

Ordinary numbers satisfy $a(b + c) = ab + ac$. This rule also holds for matrices and is called the distributive law (multiplication is distributed over plus).
Let $B$ and $C$ be $n \times p$ matrices, $A$ an $m \times n$ matrix and $D$ a $p \times q$ matrix. Then
$$A(B + C) = AB + AC \qquad\text{and}\qquad (B + C)D = BD + CD.$$
Let us start by looking at $A(B + C) = AB + AC$. Here it suffices to do the proof when $A$ is a row vector and $B$ and $C$ are column vectors, since the entry in row $i$ and column $j$ of $A(B + C)$ is the row-column multiplication $A_i(B^j + C^j)$.
For $(B + C)D = BD + CD$, we may reduce to the case where $B$ and $C$ are row vectors and $D$ is a column vector, since the entry in row $i$ and column $j$ of $(B + C)D$ is the row-column multiplication $(B_i + C_i)D^j$.
Both of these cases follow using the distributive law for ordinary numbers.
Suppose that $A$ and $B$ are two $n \times n$ matrices. Is it true that
$$(A + B)(A - B) = A^2 - B^2?$$
What about
$$(A + B)(A + B) = A^2 + 2AB + B^2?$$

3.4.4 The miraculous associative law

It does not make sense, a priori, to multiply three matrices $A$, $B$ and $C$. We have only defined matrix multiplication for two matrices. There are two natural ways of evaluating $ABC$:
$$(AB)C \qquad\text{and}\qquad A(BC).$$
We can begin by multiplying $A$ by $B$ and then multiply by $C$ from the right. However, we may just as well start by multiplying $B$ by $C$ and then multiply by $A$ from the left.
It is in no way clear that these two computations give the same result!
That this turns out to be true is just one of many miracles in the universe (there is a rather cool mathematical explanation, though, addressed in an exercise below).
Let $A$ be an $m \times n$ matrix, $B$ an $n \times p$ matrix and $C$ a $p \times q$ matrix. Then
$$(AB)C = A(BC).$$
We must prove that
$$((AB)C)_{ij} = (A(BC))_{ij}$$
for $i = 1, \ldots, m$ and $j = 1, \ldots, q$. The left hand side can be written
$$((AB)C)_{ij} = (AB)_{i1}c_{1j} + \cdots + (AB)_{ip}c_{pj}. \tag{3.10}$$
The right hand side is
$$(A(BC))_{ij} = a_{i1}(BC)_{1j} + \cdots + a_{in}(BC)_{nj}. \tag{3.11}$$
Writing out the row-column multiplications in (3.10), we get
$$\sum_{k=1}^{p}(a_{i1}b_{1k} + \cdots + a_{in}b_{nk})c_{kj} = \sum_{k=1}^{p}\sum_{l=1}^{n} a_{il}b_{lk}c_{kj}. \tag{3.12}$$
Writing out the row-column multiplications in (3.11), we get
$$\sum_{l=1}^{n} a_{il}(b_{l1}c_{1j} + \cdots + b_{lp}c_{pj}) = \sum_{l=1}^{n}\sum_{k=1}^{p} a_{il}b_{lk}c_{kj}. \tag{3.13}$$
The sums in (3.12) and (3.13) contain exactly the same terms $a_{il}b_{lk}c_{kj}$, only added in a different order. Therefore these sums are equal and $(AB)C = A(BC)$.
The associative law is true, but there can be a (big) difference in the number of multiplications in the two computations $(AB)C$ and $A(BC)$, i.e., efficiency is not associative for matrix multiplication. In the notation of Theorem 3.17, computing $(AB)C$ requires
$$mnp + mpq$$
multiplications, whereas computing $A(BC)$ requires
$$npq + mnq$$
multiplications. If for example $m = n = p = 1000$ and $q = 1$, then computing $(AB)C$ requires $1\,001\,000\,000$ multiplications, whereas computing $A(BC)$ requires only $2\,000\,000$ multiplications!
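A small experiment illustrating the difference (timings vary from machine to machine):

```python
import time
import numpy as np

m = n = p = 1000
q = 1
A = np.random.rand(m, n)
B = np.random.rand(n, p)
C = np.random.rand(p, q)

t0 = time.perf_counter()
_ = (A @ B) @ C
t1 = time.perf_counter()
_ = A @ (B @ C)
t2 = time.perf_counter()

print(f"(AB)C took {t1 - t0:.4f} s, A(BC) took {t2 - t1:.4f} s")
```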
Verify the associative law for the three matrices
$$A = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix},\qquad B = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\qquad C = \begin{pmatrix}2 & 0\\ 0 & 3\end{pmatrix}$$
by showing by explicit computation that $(AB)C = A(BC)$.
There is in fact a high tech explanation that the associative law for matrices holds; an explanation that makes the calculations in the above proof superfluous and shows the raw power of abstract mathematics: suppose that $f : \mathbb{R}^q \to \mathbb{R}^p$, $g : \mathbb{R}^p \to \mathbb{R}^n$ and $h : \mathbb{R}^n \to \mathbb{R}^m$ are linear maps. Then $(h \circ g) \circ f$ and $h \circ (g \circ f)$ are both linear maps from $\mathbb{R}^q$ to $\mathbb{R}^m$, such that
$$((h \circ g) \circ f)(v) = h(g(f(v))) = (h \circ (g \circ f))(v)$$
for every $v \in \mathbb{R}^q$. How does this relate to the associative law for matrix multiplication?

3.5 The inverse matrix

You are allowed to divide by a number provided it is $\neq 0$. Does it make sense to divide by matrices?
It does, but there are some matrices, corresponding to the number $0$, that we are not allowed to divide by.
Let $A$, $B$ and $C$ be $n \times n$ matrices. Show that
$$AB = BA = I$$
and
$$AC = CA = I$$
implies that $B = C$.
An $n \times n$ matrix $A$ is called invertible if there exists an $n \times n$ matrix $B$, such that
$$AB = BA = I_n.$$
In this case, $B$ is called the inverse matrix of $A$ and denoted $A^{-1}$.
As a reality check, you should convince yourself that the matrix
$$\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}$$
is not invertible. In fact, here the associative law from Theorem 3.17 is incredibly useful: if $A$ is an invertible matrix with inverse matrix $B$ and $Av = 0$, then $v = 0$:
$$v = Iv = (BA)v = B(Av) = B0 = 0.$$
Since the matrix above sends the non-zero vector $(1, -1)^T$ to $0$, it cannot be invertible.
Show that a quadratic matrix with a column or row consisting entirely of zeros cannot be invertible.
Suppose that
$$A = \begin{pmatrix}a & b\\ c & d\end{pmatrix}$$
with $ad - bc \neq 0$. Prove that $A$ is invertible with
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}.$$
When is a quadratic diagonal matrix invertible? Look first at the $2 \times 2$ case:
$$\begin{pmatrix}a & 0\\ 0 & b\end{pmatrix}.$$
The inverse matrix can be computed in NumPy:
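For example (the matrix below is illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

A_inv = np.linalg.inv(A)
print(A_inv)      # [[-2.   1. ], [ 1.5 -0.5]]
print(A @ A_inv)  # the identity matrix, up to rounding
```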
The inverse matrix enters the picture when solving $n$ linear equations in $n$ unknowns:
$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1\\ &\ \ \vdots\\ a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n \end{aligned}$$
can be rewritten using matrix notation as
$$\begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} = \begin{pmatrix}b_1\\ \vdots\\ b_n\end{pmatrix},$$
or more compactly as $Ax = b$.
If $A$ is invertible, then the associative law gives the following:
$$x = Ix = (A^{-1}A)x = A^{-1}(Ax) = A^{-1}b.$$
The inverse matrix gives the solution to the linear equations just by one matrix multiplication!
The system of linear equations
$$\begin{aligned} x_1 + 2x_2 &= 5\\ 3x_1 + 4x_2 &= 11 \end{aligned} \tag{3.14}$$
can be rewritten using matrix multiplication to
$$A\begin{pmatrix}x_1\\ x_2\end{pmatrix} = b,$$
where
$$A = \begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix} \qquad\text{and}\qquad b = \begin{pmatrix}5\\ 11\end{pmatrix}.$$
Here $A$ is invertible and
$$A^{-1} = \frac{1}{1 \cdot 4 - 2 \cdot 3}\begin{pmatrix}4 & -2\\ -3 & 1\end{pmatrix} = \begin{pmatrix}-2 & 1\\ \tfrac{3}{2} & -\tfrac{1}{2}\end{pmatrix}.$$
One simple matrix multiplication
$$A^{-1}b = \begin{pmatrix}-2 & 1\\ \tfrac{3}{2} & -\tfrac{1}{2}\end{pmatrix}\begin{pmatrix}5\\ 11\end{pmatrix} = \begin{pmatrix}1\\ 2\end{pmatrix}$$
shows the solution we expect from (3.14).
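Checking the computation above in NumPy; np.linalg.solve solves the system directly and is preferable to forming the inverse in practice:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 11.0])

print(np.linalg.inv(A) @ b)   # [1. 2.]
print(np.linalg.solve(A, b))  # same solution without computing the inverse
```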
The product of two invertible matrices (when this makes sense) is an invertible matrix. This is the content of the following result.
The product $AB$ of two invertible $n \times n$ matrices $A$ and $B$ is invertible with $(AB)^{-1} = B^{-1}A^{-1}$.
We must check that
$$(AB)(B^{-1}A^{-1}) = I \qquad\text{and}\qquad (B^{-1}A^{-1})(AB) = I.$$
Let us check the first condition using the associative law:
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,$$
where $I$ denotes the identity matrix. The second condition is verified in the same way.
We have defined a matrix $A$ to be invertible if there exists a matrix $B$, such that $AB = I$ and $BA = I$. Suppose that only $AB = I$. Can we then conclude that $BA = I$?
Find the mistake in the argument below.
Suppose that $AB = I$. Then for every $v$ we have $(BA)(Bv) = B((AB)v) = Bv$. Therefore $(BA)w = w$ for every $w$ and we have proved that $BA = I$.
Let
$$A = \begin{pmatrix}0 & 1 & 1\\ 0 & 0 & 1\\ 0 & 0 & 0\end{pmatrix}.$$
Compute the powers $A^n$ for $n = 2, 3, \ldots$, i.e., $A^2, A^3, \ldots$. Now let
$$B = I - A,$$
where $I$ is the identity matrix. Show that $B$ is invertible, and
$$B^{-1} = I + A + A^2.$$
Compute $B^{-1}B$.
Do you see a way of generalizing this computation to matrices with a property shared by the matrix $A$ above?

3.5.1 Well, how do I find the inverse of a matrix?

Finding the inverse of a matrix, or deciding that the matrix is not invertible, is a matter of solving systems of linear equations.
Given an $n \times n$ matrix $A$, we need to see if there exists an $n \times n$ matrix $B$, such that
$$AB = I_n, \tag{3.15}$$
where $I_n$ is the identity matrix of order $n$. We can do this by computing the columns of $B$. From the definition in (3.15), the $j$-th column $B^j$ of $B$ must satisfy
$$AB^j = I_n^j, \tag{3.16}$$
where $I_n^j$ is the $j$-th column of the identity matrix. This follows from the definition of matrix multiplication!
The identity in (3.16) is a system of $n$ linear equations in $n$ unknowns. The unknowns are the entries in the $j$-th column of the inverse matrix (if it exists).
Suppose that $A$ is a $2 \times 2$ matrix. Then the inverse matrix $B$ (if it exists) can be computed from the systems of linear equations below:
$$A\begin{pmatrix}x_{11}\\ x_{21}\end{pmatrix} = \begin{pmatrix}1\\ 0\end{pmatrix} \qquad\text{and}\qquad A\begin{pmatrix}x_{12}\\ x_{22}\end{pmatrix} = \begin{pmatrix}0\\ 1\end{pmatrix}.$$
Writing
$$B^1 = \begin{pmatrix}x_{11}\\ x_{21}\end{pmatrix} \qquad\text{and}\qquad B^2 = \begin{pmatrix}x_{12}\\ x_{22}\end{pmatrix}$$
for the first and second columns, the systems of linear equations can be written as
$$AB^1 = \begin{pmatrix}1\\ 0\end{pmatrix} \qquad\text{and}\qquad AB^2 = \begin{pmatrix}0\\ 1\end{pmatrix},$$
where
$$B = \begin{pmatrix}x_{11} & x_{12}\\ x_{21} & x_{22}\end{pmatrix}.$$
A concrete example along with a useful way of keeping track of the computation is presented in the video below.
Compute the inverse of the matrix
$$\begin{pmatrix}1 & 2\\ 3 & 5\end{pmatrix}$$
by employing the method of solving linear equations above. Explain the steps in your computation. You may find it useful to draw inspiration from the video in Example 3.30.

3.6 The transposed matrix

The transpose $A^T$ of an $m \times n$ matrix $A$ is the $n \times m$ matrix given by
$$(A^T)_{ij} = a_{ji},$$
where $1 \leq i \leq n$ and $1 \leq j \leq m$. As an example,
$$\begin{pmatrix}1 & 2\\ 3 & 4\\ 5 & 6\end{pmatrix}^T = \begin{pmatrix}1 & 3 & 5\\ 2 & 4 & 6\end{pmatrix}.$$
Notice also that $(A^T)^T = A$ for an arbitrary matrix $A$.
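In NumPy the transpose is the attribute T:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(A.T)      # the 2 x 3 transpose
print((A.T).T)  # transposing twice gives A back
```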
Let $A$ be an $m \times n$ matrix and $B$ an $n \times p$ matrix. Then
$$(AB)^T = B^T A^T.$$
By definition, $((AB)^T)_{ij} = (AB)_{ji}$. This entry is given by the row-column multiplication of the $j$-th row in $A$ and the $i$-th column in $B$, which equals the row-column multiplication of the $i$-th row in $B^T$ and the $j$-th column in $A^T$, i.e., the entry $(B^T A^T)_{ij}$.
Let $A$ be a quadratic matrix. Prove that $A^T$ is invertible if and only if $A$ is invertible.
In the sage window below, you are supposed to experiment a bit by entering an arbitrary matrix $A$ and studying the quadratic matrix $A^T A$. Is there anything special about this product? Press the Further explanation button below the sage window to display the rest of the exercise after(!) you have completed your experimentation.
Further explanation
A quadratic matrix $B$ is called symmetric if $B^T = B$. Prove that
$$A^T A$$
is a symmetric matrix, where $A$ is an arbitrary matrix.

3.7 Symmetric matrices

A (quadratic) matrix $A$ is called symmetric if $A^T = A$. Visually, this means that $A$ is symmetric around the diagonal, like the matrix
$$\begin{pmatrix}1 & 2 & 3\\ 2 & 4 & 5\\ 3 & 5 & 6\end{pmatrix},$$
but not like the matrix
$$\begin{pmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix}.$$
Show that
$$B^T A B$$
is a symmetric matrix when $A$ is a symmetric matrix and $B$ is an arbitrary matrix. Both matrices are assumed quadratic of the same dimensions.
If $A$ is a symmetric $n \times n$ matrix, we define the function $q : \mathbb{R}^n \to \mathbb{R}$ given by
$$q(x) = x^T A x.$$
This definition is rather compact. Let us consider the following example for $n = 2$:
$$A = \begin{pmatrix}a & b\\ b & c\end{pmatrix}.$$
Then
$$x^T A x = \begin{pmatrix}x_1 & x_2\end{pmatrix}\begin{pmatrix}a & b\\ b & c\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = ax_1^2 + 2bx_1x_2 + cx_2^2.$$
You are also encouraged to watch the short video below for an example with concrete numbers.
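A sketch of evaluating the function $q$ in NumPy, with an illustrative symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # a symmetric matrix

def q(x):
    return x @ A @ x  # the row-column multiplications in x^T A x

print(q(np.array([1.0, 0.0])))   # 2.0
print(q(np.array([1.0, -1.0])))  # 2 - 2 + 3 = 3.0
```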
Inside the set of symmetric matrices we find two very important subsets: the positive definite and the positive semi-definite matrices. They correspond to the positive and the non-negative real numbers, respectively.

3.7.1 Positive definite matrices

A symmetric matrix $A$ is called positive definite if
$$x^T A x > 0$$
for every $x \neq 0$. Probably the first example of a positive definite $n \times n$ matrix one thinks of is $A$ being the identity matrix. Here
$$x^T I x = x^T x = x_1^2 + \cdots + x_n^2.$$
Of course, here $x^T x = 0$ if and only if $x = 0$, i.e., if and only if $x_1 = \cdots = x_n = 0$.
Give examples of (non-zero) $2 \times 2$ and $3 \times 3$ matrices that are positive definite and ones that fail to be positive definite.
When is a diagonal matrix positive definite?
Let $A$ be a symmetric matrix. Show that $A$ is not positive definite if $a_{11} \leq 0$.

3.7.2 Positive semi-definite matrices

A symmetric matrix $A$ is called positive semi-definite if
$$x^T A x \geq 0$$
for every $x$. A positive definite matrix is positive semi-definite. Probably the first example of a non positive definite, but positive semi-definite $n \times n$ matrix one thinks of is $A$ being the zero matrix. Here
$$x^T A x = 0$$
for every $x$.
Give an example of a non-zero matrix that is positive semi-definite, but not positive definite.
When is a diagonal matrix positive semi-definite?

3.7.3 Symmetric reductions

As you probably have noticed, it is rather straightforward to see when a diagonal matrix is positive (semi)definite. For a general symmetric matrix, one needs to reduce to the case of a diagonal matrix. This is done using the following result.
Let $A$ be a symmetric matrix and $C$ an invertible matrix. Then $A$ is positive (semi)definite if and only if
$$C^T A C$$
is positive (semi)definite.
Every vector $x$ is equal to $Cy$ for a unique $y$, since $C$ is invertible. Why? The upshot is that the equation
$$x = Cy$$
can be solved by multiplying both sides by $C^{-1}$, giving
$$y = C^{-1}x.$$
So we get
$$x^T A x = (Cy)^T A (Cy) = y^T(C^T A C)y.$$
This computation shows that $A$ is positive (semi)definite if $C^T A C$ is positive (semi)definite. The same reasoning with $C^{-1}$ in place of $C$ shows that $C^T A C$ is positive (semi)definite if $A$ is positive (semi)definite.
Notice that it is important that $Cy = 0$ only happens when $y = 0$.
Let
$$A = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}$$
be a diagonal matrix. What conditions must the diagonal entries $\lambda_1$ and $\lambda_2$ satisfy in order for $A$ to be positive definite?
Let
$$A = \begin{pmatrix}a & b\\ b & c\end{pmatrix}$$
denote a symmetric matrix, where $a \neq 0$. Let
$$C = \begin{pmatrix}1 & -\frac{b}{a}\\ 0 & 1\end{pmatrix}.$$
Show that $C$ is invertible and compute
$$C^T A C.$$
Use this to show that $A$ is positive definite if and only if $a > 0$ and $ac - b^2 > 0$.
Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the function defined by
$$f(x_1, x_2) = x_1^2 + x_1x_2 + x_2^2.$$
Show that $f(x_1, x_2) \geq 0$ for every $(x_1, x_2) \in \mathbb{R}^2$.