2 Linear equations

Modern mathematical terminology may seem abstract, but a lot of it comes from equation solving. We will talk about linear equations in this chapter to motivate the concept of matrices in the next chapter.

Linear equations are equations in which the unknowns appear only to the first power. For example, $x^2 + x + 1 = 0$ is not a linear equation in the unknown $x$, since $x$ appears to the second power ($x^2$), whereas $2 x - 3 = 1$ is. We may also consider several linear equations with several unknowns, such as

$\begin{aligned} x + y + z &= 3\\ x - y + z &= 1\\ x + y - z &= 1 \end{aligned}\tag{2.1}$

consisting of three linear equations with the three unknowns $x$, $y$ and $z$.

Try to come up with a solution to (2.1), i.e., find numbers $x, y, z$ satisfying all three equations. Do not use a computer. Is there more than one solution?

Write down two linear equations with two unknowns that do not have a solution.

Do the exercise above before you evaluate the Sage code below, which uses the dreaded solve function. The solve function should always be used as a last resort.
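A guess does not need solve to be confirmed: substituting it into the equations is enough. The sketch below does exactly that for (2.1); the helper names `residuals_21` and `solves_21` are our own, hypothetical choices. Each residual is the left-hand side minus the right-hand side, so a guess solves the system precisely when all three residuals are zero.

```python
def residuals_21(x, y, z):
    """Left-hand side minus right-hand side for each equation in (2.1)."""
    return (
        (x + y + z) - 3,
        (x - y + z) - 1,
        (x + y - z) - 1,
    )


def solves_21(x, y, z):
    """True exactly when (x, y, z) satisfies all three equations in (2.1)."""
    return all(r == 0 for r in residuals_21(x, y, z))


# Replace this with your own guess; a non-zero entry shows which equation fails.
guess = (0, 0, 0)
print(residuals_21(*guess), solves_21(*guess))  # (-3, -1, -1) False
```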

2.1 One linear equation with one unknown

Very simple rules apply when solving linear equations.

Consider as an example the linear equation $2 x - 3 = 1$ in the unknown $x$. Solving this equation amounts to rewriting it as $x = $ (a number). This is called isolating $x$. The process is very mechanical:

$\begin{aligned} 2 x - 3 &= 1\\ &\Updownarrow\\ 2 x - 3 + 3 &= 1 + 3\\ &\Updownarrow\\ 2 x &= 4\\ &\Updownarrow\\ \left(\frac{1}{2}\right) 2 x &= \left(\frac{1}{2}\right) 4\\ &\Updownarrow\\ x &= 2 \end{aligned}$ If you look closely, you will see that we have used the rules

$\begin{aligned} &a=b\qquad \iff \qquad a + c = b + c\\ &a=b\qquad \iff \qquad t a = t b, \end{aligned}$ where $a, b, c$ are numbers and $t$ is a number $\neq 0$.
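The two rules translate directly into code. A minimal sketch, assuming the equation is given as $a x + b = c$ with numbers $a \neq 0$, $b$, $c$ (the function name `isolate_x` is hypothetical): the first rule adds $-b$ to both sides, the second multiplies by $t = 1/a$. Exact fractions avoid rounding errors.

```python
from fractions import Fraction


def isolate_x(a, b, c):
    """Solve a*x + b = c for x with the two rules from the text.

    Exact arithmetic with Fraction; raises ValueError when a == 0,
    because the second rule needs a multiplier t != 0.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("the coefficient of x must be non-zero")
    # First rule: add -b to both sides, a*x + b = c  <=>  a*x = c - b
    rhs = c - b
    # Second rule: multiply both sides by t = 1/a, giving x = (c - b)/a
    t = 1 / a
    return t * rhs


print(isolate_x(2, -3, 1))  # 2, the solution of 2x - 3 = 1
```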

Point out the mistake(s) in the argument below showing that $2 = 1$. (This teaser was presented at the workshop for new teaching assistants, August 2020.)

$\begin{aligned} a &= b \iff \\ a^2 &= a b \iff\\ a^2 - b^2 &= a b - b^2 \iff \\ (a + b)(a - b) &= b (a - b) \iff\\ a + b &= b \iff \\ 2 b &= b \iff \\ 2 &= 1. \end{aligned}$

A saline solution is a mixture of $0.9\%$ sodium chloride in water. Suppose that you have $2$ liters of water containing $9\%$ sodium chloride. How many liters of distilled water ($0\%$ sodium chloride) do you need to add to get a saline solution?

$4.5$ liters

$10$ liters

$18$ liters

Diophantus's youth lasted $1/6$ of his life. He grew a beard after $1/12$ more. After $1/7$ more he got married. Five years later he had a son. The son lived half as long as the father and Diophantus died four years after the son. At what age did Diophantus die?


You can read about Diophantus and the solution to the puzzle in the Wikipedia entry about him. Please try solving the problem on your own first.

2.2 Several linear equations with several unknowns

The linear equation $2 x - 3 = 1$ has only one unknown and the unique solution $x=2$. If a linear equation has more than one unknown (and not all of its coefficients are zero), then it has infinitely many solutions. Consider as an example the linear equation $2 x - 3 y = 1$ with the unknowns $x$ and $y$. Using the same procedure as before, we get

$\begin{aligned} 2 x - 3 y &= 1\\ &\Updownarrow\\ x &= \frac{1}{2} + \frac{3}{2} y \end{aligned}$ Here $y$ can be chosen freely, in infinitely many ways, giving infinitely many solutions $(x, y)\in \mathbb{R}^2$.

2.2.1 Several equations

Several equations with several unknowns also make sense. Consider

$\begin{aligned} x + y &= 3\\ 2 x - 3 y &= 1 \end{aligned}$ Two numbers $x$ and $y$ form a solution $(x, y)\in \mathbb{R}^2$ if both equations are satisfied. From the example above, we know that

$x = \color{blue}{\frac{1}{2} + \frac{3}{2} y}. \tag{2.2}$ Inserting this for $x$ in the first equation, we get

$3 = x + y = \color{blue}{\frac{1}{2} + \frac{3}{2} y} + y = \frac{1}{2} + \frac{5}{2} y.$ Here we end up with one linear equation in one unknown $y$. Its solution is $y=1$, which is inserted in (2.2), giving $x = 2$. Therefore the solution to the equations is $(x, y) = (2, 1)$.
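The substitution can be carried out mechanically for any two equations $a_1 x + b_1 y = c_1$ and $a_2 x + b_2 y = c_2$. The sketch below (the function name and the tuple format `(a, b, c)` are our own assumptions) isolates $x$ in the second equation as in (2.2), inserts it into the first and solves the resulting equation in $y$. It assumes $a_2 \neq 0$ and that the system has a unique solution.

```python
from fractions import Fraction


def solve_by_substitution(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by substitution.

    Each equation is a tuple (a, b, c). Assumes a2 != 0 and that the
    equation obtained for y has a non-zero coefficient.
    """
    a1, b1, c1 = map(Fraction, eq1)
    a2, b2, c2 = map(Fraction, eq2)
    # Isolate x in the second equation: x = c2/a2 - (b2/a2)*y, cf. (2.2)
    x_const, x_slope = c2 / a2, -b2 / a2
    # Insert into the first equation: a1*(x_const + x_slope*y) + b1*y = c1
    coeff_y = a1 * x_slope + b1
    rhs = c1 - a1 * x_const
    if coeff_y == 0:
        raise ValueError("no unique solution")
    y = rhs / coeff_y
    x = x_const + x_slope * y
    return x, y


# x + y = 3 and 2x - 3y = 1 from the text
print(solve_by_substitution((1, 1, 3), (2, -3, 1)))
```

For the example in the text, the intermediate equation is exactly $\frac{5}{2} y = 3 - \frac{1}{2}$, as in the hand computation.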

Kona coffee is a delicacy priced at $200$ kroner for $400$ grams. A standard $500$ gram bag of Arabica beans is priced at $60$ kroner.

A merchant wishes to mix coffee beans of these sorts aiming for a price of $75$ kroner for $400$ grams. Which one of the percentages below comes closest to the content of Kona coffee in the mixture?

$18\%$

$5\%$

$30\%$

$12\%$

2.3 Gauss elimination

When solving systems of several linear equations, it is natural to fix one of the equations, isolate an unknown and then insert it into the other equations.

Let us study this procedure focusing on an example with two equations and three unknowns:

$\begin{aligned} x + 2 y + z &= 8\\ 2 x + y + z &= 7 \end{aligned}$ In the first equation we isolate $x = 8 - 2 y - z$, which is then inserted into the second equation:

$2 x + y + z = 2(8 - 2 y - z) + y + z = - 3 y - z + 16 = 7 \implies -3 y - z = -9.$ It also makes perfect sense to multiply the first equation by $2$ and subtract it from the second equation. This operation gives

$-3 y - z = -9.$ It is not a coincidence that these two operations give the same result.

Suppose that

$\begin{aligned} a_1 x_1 + a_2 x_2 + \cdots + a_n x_n &= c_1\\ b_1 x_1 + b_2 x_2 + \cdots + b_n x_n &= c_2 \end{aligned}$ are two linear equations in the unknowns $x_1, \dots, x_n$ with $a_1\neq 0$. The equation obtained by first isolating $x_1$ in the first equation and then inserting it into the second equation is identical to the equation obtained by adding the first equation multiplied by $-b_1/a_1$ to the second equation.

Isolating $x_1$ in the first equation and inserting it into the second equation gives

$b_1\left(\frac{c_1}{a_1} - \frac{a_2}{a_1} x_2 - \cdots -\frac{a_n}{a_1} x_n\right) + b_2 x_2 + \cdots + b_n x_n = c_2 \tag{2.3}$ Adding the first equation multiplied by $-b_1/a_1$ to the second equation gives

$\left(b_2 - \frac{b_1 a_2}{a_1}\right) x_2 + \cdots + \left(b_n - \frac{b_1 a_n}{a_1}\right) x_n = c_2 - \frac{b_1}{a_1} c_1 \tag{2.4}$ Expanding the product on the left-hand side of (2.3) and moving the constant term $\frac{b_1 c_1}{a_1}$ to the right-hand side gives exactly (2.4).
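The identity can also be spot-checked numerically, which is not a proof but a useful sanity check. The sketch below (all names are our own) evaluates "left-hand side minus right-hand side" of (2.3) and of (2.4) exactly as written, at random rational coefficients and random values of $x_2, \dots, x_n$, and compares the results with exact `Fraction` arithmetic.

```python
from fractions import Fraction
import random


def side_23(a, b, c1, c2, xs):
    """Left minus right side of (2.3); a[0] is a_1, xs holds x_2, ..., x_n (Fractions)."""
    n = len(a)
    x1 = c1 / a[0] - sum(a[j] / a[0] * xs[j - 1] for j in range(1, n))  # x_1 isolated
    return b[0] * x1 + sum(b[j] * xs[j - 1] for j in range(1, n)) - c2


def side_24(a, b, c1, c2, xs):
    """Left minus right side of (2.4), with the same conventions."""
    n = len(a)
    lhs = sum((b[j] - b[0] * a[j] / a[0]) * xs[j - 1] for j in range(1, n))
    return lhs - (c2 - b[0] / a[0] * c1)


def spot_check(trials=200, seed=0):
    """True if (2.3) and (2.4) agree at all randomly chosen rational points."""
    rng = random.Random(seed)

    def rand():
        return Fraction(rng.randint(-9, 9), rng.randint(1, 5))

    for _ in range(trials):
        n = rng.randint(2, 5)
        a = [rand() for _ in range(n)]
        if a[0] == 0:
            a[0] = Fraction(1)  # the statement assumes a_1 != 0
        b = [rand() for _ in range(n)]
        c1, c2 = rand(), rand()
        xs = [rand() for _ in range(n - 1)]
        if side_23(a, b, c1, c2, xs) != side_24(a, b, c1, c2, xs):
            return False
    return True


print(spot_check())  # True
```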

Multiplying an equation by a number and adding it to another equation is easier to handle than isolating and inserting. We have shown above that the two methods produce the same result. Below is an extended example.

We wish to solve the system of equations

$\begin{matrix} &2 x &+ &y &+ &z &= 7\\ &x &+ &2y &+ &z &= 8\\ &x &+ &y &+ &2 z &= 9 \end{matrix}. \tag{2.5}$

The first step is subtracting the third equation from the second:

$\begin{matrix} &2 x &+ &y &+ &z &= 7\\ &x &+ &2y &+ &z &= 8\\ &x &+ &y &+ &2 z &= 9 \end{matrix}\quad\iff \begin{matrix} &2 x &+ &y &+ &z &= 7\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix}$ Then we multiply the third equation by $2$ and subtract from the first:

$\begin{matrix} &2 x &+ &y &+ &z &= 7\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix} \quad\iff \begin{matrix} & &- &y &- &3 z &= -11\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix}$ Finally we add the second equation to the first:

$\begin{matrix} & &- &y &- &3 z &= -11\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix}\quad\iff \begin{matrix} & & & &- &4 z &= -12\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix}$ We have now reduced the original system of equations (2.5) to

$\begin{matrix} & & & &- &4 z &= -12\\ & & &y &- &z &= -1\\ &x &+ &y &+ &2 z &= 9 \end{matrix},$ where the first equation shows that $z = 3$. Inserting $z = 3$ into the second equation gives $y - 3 = -1$, which is solved by $y=2$. Finally, inserting $y = 2$ and $z=3$ into the third equation gives $x + 8 = 9$, which is solved by $x=1$.

One very important observation here is that $x=1, y=2$ and $z=3$ is the only solution to (2.5). This is a logical consequence of the bi-implication arrows $\iff$ throughout the above calculations.
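The procedure can be automated. Below is a minimal sketch of Gaussian elimination with back substitution; the function name `gauss_solve` and the row format `[a_1, ..., a_n, c]` are our own choices. It eliminates the unknowns from the top down (swapping in an equation with a non-zero coefficient when needed), so the row operations are performed in a different order than in the hand computation above, but for (2.5) it arrives at the same unique solution.

```python
from fractions import Fraction


def gauss_solve(rows):
    """Solve a square system by Gaussian elimination and back substitution.

    Each row [a_1, ..., a_n, c] encodes a_1 x_1 + ... + a_n x_n = c.
    Uses exact Fraction arithmetic; raises ValueError if there is no
    unique solution.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find an equation with a non-zero coefficient of x_col and move it up.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Add multiples of the pivot equation to eliminate x_col below it.
        for r in range(col + 1, n):
            t = -m[r][col] / m[col][col]
            m[r] = [v + t * p for v, p in zip(m[r], m[col])]
    # Back substitution: solve the last equation, then insert upwards.
    xs = [Fraction(0)] * n
    for r in reversed(range(n)):
        known = sum(m[r][j] * xs[j] for j in range(r + 1, n))
        xs[r] = (m[r][n] - known) / m[r][r]
    return xs


# System (2.5)
print(gauss_solve([[2, 1, 1, 7], [1, 2, 1, 8], [1, 1, 2, 9]]))
```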

The elimination or substitution method for solving systems of linear equations is old and well known. In $1720$, Sir Isaac Newton described the method eloquently as follows.

And you are to know, that by each Æquation one unknown Quantity may be taken away, and consequently, when there are as many Æquations and unknown Quantities, all at length may be reduc'd into one, in which there shall be only one Quantity unknown.

The mathematical rockstar Carl Friedrich Gauss used the method to determine the orbit of the asteroid Pallas. His mathematical analysis of the observations led him to the famous least squares method and a system of six linear equations with six unknowns.

The method is known today by the term Gaussian elimination even though Gauss was not the first to introduce it. In fact it appeared already in The Nine Chapters on the Mathematical Art, which is an ancient Chinese mathematics book compiled over several centuries from the 10th century BCE to the 2nd century CE. This book contains several practical problems and their solutions. An example is

There are three categories of corn. Three bundles of the first class, two of the second and one of the third make $39$ measures. Two of the first, three of the second, and one of the third make $34$ measures. Finally one of the first, two of the second and three of the third make $26$ measures. How many measures of grain are contained in one bundle of each class?

How many solutions does the system of equations below have?

$\begin{aligned} x + y + z &= 0\\ x - y + z &= 0\\ x + y - z &= 0 \end{aligned}$

None.

Precisely one.

Infinitely many.

How many solutions does the system of equations below have?

$\begin{aligned} x + y + z &= 0\\ x - y + z &= 0\\ 5 x + y +5z &= 0 \end{aligned}$

Precisely one.

Precisely two.

Infinitely many.

Find the solutions to

$\begin{matrix} & x &+ &3 y &+ &z &= &2\\ &-2 x &- &5 y &+ &3 z &= &4. \end{matrix}$ by expressing $x$ and $y$ in terms of $z$, i.e., isolate $x$ and $y$ on the left-hand side, such that

$\begin{aligned} x &= \dots\\ y &= \dots, \end{aligned}$ where each $\dots$ stands for an expression in the unknown $z$ only.

Your enemy transmits secret codes $(x_1, x_2, x_3, x_4)$ consisting of four integers $x_1, x_2, x_3, x_4$ over the internet. He does not transmit the code itself but an encrypted version $(y_1, y_2, y_3, y_4)$ given by

$\begin{matrix} y_1 &= &2 x_1 &+ &x_2 &+ &3 x_3 &+ &4 x_4\\ y_2 &= &x_1 &+ &2 x_2 &+ &3 x_3 &+ &4 x_4\\ y_3 &= &3 x_1 &+ &3 x_2 &+ &x_3 &+ &x_4\\ y_4 &= &4 x_1 &+ &4 x_2 &+ &2 x_3 &+ &3 x_4 \end{matrix}.$ You have knowledge of the encryption method above and by listening in on a recent communication, you learn that the encryption $(15, 16, 12, 20)$ was sent. What was the original secret code before the encryption?

Extra credit

Suppose that you only know that the encryption scheme is

$\begin{matrix} y_1 &= &a_{11} x_1 &+ & a_{12} x_2 &+ &a_{13} x_3 &+ &a_{14} x_4\\ y_2 &= &a_{21} x_1 &+ &a_{22} x_2 &+ &a_{23} x_3 &+ &a_{24} x_4\\ y_3 &= &a_{31} x_1 &+ &a_{32} x_2 &+ &a_{33} x_3 &+ &a_{34} x_4\\ y_4 &= &a_{41} x_1 &+ &a_{42} x_2 &+ &a_{43} x_3 &+ &a_{44} x_4 \end{matrix},$ and that you have no knowledge of the numbers $a_{11}, \dots, a_{44}$. What is the minimum number of transmissions you need to know to find these encryption numbers?

The diagram below shows a network of roads and $6$ intersections. Every road is labeled by a number indicating the average number of cars per hour on the road. Some of these numbers, $f_1, \dots, f_7$, are unknown. Write down a system of linear equations for finding $f_1, \dots, f_7$.

Compute $f_1, f_2, f_3, f_4, f_5, f_6$ supposing that $f_1=200$ and $f_7=100$.

This example relates to the famous Google page rank algorithm.

Suppose we have a very simple internet with only four webpages as depicted above with arrows indicating that a webpage links to another.

We wish to study traffic in this network in the sense that we let a random websurfer jump from a given webpage to another by selecting a link randomly.

If you look at the network without the dashed red arrow, it is almost clear that a random websurfer will spend $25\%$ of the time in each of the four nodes.

However, if we introduce the dashed red arrow, then the percentages in each node are given by the linear equations above. Here it turns out that website $1$ only gets around $14\%$ of the time (the other websites each get double this).


You may try out the Python code below to simulate a random tour of the small internet in Example 2.13.

The list (or matrix)

A = [[0,1,0,0], [0,0,0,1], [1,1,0,0], [0,0,1,0]]

encodes the graph of links between the four nodes $0, 1, 2, 3$: entry $j$ in row $i$ is $1$ exactly when node $i$ links to node $j$. From $A$ you can see that node $2$ links to nodes $0$ and $1$ and that node $0$ links to node $1$. The command

simulate(0, 1000)

simulates a random surf with $1000$ clicks starting in node $0$.

The linear equations really seem to give the right result!
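The `simulate` function itself is not printed in this text. A minimal sketch of what such a function might look like is given below; it is our own hypothetical implementation, which follows a uniformly chosen outgoing link at every click and returns the fraction of clicks landing in each node. Solving the balance equations for this graph gives $1/7 \approx 14\%$ for node $0$ (presumably website $1$ of Example 2.13) and $2/7$ for each of the other nodes, in line with the percentages stated above.

```python
import random

# Links of the small internet, as in the text: A[i][j] == 1 when node i links to node j.
A = [[0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0]]


def simulate(start, clicks, rng=random):
    """Hypothetical random surf: from the current node, follow a uniformly
    chosen outgoing link, `clicks` times in total, starting in node `start`.
    Returns the fraction of the clicks that landed in each node."""
    visits = [0] * len(A)
    node = start
    for _ in range(clicks):
        links = [j for j, linked in enumerate(A[node]) if linked]
        node = rng.choice(links)
        visits[node] += 1
    return [v / clicks for v in visits]


print(simulate(0, 100000))       # random, so it varies from run to run
print([1 / 7, 2 / 7, 2 / 7, 2 / 7])  # values from the linear equations
```

Passing a seeded `random.Random` object as `rng` makes a run reproducible.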

2.4 Polynomials

Before going further into examples of linear equations we need to introduce (non-linear) functions called polynomials. A polynomial of degree $n$ is a function $f:\mathbb{R}\rightarrow \mathbb{R}$ of the form

$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \tag{2.6}$ where $a_0, \dots, a_n$ are real numbers and $a_n\neq 0$. We call $a_0, \dots, a_n$ the coefficients of $f$. The degree of the polynomial $f$ is denoted $\deg(f)$. As an example,

$x^3 - 2 x + 17$ is a polynomial of degree $3$ with

$\begin{aligned} a_3 &= 1\\ a_2 &= 0\\ a_1 &= -2\\ a_0 &= 17. \end{aligned}$

In addition to the polynomials defined in (2.6) with $a_n\neq 0$, we also view the function $f(x) = 0$ as a polynomial, called the zero polynomial. The zero polynomial does not have a degree (all its coefficients are zero!).

The set of all polynomials is denoted $\mathbb{R}[x]$ , so that for example it makes sense to write

$x^2 - 5 x + 6\in \mathbb{R}[x].$ Polynomials are probably the most natural functions from $\mathbb{R}$ to $\mathbb{R}$ you can come up with. If you look at (2.6), you will see that the output is formed using only addition and multiplication (by $x$ and by selected real numbers).

You can compute with polynomials treating the variable $x$ as a number. For example,

$(3 x^2 + 2 x + 1) (2 x + 1) = 6 x^3 + 7 x^2 + 4 x + 1.$ In general, a polynomial of degree $m$ times a polynomial of degree $n$ is a polynomial of degree $m+n$.
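Multiplying polynomials is easy to express with coefficient lists. In the sketch below (the function name `poly_mul` and the convention "constant term first" are our own choices), the coefficient of $x^k$ in the product is the sum of all products $a_i b_j$ with $i + j = k$.

```python
def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists [a_0, a_1, ..., a_n]
    (constant term first): the coefficient of x^k in f*g is the sum of
    f[i]*g[j] over all i + j = k."""
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return result


# (3x^2 + 2x + 1)(2x + 1), constant term first
print(poly_mul([1, 2, 3], [1, 2]))  # [1, 4, 7, 6], i.e. 6x^3 + 7x^2 + 4x + 1
```

The last entry of the result is `f[-1] * g[-1]`, a product of two non-zero leading coefficients, which is why the degree of the product is exactly $m + n$.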

In the sage window below we encounter for the first time the sympy library. The input format and commands for handling polynomials should be clear from the context.

You have already seen polynomials of degree one. They have the form

$f(x) = a x + b,$ where $a$ and $b$ are real numbers and $a\neq 0$. Similarly, polynomials of degree two are called quadratic polynomials. They look like

$f(x) = a x^2 + b x + c,$ where $a, b$ and $c$ are real numbers and $a\neq 0$.

To get a feeling for the behavior of polynomials you should experiment in the sage window below. Try varying the degree and the coefficients of the polynomial in the plot. Also adjust the plot interval for the right view.

Suppose that

$f(x) = a x^2 + b x + c.$ To compute $f(x)$ it seems that you need $3$ multiplications ($a\cdot x\cdot x$ and $b\cdot x$) and $2$ additions. Can you compute $f(x)$ with only $2$ multiplications and $2$ additions?

Try to generalize to the computation of $f(x)$, where $f$ is a polynomial

$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$ of degree $n$ (you should only need $n$ multiplications and $n$ additions here).

2.4.1 Polynomial division

Division is sometimes referred to as long division when focusing on the method for division. Let us look at the situation for integers first.

The remainder of $14$ divided by $4$ is $2$, since

$14 = 3\cdot 4 + 2.$ Here the remainder $2$ is strictly less than the divisor $4$.

For polynomials we have a similar situation, where the degree is taken into account. For example, the remainder of $x^3 + x + 1$ divided by $x^2 + x + 1$ is $x+2$, since

$x^3 + x + 1 = (x-1)(x^2 + x + 1) + (x+2). \tag{2.7}$ Here the degree $1$ of the remainder is strictly less than the degree $2$ of the divisor.

The Python library sympy contains a wealth of functions for symbolic mathematics. The window below shows how the polynomial division (2.7) is computed with sympy (see the Polynomial Manipulation section of the sympy documentation).

The (division) algorithm for carrying out (long) division of polynomials is explained by an example in the video below.

Watch the five-minute video above and carry out (without using a computer) the polynomial division alluded to in (2.7).

The general result about division of polynomials is given below.

Let $d(x)\in \mathbb{R}[x]$ be a non-zero polynomial. Then for every polynomial $f(x)\in \mathbb{R}[x]$, there exist polynomials $q(x), r(x)\in \mathbb{R}[x]$, such that

$f(x) = q(x) d(x) + r(x), \tag{2.8}$ where $r(x) = 0$ or $\deg(r(x)) < \deg(d(x))$.

We will prove this using induction on $n = \deg(f)$. Suppose that

$f(x) = a_n x^n + \cdots\qquad\text{and}\qquad d(x) = b_m x^m + \cdots$ If $\deg(d(x)) = m > n$, then

$f(x) = 0\cdot d(x) + f(x)$ satisfies (2.8) with $q(x) = 0$ and $r(x) = f(x)$.

If $m\leq n$, then the terms of degree $n$ cancel in $f(x) - a_n b_m^{-1} x^{n-m} d(x)$, so it is either the zero polynomial or a polynomial of degree $<n$. So by induction (or trivially, if it is zero) we may find polynomials $q_0(x)$ and $r_0(x)$, such that

$f(x) - a_n b_m^{-1} x^{n-m} d(x) = q_0(x) d(x) + r_0(x).$ Therefore

$f(x) = (q_0(x) + a_n b_m^{-1} x^{n-m})d(x) + r_0(x)$ giving the desired result with $q(x) = q_0(x) + a_n b_m^{-1} x^{n-m}$ and $r(x) = r_0(x)$.
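The proof is constructive: each induction step removes the leading term of $f$ by subtracting $a_n b_m^{-1} x^{n-m} d(x)$ and records $a_n b_m^{-1} x^{n-m}$ in the quotient. A minimal sketch of this loop in plain Python (our own implementation, not the sympy function mentioned above), with coefficient lists stored constant term first and exact fractions:

```python
from fractions import Fraction


def _trim(p):
    """Drop zero leading coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, d):
    """Return (q, r) with f = q*d + r, where r = [] (zero) or deg(r) < deg(d).

    Coefficient lists have the constant term first, so x^3 + x + 1 is
    [1, 1, 0, 1]. Each step follows the proof of Theorem 2.18: subtract
    a_n * b_m^(-1) * x^(n-m) * d(x) to remove the leading term.
    """
    d = _trim(Fraction(c) for c in d)
    if not d:
        raise ZeroDivisionError("d must be a non-zero polynomial")
    r = _trim(Fraction(c) for c in f)
    m = len(d) - 1                      # deg(d)
    q = [Fraction(0)] * max(len(r) - m, 1)
    while len(r) - 1 >= m:              # r != 0 and deg(r) >= deg(d)
        n = len(r) - 1                  # deg(r)
        t = r[-1] / d[-1]               # a_n * b_m^(-1)
        q[n - m] += t
        for i, b in enumerate(d):       # r <- r - t * x^(n-m) * d(x)
            r[n - m + i] -= t * b
        r = _trim(r)
    return _trim(q), r


q, r = poly_divmod([1, 1, 0, 1], [1, 1, 1])
print(q, r)  # q encodes x - 1 and r encodes x + 2, as in (2.7)
```

The loop terminates because the leading coefficient cancels exactly in every step, so the degree of `r` strictly decreases.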

2.4.2 Roots of polynomials

A real number $\alpha\in \mathbb{R}$ is called a root of the polynomial $f(x)\in \mathbb{R}[x]$ if $f(\alpha) = 0$. This is a very fundamental definition. It is mirrored beautifully in the following result.

A real number $\alpha$ is a root of the polynomial $f(x)\in \mathbb{R}[x]$ if and only if

$f(x) = q(x) (x-\alpha),$ for some polynomial $q(x)\in \mathbb{R}[x]$.

By Theorem 2.18, we may write

$f(x) = q(x)(x-\alpha) + r(x), \tag{2.9}$ where $r(x)= 0$ or $r(x)$ is a non-zero polynomial of degree zero, i.e., a non-zero constant. Now the result follows, since (2.9) gives $f(\alpha) = q(\alpha)(\alpha - \alpha) + r(\alpha) = r(\alpha)$, and the constant $r(x)$ is zero exactly when $r(\alpha) = 0$.

Is there an easy way of deciding if a polynomial $d(x) = a x + b$ of degree one divides a polynomial $f(x)$ without performing the (long) division of $f(x)$ by $d(x)$? Here divides means that $f(x) = q(x) d(x)$ for some polynomial $q(x)$.

A quadratic polynomial

$a x^2 + b x + c$ has at most two roots given by the formula (one root for $+$ and one for $-$ in $\pm$ below)

$\frac{-b\pm \sqrt{b^2 - 4 a c}}{2 a}, \tag{2.10}$ if its discriminant $b^2 - 4 a c$ is $\geq 0$.
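Formula (2.10) can be turned into a small function. The sketch below (the name `real_roots` is hypothetical) uses floating point, so its results are in general only approximate; it returns no roots for a negative discriminant and a single root when the discriminant is zero.

```python
import math


def real_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c (a != 0) via formula (2.10), sorted.

    Returns [] if the discriminant is negative and one root if it is zero.
    Floating point, so the results are approximate in general.
    """
    if a == 0:
        raise ValueError("not a quadratic polynomial")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    # A set removes the duplicate when disc == 0 (both signs give the same root).
    roots = {(-b + s) / (2 * a), (-b - s) / (2 * a)}
    return sorted(roots)


print(real_roots(1, -2, 1))  # [1.0], the single root of x^2 - 2x + 1
```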

The formula (2.10) is derived using a classical algebraic trick called completing the square. Looking at the quadratic equation $a x^2 + b x + c = 0$, what bothers us is the term $b x$. If $b=0$, we could solve the equation by rewriting it as

$x^2 = -\frac{c}{a}$ and then taking square roots. The first step in this direction is rewriting the equation

$a x^2 + b x + c = 0$ to

$x^2 + \frac{b}{a} x = -\frac{c}{a}. \tag{2.11}$ We would like to add a number $d^2$ to both sides of (2.11) so that the left hand side comes to look like

$(x + d)^2 = x^2 + 2 x d + d^2. \tag{2.12}$ This is what is called completing the square.

Comparing the left-hand side of (2.11) with the right-hand side of (2.12), the coefficients of $x$ must agree, i.e., $2 d = \frac{b}{a}$, and we find that

$d = \frac{b}{2 a}$ works. Therefore (2.11) implies

$\left( x + \frac{b}{2a}\right)^2 = -\frac{c}{a} + \left(\frac{b}{2 a}\right)^2.$ This identity can be rewritten into the formula (2.10) for solving the quadratic equation.
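The rewriting can be spelled out as follows. Putting the right-hand side on a common denominator and taking square roots, which is possible exactly when the discriminant is $\geq 0$, gives

$\begin{aligned} \left( x + \frac{b}{2a}\right)^2 &= \frac{b^2 - 4 a c}{4 a^2}\\ &\Updownarrow\\ x + \frac{b}{2a} &= \pm\frac{\sqrt{b^2 - 4 a c}}{2 a}\\ &\Updownarrow\\ x &= \frac{-b\pm \sqrt{b^2 - 4 a c}}{2 a}. \end{aligned}$

Here $\sqrt{4 a^2} = 2|a|$, but the sign of $a$ is absorbed by $\pm$, which is why $2a$ may be written in the denominator. If $b^2 - 4 a c < 0$, the right-hand side of the first line is negative and there are no real solutions.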

For polynomials of degree three (cubic polynomials) there is a formula, but these days nobody remembers it. Also for polynomials of degree four (quartic polynomials) there is a formula. But for polynomials of degree five (quintic polynomials) and up, one can prove that no general formula in terms of radicals (square roots, cube roots and so on) can exist!

An exceedingly important result is quoted and proved below: the degree of a polynomial is an upper bound for its number of roots.

A non-zero polynomial $f(x)\in \mathbb{R}[x]$ of degree $n > 0$ can have at most $n$ roots.

We will prove this by induction starting with $n = 1$. Here $f(x) = a x + b$ for $a, b\in \mathbb{R}$ with $a\neq 0$ and

$f(\alpha) = 0\iff \alpha = - a^{-1} b.$ Therefore $f(x)$ has precisely one root. Suppose now that we have proved that polynomials of degree $n$ have at most $n$ roots. Assume that $f(x)$ is a polynomial of degree $n + 1$. If $f(x)$ has no roots, we are done with the proof. Suppose that $f(\alpha) = 0$, i.e., $\alpha$ is a root of $f$. Then

$f(x) = q(x) (x-\alpha)$ by Proposition 2.19. Here $q(x)$ has to be a polynomial of degree $n$, and therefore by induction $q(x)$ has at most $n$ roots. However, if $f(\beta) = q(\beta) (\beta - \alpha) = 0$, then either $\beta = \alpha$ or $q(\beta) = 0$. So every root of $f(x)$ is either $\alpha$ or one of the at most $n$ roots of $q(x)$. We have proved that $f(x)$ cannot have more than $n+1$ roots.

Theorem 2.21 has a few interesting consequences. First, it implies that two identical polynomials, i.e., $f(x) = g(x)$ for every $x\in \mathbb{R}$, must have the same coefficients: otherwise $f(x) - g(x)$ would be a non-zero polynomial with infinitely many roots.

Secondly, if two polynomials $f(x)$ and $g(x)$ of degree $n$ satisfy $f(x_i) = g(x_i)$ for distinct points $x_1, \dots, x_{n+1}$, then $f(x) = g(x)$.

In Remark 2.22 it is stated that if two polynomials $f(x)$ and $g(x)$ of degree $n$ satisfy $f(x_i) = g(x_i)$ for distinct points $x_1, \dots, x_{n+1}$, then $f(x) = g(x)$. How does this follow from Theorem 2.21?

It might happen that a polynomial of degree $n$ has precisely $n$ roots, but it could have fewer or even no roots: the polynomials

$x^2 + 1, x^4 + 1, x^6 + 1, \dots$ have no roots, whereas for example

$x^2 - 2 x + 1$ is a quadratic polynomial with only one root. However polynomials of degree $1, 3, 5, \dots$ always have at least one root.

A polynomial of odd degree always has a root.

The proof of this result is beyond our scope now and will have to wait for tools from analysis (Chapter 5).

Compute (without using a computer!) the roots of the quartic

$x^4 - 5x^2 + 6.$

Give an example of a polynomial of degree $17$ with precisely one root.

Suppose that $\alpha, \beta$ are two roots of the quadratic polynomial

$f(x) = x^2 + b x + c.$ How can $b$ and $c$ be computed in terms of $\alpha$ and $\beta$? Show concretely how this can be applied to the polynomial $g(x) = x^2 - 5 x + 6$: if you know that $g(2) = 0$, how can you easily find the other root?

Show that $f(x) = (x-\alpha)(x-\beta)$ and use this.

2.5 Applications to polynomials

A line in the plane is given by its equation $y = a x + b$, where $a$ is the slope and $b$ is the $y$-coordinate of its intersection with the $y$-axis. Two distinct lines in the plane are either parallel or intersect in a single point.

The two lines $y = x+1$ and $y=-x+2$ have a single point of intersection. Compute this point.

Give an example of two parallel lines and their equations.

Through two (distinct) points $(x_1, y_1)$ and $(x_2, y_2)$ with $x_1\neq x_2$ passes a unique line.

You can find the equation for this line by solving two equations with two unknowns $a$ and $b$:

$\begin{aligned} x_1 a + b &= y_1\\ x_2 a + b &= y_2 \end{aligned}$ We might as well apply Gauss elimination to solve this system. First we subtract the second equation from the first. This gives $(x_1 - x_2) a = y_1 - y_2$. Therefore

$a = \frac{y_1 - y_2}{x_1 - x_2}.$ Inserting this $a$ into the first equation, we get

$b = \frac{x_1 y_2 -x_2 y_1}{x_1 - x_2}.$ We can also write the line quite explicitly as

$y = f(x) = y_1 \frac{x - x_2}{x_1 - x_2} + y_2 \frac{x - x_1}{x_2 - x_1}. \tag{2.13}$ The function $f(x)$ in (2.13) is a polynomial of degree at most one with $f(x_1) = y_1$ and $f(x_2) = y_2$.

In almost the same way we may find a unique quadratic polynomial

$y = a x^2 + b x + c$ through three points $(x_1, y_1), (x_2, y_2)$ and $(x_3, y_3)$ with distinct $x$-values:

Here we end up with three linear equations in the unknowns $a, b$ and $c$ :

$\begin{aligned} x_1^2 a + x_1 b + c &= y_1\\ x_2^2 a + x_2 b + c &= y_2\\ x_3^2 a + x_3 b + c &= y_3 \end{aligned}\tag{2.14}$ It is not immediately obvious that this system of equations has a solution. But watch the following trick evolve.

We may explicitly construct the quadratic polynomial passing through the three points as

$\begin{aligned} y = f(x) = &y_1 \frac{(x - x_2)(x-x_3)}{(x_1 - x_2)(x_1-x_3)} + y_2 \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2-x_3)}\\ &+ y_3 \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3-x_2)} \end{aligned}\tag{2.15}$ Take a moment and verify that $f(x_1) = y_1, f(x_2) = y_2$ and $f(x_3)= y_3$ . This also proves that the system of equations in (2.14) can be solved.

Notice in (2.15) that

$y = y_1 L_1(x) + y_2 L_2(x) + y_3 L_3(x),$ where (for example) $L_1$ is a polynomial of degree two satisfying

$L_1(x_1)=1,\quad L_1(x_2)=0, \quad\text{and}\quad L_1(x_3)=0.$ What about $L_2$ and $L_3$ with respect to $x_1, x_2$ and $x_3$?

Compute the polynomial you get when you apply (2.15) to $x_1 = 1, x_2 = 2, x_3= 3$ and $y_1 = 1, y_2 = 2, y_3 = 3$. How do you explain this result in terms of the points $(x_1, y_1), (x_2, y_2)$ and $(x_3, y_3)$ plotted in the plane?

The natural generalization is that there exists a unique polynomial of degree $\leq n$ passing through $n+1$ points $(x_1, y_1), \dots, (x_{n+1}, y_{n+1})$ with distinct $x$-values.

The rather miraculous trick above in (2.15) is called Lagrange interpolation and can be generalized to polynomials of arbitrary degree. Below is an example of five points defining a unique polynomial of degree four.

2.5.1 The magic of Lagrange polynomials

Let us explain with a simple numerical example what happens in (2.15). Suppose we wish to find a polynomial $f(x) = a_0 + a_1 x + a_2 x^2$ through the points

$(1, 2), \qquad (2, 3)\qquad \text{and} \qquad (3, 5).$ More precisely we wish to find numbers $a_0, a_1$ and $a_2$ , such that

$\begin{aligned} f(1) &= a_0 + a_1 + a_2 = 2\\ f(2) &= a_0 + 2 a_1 + 4 a_2 = 3\\ f(3) &= a_0 + 3 a_1 + 9 a_2 = 5. \end{aligned}$

This is a system of three linear equations which in this case has a unique solution in $a_0, a_1$ and $a_2$ .

We may, however, attack this problem in another way. Suppose that $L_1(x), L_2(x)$ and $L_3(x)$ are polynomials of degree at most two, such that

$\begin{aligned} L_1(1) = 1\qquad L_1(2) = 0 \qquad L_1(3) = 0\\ L_2(1) = 0\qquad L_2(2) = 1 \qquad L_2(3) = 0\\ L_3(1) = 0\qquad L_3(2) = 0 \qquad L_3(3) = 1 \end{aligned}$

Then

$f(x) = 2 L_1(x) + 3 L_2(x) + 5 L_3(x)$ really is the polynomial we wish to find. The insight is that these $L_1(x), L_2(x)$ and $L_3(x)$ can be explicitly written down as

$\begin{aligned} L_1(x) &= \frac{(x-2)(x-3)}{(1-2)(1-3)}\\ L_2(x) &= \frac{(x-1)(x-3)}{(2-1)(2-3)}\\ L_3(x) &= \frac{(x-1)(x-2)}{(3-1)(3-2)}. \end{aligned}$
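Solving the three linear equations above gives $a_0 = 2$, $a_1 = -\frac{1}{2}$ and $a_2 = \frac{1}{2}$. The sketch below (our own helper names) recovers the same coefficients without solving any linear equations: it expands $L_1, L_2, L_3$ into coefficient lists (constant term first) by multiplying out the linear factors, and then forms $2 L_1 + 3 L_2 + 5 L_3$.

```python
from fractions import Fraction


def poly_mul(f, g):
    """Product of coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def lagrange_basis(xs, i):
    """Coefficients (constant term first) of the Lagrange polynomial for xs[i]."""
    coeffs = [Fraction(1)]
    for j, xj in enumerate(xs):
        if j != i:
            # multiply by the factor (x - x_j) / (x_i - x_j)
            coeffs = poly_mul(coeffs, [Fraction(-xj), Fraction(1)])
            coeffs = [c / (xs[i] - xj) for c in coeffs]
    return coeffs


xs, ys = [1, 2, 3], [2, 3, 5]
basis = [lagrange_basis(xs, i) for i in range(3)]
f = [sum(y * L[k] for y, L in zip(ys, basis)) for k in range(3)]
print(f)  # a_0 = 2, a_1 = -1/2, a_2 = 1/2
```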

Example 2.31 can be generalized: suppose we have $n$ distinct numbers

$x_1, x_2, \dots, x_n. \tag{2.16}$ Then these numbers give $n$ polynomials each of degree $n-1$:

$L_i(x) = \frac{1}{C_i} (x-x_1) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x - x_n),$ where $C_i = (x_i-x_1) \cdots (x_i-x_{i-1})(x_i-x_{i+1}) \cdots (x_i - x_n)$ for $i = 1, \dots, n$ (here $C_i \neq 0$ because the numbers are distinct).

The polynomial $L_i(x)$ is called the $i$-th Lagrange basis polynomial associated to the $n$ numbers $x_1, \dots, x_n$. It satisfies $L_i(x_1) = \cdots = L_i(x_{i-1}) = 0$, $L_i(x_i) = 1$ and $L_i(x_{i+1}) = \cdots = L_i(x_n) = 0$, i.e., $L_i(x)$ is zero at all of the numbers $x_1, \dots, x_n$ except at $x_i$, where it evaluates to $1$.

The Lagrange basis polynomials (now associated to the $n+1$ numbers $x_1, \dots, x_{n+1}$) allow us to construct a polynomial $f$ of degree $\leq n$ through $n+1$ points $(x_1, y_1), \dots, (x_{n+1}, y_{n+1})$ with distinct $x$-values, i.e., a polynomial $f$ such that

$\begin{aligned} f(x_1) &= y_1\\ &\vdots\\ f(x_{n+1}) &= y_{n+1} \end{aligned}$ simply as

$f(x) = y_1 L_1(x) + \cdots + y_{n+1} L_{n+1}(x).$ However, $f(x)$ does not have to have degree $n$. For example, it could come out as a line through three points $(x_1, y_1), (x_2, y_2)$ and $(x_3, y_3)$ (see Exercise 2.30).
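Often we only need the value $f(x)$ at a single point, and then it is not necessary to expand the Lagrange basis polynomials at all. A minimal sketch (the function name `lagrange_eval` is our own) that evaluates $y_1 L_1(x) + \cdots + y_{n+1} L_{n+1}(x)$ directly, using exact fractions:

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the polynomial of degree <= n through the n + 1 points
    (x_1, y_1), ..., (x_{n+1}, y_{n+1}), as y_1 L_1(x) + ... + y_{n+1} L_{n+1}(x).
    The x-values must be distinct."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        Li = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j != i:
                Li *= (x - xj) / Fraction(xi - xj)
        total += yi * Li
    return total


print(lagrange_eval([(1, 2), (2, 3), (3, 5)], 4))   # 8
print(lagrange_eval([(0, 1), (1, 3), (2, 5)], 10))  # 21
```

The second call uses three points on the line $y = 2x + 1$; the interpolating polynomial is then that line, so its degree is $1$ rather than $2$.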

Compute $a_0, a_1, a_2, a_3\in \mathbb{R}$ so that

$\begin{aligned} f(-2) &= -1\\ f(-1) &= 1\\ f(1) &= 1\\ f(2) &= 1, \end{aligned}$ where

$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3.$ You can do this either by Lagrange interpolation or by solving linear equations. Which one do you prefer?

Can you predict the next number in the sequence starting with

$15,\quad 34,\quad 65,\quad 111,\quad 175,\quad 260,\quad 369? \tag{2.17}$

This question was posed by the tutors in a class session for new computer science students (thanks to Tobias Bendsen Poulsen for notifying me about this). Let us put the sequence (2.17) inside a table like

$\def\arraystretch{1.5} \begin{array}{c|ccccccc} n & 1 & 2 & 3 & 4 & 5 & 6 & 7\\ \hline f(n) & 15 & 34 & 65 & 111 & 175 & 260 & 369 \end{array},$

where $f:\mathbb{N}\rightarrow \mathbb{N}$ is the secret function responsible for the sequence. We would like to compute $f(8)$. Assuming that $f$ is a polynomial function of degree at most $6$, we may simply compute the unique polynomial of degree $\leq 6$ through the $7$ points

$(1, 15),\quad (2, 34),\quad (3, 65),\quad (4, 111),\quad (5, 175),\quad (6, 260),\quad (7, 369).$

We know how to do this either by solving linear equations or by computing with Lagrange polynomials. It turns out that Sage has built-in functions that help us here.

See also the description of Neville's algorithm on Wikipedia for an easier approach to computing $f(8)$.
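Neville's algorithm computes the value of the interpolating polynomial at one point without finding its coefficients. The sketch below is our own implementation with exact fractions, not Sage's built-in functions. Starting from the constants $y_i$, pass $k$ combines the values of the polynomials through points $i, \dots, i+k-1$ and $i+1, \dots, i+k$ into the value of the polynomial through points $i, \dots, i+k$.

```python
from fractions import Fraction


def neville(points, x):
    """Value at x of the polynomial of lowest degree through `points`
    (distinct x-values), computed with Neville's algorithm in exact arithmetic."""
    xs = [Fraction(px) for px, _ in points]
    p = [Fraction(py) for _, py in points]   # p[i]: constant through point i
    x = Fraction(x)
    n = len(points)
    for k in range(1, n):
        # After pass k, p[i] is the value at x of the polynomial of degree <= k
        # through points i, ..., i + k (p[i + 1] is still from pass k - 1 here).
        for i in range(n - k):
            p[i] = ((x - xs[i + k]) * p[i] + (xs[i] - x) * p[i + 1]) / (xs[i] - xs[i + k])
    return p[0]


sequence = [15, 34, 65, 111, 175, 260, 369]
points = list(zip(range(1, 8), sequence))   # (n, f(n)) for n = 1, ..., 7
print(neville(points, 8))
```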

2.6 Shamir secret sharing

Lagrange interpolation is used in cryptography in Shamir's secret sharing. Secret sharing is important in many practical situations. Here is an example quoted from Wikipedia:

A company needs to secure their vault's passcode. They could encrypt it, but what if the beholder of the secret key is unavailable or turns rogue?
One needs to distribute the secret. This is where SSS comes in. It can be used to encrypt the vault's passcode and generate a certain number of shares, where a certain number of shares can be allocated to each executive within the company. Now, only if they pool their shares can they unlock the vault. The threshold can be appropriately set for the number of executives, so the vault is always able to be accessed by the authorized individuals. Should a share or two fall into the wrong hands, they couldn't open the passcode unless the other executives cooperated.

The mathematics that takes care of this is surprisingly simple. Suppose the secret is the number $a_0$ . Then we construct the polynomial

$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m \tag{2.18}$ for some other numbers $a_1, \dots, a_m$. We know that this polynomial is uniquely determined by its values at $m+1$ distinct numbers (see Remark 2.22). So if there are $n$ trusted people, we could distribute the shares

$(1, f(1)), (2, f(2)), \dots, (n, f(n))$ to them. Here we suppose that $n > m$. In this setting, if fewer than $m+1$ of the people are present, they cannot open the vault. If $m+1$ or more people are present, they can reconstruct the polynomial in (2.18), find the secret code $a_0 = f(0)$ and open the vault.
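A minimal sketch of the scheme as described above, with hypothetical function names and randomly chosen integer coefficients $a_1, \dots, a_m$ (their range is an arbitrary choice). Recovering the secret only requires the value at $0$ of the interpolating polynomial, i.e., $f(0) = y_1 L_1(0) + \cdots$ for the shares at hand. Practical implementations of Shamir's scheme do all arithmetic modulo a large prime; this sketch stays with rational numbers, as the text does, and is not meant for protecting real secrets.

```python
import random
from fractions import Fraction


def make_shares(secret, m, n, rng=random):
    """Shares (k, f(k)), k = 1..n, of f(x) = secret + a_1 x + ... + a_m x^m.

    The coefficients a_1..a_m are random integers; their range is an
    arbitrary choice for this sketch.
    """
    coeffs = [secret] + [rng.randint(1, 10**6) for _ in range(m)]

    def f(x):
        return sum(a * x**j for j, a in enumerate(coeffs))

    return [(k, f(k)) for k in range(1, n + 1)]


def recover_secret(shares):
    """f(0) = a_0 via Lagrange interpolation: the sum of y_i * L_i(0)."""
    secret = Fraction(0)
    for i, (xi, yi) in enumerate(shares):
        li_at_0 = Fraction(1)
        for j, (xj, _) in enumerate(shares):
            if j != i:
                li_at_0 *= Fraction(0 - xj, xi - xj)
        secret += yi * li_at_0
    return secret


shares = make_shares(secret=4242, m=2, n=5)   # any 3 of the 5 shares suffice
print(recover_secret(shares[:3]), recover_secret(shares[2:]))  # 4242 4242
```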

You are in a study group consisting of four people. The professor has decided that you submit your project using a secret code that is distributed to the group members with Shamir secret sharing. At least three group members need to agree on submission.

On the day of the deadline three group members with shares

$(1, 7035),\quad (2, 19748)\quad\text{and}\quad (3, 39373)$ are present. What is the secret code they may use to submit their project?

2.7 Fitting data

Given a data set

$\mathcal{D} = \{ (x_1,y_1), (x_2,y_2), \dots, (x_n, y_n) \}$ one would often like to find a model (i.e., some function) that describes the data well. With Lagrange interpolation we can find a polynomial $f$ fitting the sample data $\mathcal{D}$ perfectly, i.e., satisfying $f(x_i) = y_i$ for $i = 1, 2, \dots, n$. Is $f$ an optimal model? For the given data set it seems so, but we have been a bit imprecise in formulating the goal of a model.

Actually, we are not very interested in modeling the data at hand with extreme precision. What we want is a model that fits new data well. Let us look at a concrete example.

Consider the data set

$\begin{aligned} \mathcal{D} = \{ {}& (0, 0.06), (0.5, 0.33), (1, 0.56), (1.5, 1.35), (2, 1.48), (2.5, 1.15), \\ &(3, 1.45), (3.5, 1.12), (4, 0.68), (4.5, 0.22), (5, -0.10) \} \end{aligned}$

The data points $(x_i,y_i)$, $i=1,2,\dots,11$, were generated as $(x_i, p(x_i) + \epsilon_i)$, where $p(x) = -0.2 x^2 + x$ is a quadratic polynomial and $\epsilon_i \in [-0.4,0.4]$ is a random number simulating noise. The polynomial $p$ is the best possible model for unknown data, as there will always be noise that cannot be modeled. In real life $p$ is unknown, and it is $p$ that we try to model from the available data.

The figure below shows fits with a degree $2$ and a degree $10$ polynomial, respectively. As we see, the degree $2$ polynomial is pretty close to the target $p$ compared to the degree $10$ polynomial, which nevertheless fits the data $\mathcal{D}$ perfectly. Generally a simple model is preferred over a complex one, as the latter tends to fit the noise. This phenomenon is called overfitting and is an extremely important topic.
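The degree $10$ fit is simply the Lagrange interpolant through all $11$ points of $\mathcal{D}$. The sketch below (our own code, exact arithmetic) confirms that it reproduces every data point and prints its values next to those of the target $p$ at a few points between the data; how far they drift apart there is what the figure illustrates. The degree $2$ fit is not reproduced here.

```python
from fractions import Fraction

# Data set D from the text; Fraction("0.06") keeps the decimals exact.
RAW = [("0", "0.06"), ("0.5", "0.33"), ("1", "0.56"), ("1.5", "1.35"), ("2", "1.48"),
       ("2.5", "1.15"), ("3", "1.45"), ("3.5", "1.12"), ("4", "0.68"),
       ("4.5", "0.22"), ("5", "-0.10")]
D = [(Fraction(x), Fraction(y)) for x, y in RAW]


def p(x):
    """Target polynomial p(x) = -0.2 x^2 + x."""
    return Fraction(-1, 5) * x * x + x


def interpolant(x):
    """The degree <= 10 Lagrange interpolant through all 11 points of D."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(D):
        Li = Fraction(1)
        for j, (xj, _) in enumerate(D):
            if j != i:
                Li *= (x - xj) / (xi - xj)
        total += yi * Li
    return total


# It reproduces every data point exactly ...
print(all(interpolant(x) == y for x, y in D))  # True
# ... but between the data points it need not stay close to the target p.
for x in [Fraction(1, 4), Fraction(9, 4), Fraction(19, 4)]:
    print(float(x), float(interpolant(x)), float(p(x)))
```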


Fitting a degree $2$ polynomial (in red) and a degree $10$ polynomial (in blue) to the sample data $\mathcal{D}$. The target function $p$ is the dashed curve. We see that the simple quadratic fit is much closer to the target function and hence performs better on new data.

In later chapters we will see how the degree two polynomial fit was obtained. This is a nice example of a convex optimization problem.