Differentiation of Multivariate Functions
Limit Points, Open and Closed Sets
Neighborhoods and Open Sets
Deleted neighborhood
For \(\delta>0\) and \(a \in \mathbb{R}\), the set \(\{x \in \mathbb{R}: 0<|x-a|<\delta\}\) is called a deleted \(\delta\)-neighborhood of the point \(a\).
Interior Point
Let \(S \subset \mathbb{R}\) and \(x \in S\). If there exists an open neighborhood \(U\) of \(x\) such that \(U \subset S\), then \(x\) is called an interior point of \(S\).
Open Set
A set \(S \subset \mathbb{R}\) is called open if every \(x \in S\) is an interior point of \(S\).
Limit Point and Closed Sets
Point of accumulation
Given a set \(S \subset \mathbb{R}\), a point \(l \in \mathbb{R}\) is called a point of accumulation (or limit point) of \(S\) if every deleted neighborhood of \(l\) contains at least one point of \(S\).
Closed Set
A set \(S\) is called closed if it contains all its limit points.
Jacobian Matrix and Directional Derivatives
Gradient of composed function
Notation: For an \(n\)-variable function \(f=f\left(x_{1}, x_{2}, \ldots, x_{n}\right)\), we have defined its gradient:
\[\operatorname{grad} f=\nabla f=\frac{\partial f}{\partial\left(x_{1}, \cdots, x_{n}\right)}=\left(\frac{\partial f}{\partial x_{1}}, \cdots, \frac{\partial f}{\partial x_{n}}\right) \]The gradient generalizes the derivative to higher dimensions.
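As a sanity check, the gradient can be approximated numerically by central differences. A minimal sketch in Python, using a hypothetical two-variable example \(f(x_1, x_2)=x_1^2+3 x_1 x_2\):

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# f(x1, x2) = x1^2 + 3*x1*x2, so grad f = (2*x1 + 3*x2, 3*x1)
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
print(grad(f, [1.0, 2.0]))  # close to [8, 3]
```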
Recall that, in the one-variable case, if \(f=f(x)\) and \(x=\varphi(t)\), then the composed function is \(f=f(\varphi(t))\).
We have the standard chain rule:
\[\frac{d f}{d t}=\frac{d f}{d x} \frac{d \varphi}{d t} \]Question: For \(f=f\left(x_{1}, \cdots, x_{n}\right)\) and \(x_{1}=\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots\) \(x_{n}=\varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\), consider the composed function
\[f=f\left(x_{1}, \cdots, x_{n}\right)=f\left(\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots, \varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\right) \]Then what is \(\nabla f\) in terms of \(y_{1}, \cdots, y_{m}\)? That is,
\[\frac{\partial f}{\partial\left(y_{1}, \cdots, y_{m}\right)}=\left(\frac{\partial f}{\partial y_{1}}, \cdots, \frac{\partial f}{\partial y_{m}}\right)=? \]For short, we have
\[\frac{\partial f}{\partial\left(y_{1} \cdots y_{m}\right)}=\frac{\partial f}{\partial\left(x_{1} \cdots x_{n}\right)} \frac{\partial\left(\varphi_{1} \cdots \varphi_{n}\right)}{\partial\left(y_{1} \cdots y_{m}\right)} \]where
\[\frac{\partial\left(\varphi_{1} \cdots \varphi_{n}\right)}{\partial\left(y_{1} \cdots y_{m}\right)}=\left(\begin{array}{ccc} \frac{\partial \varphi_{1}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{1}}{\partial y_{m}} \\ \frac{\partial \varphi_{2}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{2}}{\partial y_{m}} \\ \vdots & & \vdots \\ \frac{\partial \varphi_{n}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{n}}{\partial y_{m}} \end{array}\right) \]is called the Jacobian matrix.
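The chain rule above can be verified symbolically: multiply the gradient (as a row vector) by the Jacobian matrix and compare with differentiating the composed function directly. A minimal sketch with SymPy, using the hypothetical maps \(\varphi_{1}=y_{1} y_{2}\), \(\varphi_{2}=y_{1}+y_{2}^{2}\):

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
x1, x2 = sp.symbols('x1 x2')

# Hypothetical inner map (phi1, phi2) and outer function f
phi = sp.Matrix([y1 * y2, y1 + y2**2])
f = x1**2 + sp.sin(x2)

# Gradient of f as a row vector, and the Jacobian matrix of phi
grad_f = sp.Matrix([[sp.diff(f, x1), sp.diff(f, x2)]])
J = phi.jacobian([y1, y2])

# Chain rule: grad_y f = (grad_x f at x = phi(y)) * Jacobian
lhs = grad_f.subs({x1: phi[0], x2: phi[1]}) * J

# Direct computation: substitute first, then differentiate
F = f.subs({x1: phi[0], x2: phi[1]})
rhs = sp.Matrix([[sp.diff(F, y1), sp.diff(F, y2)]])

print(sp.simplify(lhs - rhs))  # zero matrix
```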
First order differential forms
Invariance of differential forms: For \(f=f\left(x_{1}, \cdots, x_{n}\right)\), we have
\[d f=\frac{\partial f}{\partial x_{1}} d x_{1}+\cdots+\frac{\partial f}{\partial x_{n}} d x_{n} \]Now, if we compose \(f\) with \(x_{1}=\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots\), \(x_{n}=\varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\), and substitute
\[d x_{i}=\frac{\partial \varphi_{i}}{\partial y_{1}} d y_{1}+\cdots+\frac{\partial \varphi_{i}}{\partial y_{m}} d y_{m} \]we get
\[d f=\cdots=\frac{\partial f}{\partial y_{1}} d y_{1}+\cdots+\frac{\partial f}{\partial y_{m}} d y_{m} \]Directional derivatives
A function \(f=f\left(x_{1}, \cdots, x_{n}\right)\) is defined on \(D \subset E^{n}\). For a point \(\vec{a}=\left(a_{1}, \cdots, a_{n}\right) \in D^{\circ}\), take any direction \(\vec{v} \neq 0\), and consider the rate of change of \(f\) along \(\vec{v}\) at \(\vec{a}\). Let \(f_{\vec{v}}(t)=f(\vec{a}+t \vec{v}), t \in[0,1]\). If the derivative
\[f_{\vec{v}}^{\prime}\left(0^{+}\right)=\lim _{t \rightarrow 0^{+}} \frac{f_{\vec{v}}(t)-f_{\vec{v}}(0)}{t}=\lim _{t \rightarrow 0^{+}} \frac{f(\vec{a}+t \vec{v})-f(\vec{a})}{t}=A \]exists, then we say that \(A\) is the directional derivative of \(f\) along the direction \(\vec{v}\) at \(\vec{a}\), denoted by \(\left.\frac{\partial f}{\partial \vec{v}}\right|_{\vec{a}}\) or \(\left.\nabla_{\vec{v}} f\right|_{\vec{a}}\).
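For a differentiable \(f\), this limit can be approximated by a one-sided difference quotient, and it agrees with \(\nabla f \cdot \vec{v}\). A minimal numerical sketch (the example function is hypothetical):

```python
import numpy as np

def directional_derivative(f, a, v, h=1e-6):
    """One-sided difference quotient for t -> f(a + t v) at t = 0+."""
    a, v = np.asarray(a, float), np.asarray(v, float)
    return (f(a + h * v) - f(a)) / h

# Hypothetical example: f(x1, x2) = x1^2 + x1*x2
f = lambda x: x[0]**2 + x[0] * x[1]
a, v = [1.0, 2.0], [3.0, 4.0]

# grad f(1, 2) = (2*1 + 2, 1) = (4, 1), so df/dv = 4*3 + 1*4 = 16
print(directional_derivative(f, a, v))  # approximately 16
```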
Gradient vector fields
Given an \(n\)-variable function \(f=f\left(x_{1}, \cdots, x_{n}\right)\) defined on an open domain \(D \subset E^{n}\), assume that \(f\) is differentiable everywhere. The gradient of \(f\),
\[\nabla f=\left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}, \cdots, \frac{\partial f}{\partial x_{n}}\right) \]depends on \(\left(x_{1}, x_{2}, \cdots, x_{n}\right) \in D\). In fact, \(\nabla f\) gives a vector field on \(D\).
Gradient Vector fields and Equipotential Surfaces.
For the previous \(f=f\left(x_{1}, \cdots, x_{n}\right)\), one may consider its equipotential surface (level set) that passes through a point \(\vec{x}_{0} \in D\):
\[\left\{\vec{x} \in D: f(\vec{x})=f\left(\vec{x}_{0}\right)\right\} \]
Taylor's Theorem
Taylor's Expansion
Let \(f=f\left(x_{1}, \cdots, x_{n}\right)\) be an \(n\)-variable function. If all of its partial derivatives of order \(m\) exist near \(\vec{x}_{0}\) and are continuous at \(\vec{x}_{0}\), then
\[\begin{gathered} f\left(\vec{x}_{0}+\Delta\right)=f\left(\vec{x}_{0}\right)+\left.(\Delta \cdot \nabla) f\right|_{\vec{x}_{0}}+\left.\frac{1}{2}(\Delta \cdot \nabla)^{2} f\right|_{\vec{x}_{0}} \\ +\cdots+\left.\frac{1}{m !}(\Delta \cdot \nabla)^{m} f\right|_{\vec{x}_{0}}+o\left(\|\Delta\|^{m}\right) \end{gathered} \]Equivalently, in multi-index notation:
\[f\left(\vec{x}_{0}+\Delta\right)=\sum_{|\alpha| \leq m} \frac{\Delta^{\alpha}}{\alpha !}\left(D^{\alpha} f\right)\left(\vec{x}_{0}\right)+o\left(\|\Delta\|^{m}\right) \]Hessian Matrix
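The expansion can be checked numerically: for a smooth function, the second-order Taylor polynomial should match \(f(\vec{x}_{0}+\Delta)\) up to an error that is \(o(\|\Delta\|^{2})\). A minimal sketch with the hypothetical example \(f(x, y)=e^{x} \sin y\):

```python
import numpy as np

# Hypothetical example: f(x, y) = exp(x) * sin(y)
f = lambda x, y: np.exp(x) * np.sin(y)
x0, y0 = 0.3, 0.7

# Exact partial derivatives of this particular f at (x0, y0):
fx = np.exp(x0) * np.sin(y0)   # d/dx of e^x sin y equals f itself
fy = np.exp(x0) * np.cos(y0)
fxx, fxy, fyy = fx, fy, -fx    # for this f: fxx = f, fxy = fy, fyy = -f

def taylor2(dx, dy):
    """Second-order Taylor polynomial of f about (x0, y0)."""
    return (f(x0, y0) + fx * dx + fy * dy
            + 0.5 * (fxx * dx**2 + 2 * fxy * dx * dy + fyy * dy**2))

dx, dy = 1e-2, -2e-2
err = abs(f(x0 + dx, y0 + dy) - taylor2(dx, dy))
print(err)  # small: the remainder is o(||Delta||^2)
```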
\[H(f)_{i, j}=\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} \]The determinant of the above matrix is also sometimes referred to as the Hessian.
Maxima and minima
Let \(D \subset E^{n}\), and let \(f=f(\vec{x}): D \rightarrow \mathbb{R}\) be an \(n\)-variable function. A point \(\vec{a} \in D\) is called a local minimum (resp. maximum) point of \(f\) if there exists \(\delta>0\) such that \(f(\vec{a}+\vec{r}) \geq f(\vec{a})\) (resp. \(f(\vec{a}+\vec{r}) \leq f(\vec{a})\)) for all \(\vec{r}\) with
\[\|\vec{r}\|<\delta, \quad \vec{a}+\vec{r} \in D. \]If \(\vec{a}\) is an interior point of \(D \subset E^{n}\), is a local minimum (or maximum) point of \(f: D \rightarrow \mathbb{R}\), and \(f\) is differentiable at \(\vec{a}\), then
\[\left.\nabla f\right|_{\vec{a}}=0 \]Let \(\vec{a}\) be an interior point of the domain of \(f=f(\vec{x})\), which is twice continuously differentiable near \(\vec{a}\), and suppose that \(\left.(\nabla f)\right|_{\vec{a}}=0\). (Such a point \(\vec{a}\) is called a critical point.)
1 If the Hessian matrix \(H(f)\) is positive definite (equivalently, has all eigenvalues positive) at \(\vec{a}\), then \(f\) attains a local minimum at \(\vec{a}\);
2 If the Hessian matrix \(H(f)\) is negative definite (equivalently, has all eigenvalues negative) at \(\vec{a}\), then \(f\) attains a local maximum at \(\vec{a}\);
3 If the Hessian matrix \(H(f)\) has both positive and negative eigenvalues at \(\vec{a}\), then \(\vec{a}\) is NOT a local extremum of \(f\) (such a point is known as a saddle point).
In those cases not listed above, the test is inconclusive.
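The eigenvalue test above is easy to apply numerically; a minimal sketch (the helper `classify` is hypothetical):

```python
import numpy as np

def classify(H):
    """Classify a critical point from the eigenvalues of its Hessian H."""
    w = np.linalg.eigvalsh(H)          # eigenvalues of the symmetric matrix H
    if np.all(w > 0):
        return "local minimum"
    if np.all(w < 0):
        return "local maximum"
    if w.min() < 0 < w.max():
        return "saddle point"
    return "inconclusive"              # some eigenvalue is zero

# f(x, y) = x^2 - y^2 has a critical point at the origin, Hessian diag(2, -2)
print(classify(np.diag([2.0, -2.0])))  # saddle point
print(classify(np.diag([2.0, 4.0])))   # local minimum
```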
Maintaining the above assumptions for a two-variable function \(f=f(x, y)\), write \(D=f_{x x} f_{y y}-f_{x y}^{2}\) (the determinant of the Hessian), and suppose that \((a, b)\) is a critical point of \(f\) (that is, \(\frac{\partial f}{\partial x}(a, b)=\frac{\partial f}{\partial y}(a, b)=0\)).
1 If \(D(a, b)>0, f_{x x}(a, b)>0\), then \((a, b)\) is a local minimum of \(f\);
2 If \(D(a, b)>0, f_{x x}(a, b)<0\), then \((a, b)\) is a local maximum of \(f\);
3 If \(D(a, b)<0\), then \((a, b)\) is a saddle point of \(f\).
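A worked example of this test with SymPy, using the standard textbook function \(f(x, y)=x^{3}-3 x y+y^{3}\), whose real critical points are \((0,0)\) and \((1,1)\):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x*y + y**3

# Critical points: solve fx = fy = 0 (keep only real solutions)
fx, fy = sp.diff(f, x), sp.diff(f, y)
crit = sp.solve([fx, fy], [x, y], dict=True)

# Discriminant D = fxx * fyy - fxy^2
D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2

for p in crit:
    if not all(v.is_real for v in p.values()):
        continue
    d, fxx = D.subs(p), sp.diff(f, x, 2).subs(p)
    if d < 0:
        kind = "saddle point"
    elif d > 0:
        kind = "local min" if fxx > 0 else "local max"
    else:
        kind = "inconclusive"
    print(p, kind)  # (0,0): saddle point; (1,1): local min
```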
Application of Gradients
Method of least squares
The general problem:
Given data \(\left(x_{1}, y_{1}\right), \cdots,\left(x_{n}, y_{n}\right)\), we wish to find a line \(y=m x+b\) that fits the points as well as possible; that is, we need to find \(m, b\) that minimize the total squared error.
The method of least squares solves this problem.
Consider the function
\[f(m, b)=\sum_{i=1}^{n}\left(y_{i}-m x_{i}-b\right)^{2} \]Then
\[\nabla^{T} f=\left(\begin{array}{c} \frac{\partial f}{\partial m} \\ \frac{\partial f}{\partial b} \end{array}\right)=\left(\begin{array}{c} 2\left(m \sum_{i=1}^{n} x_{i}^{2}+b \sum_{i=1}^{n} x_{i}-\sum_{i=1}^{n} x_{i} y_{i}\right) \\ 2\left(m \sum_{i=1}^{n} x_{i}-\sum_{i=1}^{n} y_{i}+n b\right) \end{array}\right) \]Setting \(\frac{\partial f}{\partial m}=0\) and \(\frac{\partial f}{\partial b}=0\), we obtain the equation
\[\left\{\begin{array}{c} \left(\sum_{i=1}^{n} x_{i}^{2}\right) m+\left(\sum_{i=1}^{n} x_{i}\right) b=\sum_{i=1}^{n} x_{i} y_{i} \\ \left(\sum_{i=1}^{n} x_{i}\right) m+n b=\sum_{i=1}^{n} y_{i} \end{array}\right. \]Recall the Cauchy–Schwarz inequality \((n>1)\):
\[\left(\sum_{i=1}^{n} x_{i}\right)^{2}=\left(\sum_{i=1}^{n} 1 \cdot x_{i}\right)^{2} \leq\left(\sum_{i=1}^{n} 1^{2}\right)\left(\sum_{i=1}^{n} x_{i}^{2}\right)=n \sum_{i=1}^{n} x_{i}^{2}, \]with equality only when all the \(x_{i}\) are equal. Hence, provided the \(x_{i}\) are not all equal, the determinant of the coefficient matrix of the previous system is nonzero, and the system has a unique solution \((m, b)\).
To tell whether the solution is a local minimum or maximum point, we calculate the second derivatives and obtain the Hessian:
\[2\left(\begin{array}{cc} \sum_{i=1}^{n} x_{i}^{2} & \sum_{i=1}^{n} x_{i} \\ \sum_{i=1}^{n} x_{i} & n \end{array}\right) \]Again, by the Cauchy–Schwarz inequality, this matrix is positive definite (when the \(x_{i}\) are not all equal). Thus the unique critical point we have found is a global minimum.
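The normal equations above can be solved directly and cross-checked against a library fit; a minimal sketch with hypothetical sample data:

```python
import numpy as np

# Hypothetical sample data
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.1, 1.9, 3.2, 3.8])

# Coefficient matrix and right-hand side of the normal equations
A = np.array([[np.sum(xs**2), np.sum(xs)],
              [np.sum(xs),    len(xs)]])
rhs = np.array([np.sum(xs * ys), np.sum(ys)])
m, b = np.linalg.solve(A, rhs)

# Cross-check against NumPy's built-in least-squares polynomial fit
m2, b2 = np.polyfit(xs, ys, 1)
print(m, b)  # should agree with (m2, b2)
```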
Method of Lagrange multipliers
In mathematical optimization, the method of Lagrange multipliers
(named after Joseph Louis Lagrange) is a strategy for finding the local
maxima and minima of a function subject to equality constraints.
For instance, consider the optimization problem
\[\begin{aligned} &\text { maximize } z=f(x, y) \\ &\text { subject to } g(x, y)=c \end{aligned} \]We need both \(f\) and \(g\) to have continuous first partial derivatives.
We introduce a new variable \(\lambda\) called a Lagrange multiplier and study the Lagrange function (or Lagrangian) defined by
\[F(x, y, \lambda)=f(x, y)+\lambda \cdot(g(x, y)-c) \]Consider critical points of \(F\), i.e., those points where the partial derivatives of \(F\) are zero:
\[\left\{\begin{array}{l} \nabla f+\lambda \nabla g=0 \\ g(x, y)-c=0 \end{array}\right. \]We thus see the following fact:
If \(f\left(x_{0}, y_{0}\right)\) is a maximum of \(f(x, y)\) for the original constrained problem, then there exists \(\lambda_{0}\) such that \(\left(x_{0}, y_{0}, \lambda_{0}\right)\) is a critical point for the Lagrange function. In other words, \(\left(x_{0}, y_{0}, \lambda_{0}\right)\) can be solved from the equation:
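For instance, the critical-point system of the Lagrangian can be solved symbolically; a minimal SymPy sketch for the hypothetical problem "maximize \(f=x y\) subject to \(x+y=4\)":

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')

f = x * y                # hypothetical objective
g = x + y                # constraint g(x, y) = c with c = 4

# Lagrangian F = f + lambda * (g - c)
F = f + lam * (g - 4)

# Set all partial derivatives of F to zero and solve
sols = sp.solve([sp.diff(F, x), sp.diff(F, y), sp.diff(F, lam)],
                [x, y, lam], dict=True)
print(sols)  # x = 2, y = 2, lambda = -2
```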