Differentiation of Multivariate Functions
Limit Points, Open and Closed Sets
Neighborhoods and Open Sets
Deleted neighborhood
For \(\delta>0\) and \(a \in \mathbb{R}\), the set \(\{x \in \mathbb{R}: 0<|x-a|<\delta\}\) is called a deleted \(\delta\)-neighborhood of the point \(a\).
Interior Point
Let \(S \subset \mathbb{R}\) and \(x \in S\). If there exists an open neighborhood \(U\) of \(x\) such that \(U \subset S\), then \(x\) is called an interior point of \(S\).
Open Set
A set \(S \subset \mathbb{R}\) is called open if every \(x \in S\) is an interior point of \(S\).
Limit Point and Closed Sets
Point of accumulation
Given a set \(S \subset \mathbb{R}\), a point \(l \in \mathbb{R}\) is called a point of accumulation (or limit point) of \(S\) if every deleted neighborhood of \(l\) contains at least one point of \(S\).
Closed Set
A set \(S\) is called closed if it contains all its limit points.
Jacobian Matrix and Directional Derivatives
Gradient of composed function
Notation: For an \(n\)-variable function \(f=f\left(x_{1}, x_{2}, \ldots, x_{n}\right)\), we have defined its gradient:
\[\operatorname{grad} f=\nabla f=\frac{\partial f}{\partial\left(x_{1}, \cdots, x_{n}\right)}=\left(\frac{\partial f}{\partial x_{1}}, \cdots, \frac{\partial f}{\partial x_{n}}\right) \]The gradient generalizes the derivative to higher dimensions.
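As a sanity check, the gradient can be approximated numerically by central differences. A minimal sketch in Python, using a hypothetical two-variable example \(f(x_1, x_2)=x_1^2+3 x_1 x_2\):

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Approximate the gradient of f at x by central differences."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# f(x1, x2) = x1^2 + 3*x1*x2, so grad f = (2*x1 + 3*x2, 3*x1)
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
print(grad(f, [1.0, 2.0]))  # close to [8, 3]
```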
Recall that, in the one-variable case, if \(f=f(x)\) and \(x=\varphi(t)\), then the composed function is \(f=f(\varphi(t))\).
We have the standard chain rule:
\[\frac{d f}{d t}=\frac{d f}{d x} \frac{d \varphi}{d t} \]Question: For \(f=f\left(x_{1}, \cdots, x_{n}\right)\) and \(x_{1}=\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots\) \(x_{n}=\varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\), consider the composed function
\[f=f\left(x_{1}, \cdots, x_{n}\right)=f\left(\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots, \varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\right) \]Then what is \(\nabla f\) in terms of \(y_{1}, \cdots, y_{m}\)? That is,
\[\frac{\partial f}{\partial\left(y_{1}, \cdots, y_{m}\right)}=\left(\frac{\partial f}{\partial y_{1}}, \cdots, \frac{\partial f}{\partial y_{m}}\right)=? \]For short, we have
\[\frac{\partial f}{\partial\left(y_{1} \cdots y_{m}\right)}=\frac{\partial f}{\partial\left(x_{1} \cdots x_{n}\right)} \frac{\partial\left(\varphi_{1} \cdots \varphi_{n}\right)}{\partial\left(y_{1} \cdots y_{m}\right)} \]where
\[\frac{\partial\left(\varphi_{1} \cdots \varphi_{n}\right)}{\partial\left(y_{1} \cdots y_{m}\right)}=\left(\begin{array}{ccc} \frac{\partial \varphi_{1}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{1}}{\partial y_{m}} \\ \frac{\partial \varphi_{2}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{2}}{\partial y_{m}} \\ \vdots & & \vdots \\ \frac{\partial \varphi_{n}}{\partial y_{1}} & \cdots & \frac{\partial \varphi_{n}}{\partial y_{m}} \end{array}\right) \]is called the Jacobian matrix.
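The chain rule above can be verified symbolically: multiply the gradient (as a row vector) by the Jacobian matrix and compare with differentiating the composed function directly. A minimal sketch with SymPy, using the hypothetical maps \(\varphi_{1}=y_{1} y_{2}\), \(\varphi_{2}=y_{1}+y_{2}^{2}\):

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
x1, x2 = sp.symbols('x1 x2')

# Hypothetical inner map (phi1, phi2) and outer function f
phi = sp.Matrix([y1 * y2, y1 + y2**2])
f = x1**2 + sp.sin(x2)

# Gradient of f as a row vector, and the Jacobian matrix of phi
grad_f = sp.Matrix([[sp.diff(f, x1), sp.diff(f, x2)]])
J = phi.jacobian([y1, y2])

# Chain rule: grad_y f = (grad_x f at x = phi(y)) * Jacobian
lhs = grad_f.subs({x1: phi[0], x2: phi[1]}) * J

# Direct computation: substitute first, then differentiate
F = f.subs({x1: phi[0], x2: phi[1]})
rhs = sp.Matrix([[sp.diff(F, y1), sp.diff(F, y2)]])

print(sp.simplify(lhs - rhs))  # zero matrix
```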
First order differential forms
Invariance of differential forms: For \(f=f\left(x_{1}, \cdots, x_{n}\right)\), we have
\[d f=\frac{\partial f}{\partial x_{1}} d x_{1}+\cdots+\frac{\partial f}{\partial x_{n}} d x_{n} \]Now, if we compose \(f\) with \(x_{1}=\varphi_{1}\left(y_{1}, \cdots, y_{m}\right), \cdots\), \(x_{n}=\varphi_{n}\left(y_{1}, \cdots, y_{m}\right)\), and substitute
\[d x_{i}=\frac{\partial \varphi_{i}}{\partial y_{1}} d y_{1}+\cdots+\frac{\partial \varphi_{i}}{\partial y_{m}} d y_{m} \]we get
\[d f=\cdots=\frac{\partial f}{\partial y_{1}} d y_{1}+\cdots+\frac{\partial f}{\partial y_{m}} d y_{m} \]Directional derivatives
A function \(f=f\left(x_{1}, \cdots, x_{n}\right)\) is defined on \(D \subset E^{n}\). For a point \(\vec{a}=\left(a_{1}, \cdots, a_{n}\right) \in D^{\circ}\), take any direction \(\vec{v} \neq 0\), and consider the rate of change of \(f\) along \(\vec{v}\) at \(\vec{a}\). Let \(f_{\vec{v}}(t)=f(\vec{a}+t \vec{v}), t \in[0,1]\). If the derivative
\[f_{\vec{v}}^{\prime}\left(0^{+}\right)=\lim _{t \rightarrow 0^{+}} \frac{f_{\vec{v}}(t)-f_{\vec{v}}(0)}{t}=\lim _{t \rightarrow 0^{+}} \frac{f(\vec{a}+t \vec{v})-f(\vec{a})}{t}=A \]exists, then we say that \(A\) is the directional derivative of \(f\) along the direction \(\vec{v}\) at \(\vec{a}\), denoted by \(\left.\frac{\partial f}{\partial \vec{v}}\right|_{\vec{a}}\) or \(\left.\nabla_{\vec{v}} f\right|_{\vec{a}}\).
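For a differentiable \(f\), this limit can be approximated by a one-sided difference quotient, and it agrees with \(\nabla f \cdot \vec{v}\). A minimal numerical sketch (the example function is hypothetical):

```python
import numpy as np

def directional_derivative(f, a, v, h=1e-6):
    """One-sided difference quotient for t -> f(a + t v) at t = 0+."""
    a, v = np.asarray(a, float), np.asarray(v, float)
    return (f(a + h * v) - f(a)) / h

# Hypothetical example: f(x1, x2) = x1^2 + x1*x2
f = lambda x: x[0]**2 + x[0] * x[1]
a, v = [1.0, 2.0], [3.0, 4.0]

# grad f(1, 2) = (2*1 + 2, 1) = (4, 1), so df/dv = 4*3 + 1*4 = 16
print(directional_derivative(f, a, v))  # approximately 16
```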
Gradient vector fields
Given an \(n\)-variable function \(f=f\left(x_{1}, \cdots, x_{n}\right)\) defined on an open domain \(D \subset E^{n}\), assume that \(f\) is differentiable everywhere. The gradient of \(f\),
\[\nabla f=\left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}}, \cdots, \frac{\partial f}{\partial x_{n}}\right) \]depends on \(\left(x_{1}, x_{2}, \cdots, x_{n}\right) \in D\). In fact, \(\nabla f\) gives a vector field on \(D\).
Gradient Vector fields and Equipotential Surfaces.
For the previous \(f=f\left(x_{1}, \cdots, x_{n}\right)\), one may consider its equipotential surface (level set) that passes through a point \(\vec{x}_{0} \in D\):
\[\left\{\vec{x} \in D: f(\vec{x})=f\left(\vec{x}_{0}\right)\right\} \]
Taylor's Theorem
Taylor's Expansion
Let \(f=f\left(x_{1}, \cdots, x_{n}\right)\) be an \(n\)-variable function. If all of its partial derivatives of order \(m\) exist near \(\vec{x}_{0}\) and are continuous at \(\vec{x}_{0}\), then
\[\begin{gathered} f\left(\vec{x}_{0}+\Delta\right)=f\left(\vec{x}_{0}\right)+\left.(\Delta \cdot \nabla) f\right|_{\vec{x}_{0}}+\left.\frac{1}{2}(\Delta \cdot \nabla)^{2} f\right|_{\vec{x}_{0}} \\ +\cdots+\left.\frac{1}{m !}(\Delta \cdot \nabla)^{m} f\right|_{\vec{x}_{0}}+o\left(\|\Delta\|^{m}\right) \end{gathered} \]Equivalently, in multi-index notation:
\[f\left(\vec{x}_{0}+\Delta\right)=\sum_{|\alpha| \leq m} \frac{\Delta^{\alpha}}{\alpha !}\left(D^{\alpha} f\right)\left(\vec{x}_{0}\right)+o\left(\|\Delta\|^{m}\right) \]Hessian Matrix
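The expansion can be checked numerically: for a smooth function, the second-order Taylor polynomial should match \(f(\vec{x}_{0}+\Delta)\) up to an error that is \(o(\|\Delta\|^{2})\). A minimal sketch with the hypothetical example \(f(x, y)=e^{x} \sin y\):

```python
import numpy as np

# Hypothetical example: f(x, y) = exp(x) * sin(y)
f = lambda x, y: np.exp(x) * np.sin(y)
x0, y0 = 0.3, 0.7

# Exact partial derivatives of this particular f at (x0, y0):
fx = np.exp(x0) * np.sin(y0)   # d/dx of e^x sin y equals f itself
fy = np.exp(x0) * np.cos(y0)
fxx, fxy, fyy = fx, fy, -fx    # for this f: fxx = f, fxy = fy, fyy = -f

def taylor2(dx, dy):
    """Second-order Taylor polynomial of f about (x0, y0)."""
    return (f(x0, y0) + fx * dx + fy * dy
            + 0.5 * (fxx * dx**2 + 2 * fxy * dx * dy + fyy * dy**2))

dx, dy = 1e-2, -2e-2
err = abs(f(x0 + dx, y0 + dy) - taylor2(dx, dy))
print(err)  # small: the remainder is o(||Delta||^2)
```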
\[H(f)_{i, j}=\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} \]The determinant of the above matrix is also sometimes referred to as the Hessian.
Maxima and minima
Let \(D \subset E^{n}\), and let \(f=f(\vec{x}): D \rightarrow \mathbb{R}\) be an \(n\)-variable function. A point \(\vec{a} \in D\) is called a local minimum (resp. maximum) point of \(f\) if there exists \(\delta>0\) such that \(f(\vec{a}+\vec{r}) \geq f(\vec{a})\) (resp. \(f(\vec{a}+\vec{r}) \leq f(\vec{a})\)) for all \(\vec{r}\) with
\[\|\vec{r}\|<\delta, \quad \vec{a}+\vec{r} \in D. \]If \(\vec{a}\) is an interior point of \(D \subset E^{n}\), is a local minimum (or maximum) point of \(f: D \rightarrow \mathbb{R}\), and \(f\) is differentiable at \(\vec{a}\), then
\[\left.\nabla f\right|_{\vec{a}}=0 \]Let \(\vec{a}\) be an interior point of the domain of \(f=f(\vec{x})\), which is twice continuously differentiable near \(\vec{a}\), and suppose that \(\left.(\nabla f)\right|_{\vec{a}}=0\). (Such a point \(\vec{a}\) is called a critical point.)
1 If the Hessian matrix \(H(f)\) is positive definite (equivalently, has all eigenvalues positive) at \(\vec{a}\), then \(f\) attains a local minimum at \(\vec{a}\);
2 If the Hessian matrix \(H(f)\) is negative definite (equivalently, has all eigenvalues negative) at \(\vec{a}\), then \(f\) attains a local maximum at \(\vec{a}\);
3 If the Hessian matrix \(H(f)\) has both positive and negative eigenvalues at \(\vec{a}\), then \(\vec{a}\) is NOT a local extremum of \(f\) (such a point is known as a saddle point).
In those cases not listed above, the test is inconclusive.
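The eigenvalue test above is easy to apply numerically; a minimal sketch (the helper `classify` is hypothetical):

```python
import numpy as np

def classify(H):
    """Classify a critical point from the eigenvalues of its Hessian H."""
    w = np.linalg.eigvalsh(H)          # eigenvalues of the symmetric matrix H
    if np.all(w > 0):
        return "local minimum"
    if np.all(w < 0):
        return "local maximum"
    if w.min() < 0 < w.max():
        return "saddle point"
    return "inconclusive"              # some eigenvalue is zero

# f(x, y) = x^2 - y^2 has a critical point at the origin, Hessian diag(2, -2)
print(classify(np.diag([2.0, -2.0])))  # saddle point
print(classify(np.diag([2.0, 4.0])))   # local minimum
```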
Maintaining the above assumptions for a two-variable function \(f=f(x, y)\), write \(D=f_{x x} f_{y y}-f_{x y}^{2}\) (the determinant of the Hessian), and suppose that \((a, b)\) is a critical point of \(f\) (that is, \(\frac{\partial f}{\partial x}(a, b)=\frac{\partial f}{\partial y}(a, b)=0\)).
1 If \(D(a, b)>0, f_{x x}(a, b)>0\), then \((a, b)\) is a local minimum of \(f\);
2 If \(D(a, b)>0, f_{x x}(a, b)<0\), then \((a, b)\) is a local maximum of \(f\);
3 If \(D(a, b)<0\), then \((a, b)\) is a saddle point of \(f\).
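A worked example of this test with SymPy, using the standard textbook function \(f(x, y)=x^{3}-3 x y+y^{3}\), whose real critical points are \((0,0)\) and \((1,1)\):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x*y + y**3

# Critical points: solve fx = fy = 0 (keep only real solutions)
fx, fy = sp.diff(f, x), sp.diff(f, y)
crit = sp.solve([fx, fy], [x, y], dict=True)

# Discriminant D = fxx * fyy - fxy^2
D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2

for p in crit:
    if not all(v.is_real for v in p.values()):
        continue
    d, fxx = D.subs(p), sp.diff(f, x, 2).subs(p)
    if d < 0:
        kind = "saddle point"
    elif d > 0:
        kind = "local min" if fxx > 0 else "local max"
    else:
        kind = "inconclusive"
    print(p, kind)  # (0,0): saddle point; (1,1): local min
```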
Application of Gradients
Method of least squares
The general problem:
Given data \(\left(x_{1}, y_{1}\right), \cdots,\left(x_{n}, y_{n}\right)\), we wish to find a line \(y=m x+b\) that fits the points as well as possible; that is, we need to find \(m, b\) that minimize the total squared error.
The method of least squares solves this problem.
Consider the function
\[f(m, b)=\sum_{i=1}^{n}\left(y_{i}-m x_{i}-b\right)^{2} \]Then
\[\nabla^{T} f=\left(\begin{array}{c} \frac{\partial f}{\partial m} \\ \frac{\partial f}{\partial b} \end{array}\right)=\left(\begin{array}{c} 2\left(m \sum_{i=1}^{n} x_{i}^{2}+b \sum_{i=1}^{n} x_{i}-\sum_{i=1}^{n} x_{i} y_{i}\right) \\ 2\left(m \sum_{i=1}^{n} x_{i}-\sum_{i=1}^{n} y_{i}+n b\right) \end{array}\right) \]Setting \(\frac{\partial f}{\partial m}=0\) and \(\frac{\partial f}{\partial b}=0\), we obtain the equation
\[\left\{\begin{array}{c} \left(\sum_{i=1}^{n} x_{i}^{2}\right) m+\left(\sum_{i=1}^{n} x_{i}\right) b=\sum_{i=1}^{n} x_{i} y_{i} \\ \left(\sum_{i=1}^{n} x_{i}\right) m+n b=\sum_{i=1}^{n} y_{i} \end{array}\right. \]Recall the Cauchy–Schwarz inequality \((n>1)\):
\[\left(\sum_{i=1}^{n} x_{i}\right)^{2}=\left(\sum_{i=1}^{n} 1 \cdot x_{i}\right)^{2} \leq\left(\sum_{i=1}^{n} 1^{2}\right)\left(\sum_{i=1}^{n} x_{i}^{2}\right)=n \sum_{i=1}^{n} x_{i}^{2}, \]with equality only when all the \(x_{i}\) are equal. Hence, provided the \(x_{i}\) are not all equal, the determinant of the coefficient matrix of the previous system is nonzero, and the system has a unique solution \((m, b)\).
To tell whether the solution is a local minimum or maximum point, we calculate the second derivatives and obtain the Hessian:
\[2\left(\begin{array}{cc} \sum_{i=1}^{n} x_{i}^{2} & \sum_{i=1}^{n} x_{i} \\ \sum_{i=1}^{n} x_{i} & n \end{array}\right) \]Again, by the Cauchy–Schwarz inequality, this matrix is positive definite (when the \(x_{i}\) are not all equal). Thus the unique critical point we have found is a global minimum.
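The normal equations above can be solved directly and cross-checked against a library fit; a minimal sketch with hypothetical sample data:

```python
import numpy as np

# Hypothetical sample data
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.1, 1.9, 3.2, 3.8])

# Coefficient matrix and right-hand side of the normal equations
A = np.array([[np.sum(xs**2), np.sum(xs)],
              [np.sum(xs),    len(xs)]])
rhs = np.array([np.sum(xs * ys), np.sum(ys)])
m, b = np.linalg.solve(A, rhs)

# Cross-check against NumPy's built-in least-squares polynomial fit
m2, b2 = np.polyfit(xs, ys, 1)
print(m, b)  # should agree with (m2, b2)
```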
Method of Lagrange multipliers
In mathematical optimization, the method of Lagrange multipliers
(named after Joseph Louis Lagrange) is a strategy for finding the local
maxima and minima of a function subject to equality constraints.
For instance, consider the optimization problem
\[\begin{aligned} &\text { maximize } z=f(x, y) \\ &\text { subject to } g(x, y)=c \end{aligned} \]We need both \(f\) and \(g\) to have continuous first partial derivatives.
We introduce a new variable \(\lambda\) called a Lagrange multiplier and study the Lagrange function (or Lagrangian) defined by
\[F(x, y, \lambda)=f(x, y)+\lambda \cdot(g(x, y)-c) \]Consider critical points of \(F\), i.e., those points where the partial derivatives of \(F\) are zero:
\[\left\{\begin{array}{l} \nabla f+\lambda \nabla g=0 \\ g(x, y)-c=0 \end{array}\right. \]We thus see the following fact:
If \(f\left(x_{0}, y_{0}\right)\) is a maximum of \(f(x, y)\) for the original constrained problem, then there exists \(\lambda_{0}\) such that \(\left(x_{0}, y_{0}, \lambda_{0}\right)\) is a critical point for the Lagrange function. In other words, \(\left(x_{0}, y_{0}, \lambda_{0}\right)\) can be solved from the equation:
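For instance, the critical-point system of the Lagrangian can be solved symbolically; a minimal SymPy sketch for the hypothetical problem "maximize \(f=x y\) subject to \(x+y=4\)":

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')

f = x * y                # hypothetical objective
g = x + y                # constraint g(x, y) = c with c = 4

# Lagrangian F = f + lambda * (g - c)
F = f + lam * (g - 4)

# Set all partial derivatives of F to zero and solve
sols = sp.solve([sp.diff(F, x), sp.diff(F, y), sp.diff(F, lam)],
                [x, y, lam], dict=True)
print(sols)  # x = 2, y = 2, lambda = -2
```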