
Section 8.7 The Chain Rule

Subsection 8.7.1 Review: Rate of Change and Composition

We start by reminding ourselves that a rate of change is a ratio of changes for two variables. If \(y\) is a function of \(x\text{,}\) say \(y=f(x)\text{,}\) then the rate of change \(\left.\frac{dy}{dx}\right|_a=f'(a)\) is the rate of change of \(y\) with respect to \(x\) at the value \(x=a\text{.}\) This measures the instantaneous ratio of changes in \(y\) from \(f(a)\) to changes in \(x\) from \(a\text{.}\) At any value \(x\) close to \(a\text{,}\) this means that

\begin{equation*} y-f(a) \approx \left.\frac{dy}{dx}\right|_{a} \cdot (x-a). \end{equation*}

Changes in the value of \(y\) from \(f(a)\) are approximately proportional to changes in \(x\) from \(a\text{,}\) and the derivative \(f'(a)\) is the proportionality constant.
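A quick numerical check can make this proportionality concrete. The sketch below is only an illustration: the function \(f(x)=x^3\) and the base point \(a=2\) are assumed choices that do not come from the text. It compares the true change in \(y\) with the estimate \(f'(a)\cdot(x-a)\) for shrinking changes in \(x\text{.}\)

```python
# Illustrative choice (not from the text): f(x) = x**3 with base point a = 2.
def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


a = 2.0


def approximation_error(dx):
    """Gap between the true change in y and the estimate f'(a) * dx."""
    actual_change = f(a + dx) - f(a)
    linear_estimate = f_prime(a) * dx
    return abs(actual_change - linear_estimate)


# The gap should shrink much faster than dx itself.
for dx in (0.1, 0.01, 0.001):
    print(dx, approximation_error(dx))
```

For this particular cubic the gap works out to \(6\,\Delta x^2 + \Delta x^3\text{,}\) so cutting \(\Delta x\) by a factor of 10 cuts the error by roughly a factor of 100.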

Second, we remind ourselves that compositions correspond to chains of dependent variables. Suppose that \(u\) is a function of \(x\text{,}\) say \(u=g(x)\text{,}\) and \(y\) is subsequently a function of \(u\text{,}\) say \(y=f(u)\text{.}\) We would write this chain as

\begin{equation*} \left\{ \begin{matrix} u=g(x) \\ y=f(u) \end{matrix} \right.. \end{equation*}

Using substitution, we could also just write that \(y\) is a function of \(x\) using composition.

\begin{equation*} y=f(g(x)) = (f \circ g)(x). \end{equation*}

Now, let us consider a particular value \(a\) for \(x\) and ask how we would determine the rate of change of \(y\) with respect to \(x\) when \(y\) is defined by such a composition. A change in \(x\) from \(a\text{,}\) \(\Delta x = x-a\text{,}\) would lead to a change in \(u\) from \(g(a)\) using the rate of change

\begin{equation*} \Delta u = u-g(a) \approx \left.\frac{du}{dx}\right|_{a} \cdot (x-a) = g'(a) \cdot \Delta x. \end{equation*}

In a similar way, a change in \(u\) from its starting value \(g(a)\) would lead to a change in \(y\) from \(f(g(a))\) using the rate of change

\begin{equation*} \Delta y = y - f(g(a)) \approx \left. \frac{dy}{du}\right|_{g(a)} \cdot (u-g(a)) = f'(g(a))\cdot \Delta u. \end{equation*}

Putting these two results of the chain together, we find that

\begin{equation*} \Delta y \approx \left. \frac{dy}{du} \right|_{g(a)} \cdot \left. \frac{du}{dx} \right|_{a} \cdot \Delta x = f'(g(a)) \cdot g'(a) \cdot \Delta x. \end{equation*}

Graphically, this is illustrated in the figure below. The inputs and outputs of the functions \(g\) and \(f\) are illustrated as number lines. The input \(a\) to the function \(g\) is mapped to the output \(g(a)\text{.}\) A nearby input \(x\) is mapped to an output \(g(x)\) that is not too far from \(g(a)\text{.}\) The differences are the values \(\Delta x = x-a\) and \(\Delta u = g(x)-g(a)\text{.}\) In composition, the outputs \(g(a)\) and \(g(x)\) act as inputs to \(f\text{.}\)

The derivative provides an approximate ratio of the change in output values to the change in input values. The smaller the change in the input, the closer the approximation. (This is why the derivative must be defined as a limit of the average rate of change.) When functions are in composition, each function effectively scales the change in its input by the factor of its derivative. So the overall change in the output is the change in the original input multiplied by the product of the derivatives.
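The two-step approximation in this subsection can be checked numerically. The sketch below is a hedged illustration: the functions \(g(x)=x^2+1\) and \(f(u)=u^3\text{,}\) the point \(a=1\text{,}\) and the step \(\Delta x=0.001\) are assumed choices that do not come from the text. It compares the actual changes \(\Delta u\) and \(\Delta y\) with the estimates \(g'(a)\cdot\Delta x\) and \(f'(g(a))\cdot g'(a)\cdot\Delta x\text{.}\)

```python
# Illustrative chain (not from the text): u = g(x) = x**2 + 1, y = f(u) = u**3.
def g(x):
    return x ** 2 + 1


def g_prime(x):
    return 2 * x


def f(u):
    return u ** 3


def f_prime(u):
    return 3 * u ** 2


a = 1.0
dx = 0.001

# Actual changes along the chain.
delta_u = g(a + dx) - g(a)
delta_y = f(g(a + dx)) - f(g(a))

# Estimates built from the derivatives at a and at g(a).
estimate_u = g_prime(a) * dx
estimate_y = f_prime(g(a)) * g_prime(a) * dx

print("delta u:", delta_u, "estimate:", estimate_u)
print("delta y:", delta_y, "estimate:", estimate_y)
```

Here \(g'(1)=2\) and \(f'(g(1))=f'(2)=12\text{,}\) so the estimate of \(\Delta y\) uses the combined factor \(24\text{.}\)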

Subsection 8.7.2 The Chain Rule for Derivatives

The chain rule formalizes the ideas in the previous subsection. It states that if \(g\) is differentiable at \(x\) and \(f\) is differentiable at \(g(x)\text{,}\) then the composition \(f(g(x))\) has a derivative given by

\begin{equation*} \frac{d}{dx} [ f(g(x)) ] = f'(g(x)) \cdot g'(x). \end{equation*}

Pay close attention to the inputs of \(f'\) and \(g'\text{.}\) Compare them with the evaluation points used in the previous subsection, where \(f'\) was evaluated at \(g(a)\) and \(g'\) at \(a\text{.}\) The inputs are different because the functions \(f\) and \(g\) have different inputs in the composition.

The chain rule is often abbreviated in Leibniz notation as

\begin{equation*} \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}. \end{equation*}

Notice that this form almost looks like an algebraic simplification where the symbol \(du\) on the right would cancel to give the formula on the left.
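The role of the inputs in the chain rule formula can be seen in a small numerical experiment. The sketch below is an illustration under assumed choices that do not come from the text: the outer function \(f(u)=\sin(u)\text{,}\) the inner function \(g(x)=x^2\text{,}\) the point \(x=1.5\text{,}\) and a helper named `central_difference`. It compares a numerical estimate of the derivative of \(f(g(x))\) with \(f'(g(x))\cdot g'(x)\) and with the mistaken product \(f'(x)\cdot g'(x)\text{.}\)

```python
import math


# Illustrative choice (not from the text): outer f(u) = sin(u), inner g(x) = x**2.
def f(u):
    return math.sin(u)


def f_prime(u):
    return math.cos(u)


def g(x):
    return x ** 2


def g_prime(x):
    return 2 * x


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.5
numeric = central_difference(lambda t: f(g(t)), x)
chain_rule = f_prime(g(x)) * g_prime(x)  # f' evaluated at g(x)
wrong_input = f_prime(x) * g_prime(x)    # f' mistakenly evaluated at x

print(numeric, chain_rule, wrong_input)
```

The numerical estimate agrees with the chain rule value, while the product with \(f'\) evaluated at \(x\) instead of \(g(x)\) lands far away from it.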

Example 8.7.2

Find the derivative of \(f(x)=(2x+1)^2\) using the chain rule and compare the result to what you get if you expand \(f(x)\) before differentiation.


To use the chain rule, we must identify the chain/composition that is involved. (This is why we need to learn to recognize compositions.) The chain can be found by recognizing that we need to square \(2x+1\text{,}\) so computing this quantity is our first step, \(u=2x+1\text{.}\) Then \(y=f(x)\) can be rewritten as \(y=u^2\text{.}\) We can also find the derivatives of each step in the chain:

\begin{equation*} \left\{ \begin{matrix} u=2x+1 \\ y=u^2 \end{matrix} \right. \quad \Rightarrow \quad \left\{ \begin{matrix} \frac{du}{dx} = 2 \\ \frac{dy}{du} = 2u \end{matrix} \right.. \end{equation*}

Consequently, we have

\begin{equation*} \frac{dy}{dx} = \left. \frac{dy}{du} \right|_{u=2x+1} \cdot \frac{du}{dx}. \end{equation*}

The subscript \(u=2x+1\) is simply a reminder that after writing the derivative \(\frac{dy}{du}=2u\) we will also need to replace \(u\) with \(2x+1\text{.}\)

\begin{equation*} f'(x) = \frac{dy}{dx} = (2u) \cdot (2) = 2(2x+1) \cdot 2 = 4(2x+1). \end{equation*}

The other approach is to expand \(f(x)\) to a form that is easier to differentiate.

\begin{equation*} f(x) = (2x+1)^2 = (2x+1)(2x+1) = 4x^2 + 4x+ 1 \end{equation*}

This is a simple polynomial form that has a simple derivative:

\begin{equation*} f'(x) = 8x+4. \end{equation*}

We can see that this is actually the same as our earlier derivative if we factor out the common factor of 4: \(8x+4 = 4(2x+1)\text{.}\)
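The agreement between the two approaches can also be confirmed numerically. The sketch below is a minimal check, not part of the original example: the helper names and the sample points are assumed for illustration. It evaluates the chain rule result \(4(2x+1)\text{,}\) the expanded result \(8x+4\text{,}\) and a difference-quotient estimate of \(f'(x)\) at a few values of \(x\text{.}\)

```python
def f(x):
    return (2 * x + 1) ** 2


def chain_rule_derivative(x):
    u = 2 * x + 1        # inner step u = 2x + 1
    return (2 * u) * 2   # dy/du times du/dx


def expanded_derivative(x):
    return 8 * x + 4     # derivative of 4x^2 + 4x + 1


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical sample points chosen only for this check.
sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]
for x in sample_points:
    print(x, chain_rule_derivative(x), expanded_derivative(x),
          central_difference(f, x))
```

All three columns should match up to rounding in the difference quotient.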

Example 8.7.3

Find the derivative of \(f(x) = 3(x^2+3x)^7\text{.}\)


Our function has an intermediate formula \(u=x^2+3x\) that is then raised to the 7th power and multiplied by 3. That is, if \(y=f(x)\) then \(y=3u^7\text{.}\) We would write this as a chain, along with the derivative of each step:

\begin{equation*} \left\{ \begin{matrix} u=x^2+3x \\ y=3u^7 \end{matrix} \right. \quad \Rightarrow \quad \left\{ \begin{matrix} \frac{du}{dx} = 2x+3 \\ \frac{dy}{du} = 21u^6 \end{matrix} \right.. \end{equation*}

The chain rule implies

\begin{align*} f'(x) = \frac{dy}{dx} &= \frac{dy}{du} \frac{du}{dx} \\ &= 21 u^6 \cdot (2x+3) \\ &= 21(x^2+3x)^6 (2x+3). \end{align*}

In the function notation often used in textbooks, we could instead do this by writing \(f(x)\) as a composition \(f(x) = g(h(x))\text{:}\)

\begin{align*} u = x^2+3x \qquad & h(x) = x^2+3x \qquad h'(x) = 2x+3 \\ y = 3u^7 \qquad & g(u) = 3u^7 \qquad g'(u) = 21u^6 \end{align*}

The chain rule would be written:

\begin{align*} f'(x) &= g'(h(x)) \cdot h'(x) \\ &= g'(x^2+3x) \cdot (2x+3) \\ &= 21(x^2+3x)^6 (2x+3) \end{align*}
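As a sanity check on this result, the formula \(21(x^2+3x)^6(2x+3)\) can be compared with a numerical estimate of the derivative. The sketch below is only an illustration: the helper `central_difference`, the step size, and the sample points are assumed choices that do not come from the text.

```python
import math


def f(x):
    return 3 * (x ** 2 + 3 * x) ** 7


def f_prime(x):
    u = x ** 2 + 3 * x                 # inner step u = x^2 + 3x
    return 21 * u ** 6 * (2 * x + 3)   # dy/du times du/dx


def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical sample points chosen only for this check.
sample_points = [0.5, 1.0, 2.0]
for x in sample_points:
    estimate = central_difference(f, x)
    print(x, f_prime(x), estimate,
          math.isclose(estimate, f_prime(x), rel_tol=1e-6))
```

A relative tolerance is used because the values of \(f'\) grow quickly with \(x\text{,}\) so an absolute tolerance would be hard to choose for all sample points at once.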