## Section 1.4 Graphs and Relations between Variables

### Subsection 1.4.1 Overview

In physical settings, we usually consider measurements of multiple quantities at the same time. We do this because we are interested in the relationships between these different quantities. We think of each quantity of interest as a state variable; the collection of all such variables under consideration is called the system. At any instant, the variables of the system will each have a particular value and the collection of those values at that instant is called the state of the system. Graphs are often used to reveal relationships between different variables.

In this section, we explore graphs of equations and graphs of data through scatter plots. An equation involving multiple variables represents a relation between those variables. We consider the role of solving equations in the context of these relations.

### Subsection 1.4.2 Systems, States and Variables

In the course of an experiment, or even just in observation, many different quantities typically are covarying, or changing with one another. For example, an object in motion has changing position, changing velocity, and changing forces. In the course of a chemical reaction, there are changing concentrations of the different reactants and products as well as possibly changing temperature, pH, and volume, for example. While observing a changing population, there could be changing population numbers, total biomass, birth and death rates, consumption of resources, and production of products and waste.

Mathematically, the system consists of all possible observable quantities associated with the experiment or observed physical system. The state of the system refers to the collection of instantaneous values of all such quantities at a particular instant or configuration of the system. A state variable, or more simply a variable, represents a single quantity that is or could be observed in the system.

###### Example 1.4.1

Consider the following data about the population, births and deaths in the United States. To conserve space, the data are given using scientific notation expressed in the standard machine form where the power of 10 follows the letter E, so that $2.521 \times 10^8$ would be written 2.521E8.

| Year | Population | Births | Deaths |
|------|------------|--------|--------|
| 1991 | 2.521E8 | 4.111E6 | 2.170E6 |
| 1992 | 2.550E8 | 4.065E6 | 2.176E6 |
| 1993 | 2.577E8 | 4.000E6 | 2.269E6 |
| 1994 | 2.602E8 | 3.953E6 | 2.279E6 |
| 1995 | 2.628E8 | 3.900E6 | 2.312E6 |
| 1996 | 2.652E8 | 3.891E6 | 2.315E6 |
| 1997 | 2.677E8 | 3.881E6 | 2.314E6 |
| 1998 | 2.703E8 | 3.942E6 | 2.337E6 |
| 1999 | 2.727E8 | 3.959E6 | 2.391E6 |
| 2000 | 2.822E8 | 4.059E6 | 2.403E6 |
| 2001 | 2.850E8 | 4.026E6 | 2.416E6 |
| 2002 | 2.876E8 | 4.022E6 | 2.443E6 |
| 2003 | 2.901E8 | 4.090E6 | 2.448E6 |
| 2004 | 2.928E8 | 4.112E6 | 2.397E6 |
| 2005 | 2.955E8 | 4.138E6 | 2.448E6 |
| 2006 | 2.984E8 | 4.266E6 | 2.426E6 |
| 2007 | 3.012E8 | 4.316E6 | 2.424E6 |
| 2008 | 3.041E8 | 4.248E6 | 2.472E6 |
| 2009 | 3.068E8 | 4.131E6 | 2.437E6 |
| 2010 | 3.094E8 | 3.999E6 | 2.468E6 |

Each row (corresponding to the population in a given year) represents a distinct state of the system. The observed values in the state are the variables: the year, the total population at the beginning of the year, the total number of births in the year, and the total number of deaths in the year. The year is included as one of the variables—an independent variable—in order to distinguish the different states with respect to time.

We often represent a variable by a symbol—a letter, a Greek letter, an abbreviation, or even a word. That symbol becomes a name for the variable to be used in sentences, expressions, and equations. Uppercase and lowercase letters are different symbols and should not be interchanged with one another. The choice of symbol should generally be related to the meaning of the variable. An important part of communication in modeling is in stating clearly the variables of a system and identifying the symbols that are chosen to represent them.

###### Example 1.4.2

In the previous example, there were four variables. A common strategy is to use the first letter of a word describing each variable. The population variable might be represented by the symbol $P\text{.}$ Births and deaths might be represented by the symbols $B$ and $D\text{,}$ respectively. The year might be represented by the symbol $Y\text{.}$

Note that the symbols $p\text{,}$ $b\text{,}$ $d$ and $y$ are not the same as the symbols above. They should not be used for this problem.
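As a minimal sketch of how a state and its symbols might be recorded, the following Python snippet stores two rows of the population table as named records. The class name `State` and the list name `states` are hypothetical choices made for this illustration; only the field names $Y\text{,}$ $P\text{,}$ $B$ and $D$ come from the example.

```python
from typing import NamedTuple

class State(NamedTuple):
    """One state of the population system (field names follow Example 1.4.2)."""
    Y: int      # year
    P: float    # population at the beginning of the year
    B: float    # births during the year
    D: float    # deaths during the year

# First two rows of the table in Example 1.4.1.
states = [
    State(Y=1991, P=2.521e8, B=4.111e6, D=2.170e6),
    State(Y=1992, P=2.550e8, B=4.065e6, D=2.176e6),
]

for s in states:
    print(s.Y, s.P, s.B, s.D)

# Symbols are case-sensitive: the record has a field P but no field p.
print(hasattr(states[0], "P"), hasattr(states[0], "p"))
```

Programming languages enforce the same rule as the text: `P` and `p` are different names, so asking for a lowercase `p` does not find the population.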

The next example illustrates how we might write a short explanation of a system and the variables associated with it. Note how the physical explanation of the system is described first, followed by an introduction of the measurements taken and the symbols used to represent those variables. Any time you have data and refer to the data by variables, you need a few sentences that introduce the meaning of each variable along with the units of measurement.

###### Example 1.4.3

In biology, scientists run electrophoresis gels to determine the size of polymers, such as proteins or DNA strands. The gel provides a porous structure for the polymers to travel through while an electric potential (voltage) creates a force that pulls the polymers through the gel. Polymers of different sizes travel at different speeds. The experiment is set up with all polymers starting at one end of the gel; the voltage is turned on for a certain amount of time and then disconnected. Clusters of similarly sized polymers are identified visually as bands on the gel, with smaller polymers traveling a greater distance.

The image below represents an electrophoresis gel run on a standardized collection of DNA of fixed sizes. Because the image does not show a length scale, the distances traveled by the different lengths are measured in image pixels and recorded in the table below. The variables for the experiment are the length of DNA segments and the distance traveled through the gel. Let $L$ represent the length of the segment (in nucleotides) and let $D$ represent the distance traveled (in pixels), measured from the center of the starting well to the center of the corresponding band in the image. Each row represents a single state $(L,D)$ of the system.

| $L$ (nts) | $D$ (px) |
|-----------|----------|
| 100 | 342 |
| 200 | 327 |
| 300 | 312 |
| 400 | 299 |
| 500 | 288 |
| 600 | 278 |
| 700 | 270 |
| 800 | 263 |
| 900 | 256 |
| 1000 | 249 |

### Subsection 1.4.3 Scatter Plots and Relationships Between Variables

The primary motivation for collecting data regarding different variables in the state of a system is to determine relationships between those variables. One of the ways that we look for relationships is using a scatter plot. A scatter plot is a graph showing the relationship between two variables. Suppose the two variables use symbols $x$ and $y\text{.}$ For each state of the system, there will have been observed values for both $x$ and $y\text{.}$ The graph will include points for each pair $(x,y)\text{.}$

Spreadsheets (like Microsoft Excel, Apple Numbers or Google Sheets) are a common tool to generate scatter plots. The data are first put in a table. The first column of data will correspond to the variable used for the horizontal axis ($x$), and the second column of data will correspond to the variable for the vertical axis ($y$). Select the two columns at the same time and add a chart to your spreadsheet, choosing the scatter plot style of graph. You should become familiar with how to create a scatter plot. Always be sure that you label your axes, using the variables of the system rather than the generic names of $x$ and $y\text{.}$

The following figure shows two different scatter plots for the electrophoresis gel data above. One plot is based on the pairs $(L,D)$ whereas the other is based on the pairs $(D,L)\text{.}$ These graphs contain the same information, viewed from reversed perspectives. When we switch the order of the variables, the two relationships are called inverse relations of one another.
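The points behind the two scatter plots can be sketched in a few lines of Python. This is only an illustration of how the pairs are formed; the list names `points_LD` and `points_DL` are hypothetical, and no plotting library is assumed.

```python
# Electrophoresis data from Example 1.4.3.
L = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]   # length (nucleotides)
D = [342, 327, 312, 299, 288, 278, 270, 263, 256, 249]    # distance (pixels)

# Points for the scatter plot with L on the horizontal axis.
points_LD = list(zip(L, D))

# Swapping the coordinates of every point gives the inverse relation.
points_DL = [(dist, length) for (length, dist) in points_LD]

print(points_LD[:3])
print(points_DL[:3])
```

Each state of the system contributes one point to each list, which is why the two plots carry the same information.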

When a system has a state defined by more than two variables, scatter plots can be defined for each pair of state variables. For example, the population data has four state variables, $(Y,P,B,D)\text{.}$ Three scatter plots can be formed by plotting the population, the total births and the total deaths versus the year, giving graphs of points $(Y,P)\text{,}$ $(Y,B)\text{,}$ $(Y,D)\text{.}$ Because the births and deaths are on the same scale, we can combine the plots as one. We could also plot the inverse relations $(P,Y)\text{,}$ $(B,Y)$ and $(D,Y)\text{,}$ but these really contain the same information from a different view.

We can also look at relationships between other pairs of variables. For example, we can look at how the number of births or deaths relates to the population, plotting $(P,B)$ and $(P,D)\text{,}$ or how the number of births relates to the number of deaths with $(B,D)\text{.}$ The graph showing the relation of births and deaths to time (above) is very similar to the graph showing the relation of births and deaths to population (below). However, the relation between the births and deaths illustrates that sometimes variables do not show a clear relation.

### Subsection 1.4.4 Graphs of Equations

An equation gives an abstract representation of a relationship between variables. Just as the state of an experimental system is defined by the value of the variables defining the state, an equation can be considered as a mathematical way to define relationships between variables of an abstract system. A scatter plot of data is generalized for an equation as a graph of all solutions. If we choose an ordering for the variables (e.g., alphabetical), the values for the variables can be conveniently listed as an ordered list. When two variables are involved in an equation, the ordered list is called an ordered pair, or point, like $(x,y)\text{,}$ and the graph of the equation is typically a curve in the plane.

###### Example 1.4.5

The equation

\begin{equation*} 2x+3y = 12 \end{equation*}

involves two variables, $x$ and $y\text{,}$ and is the equation of a line. The expressions in the equation are $2x+3y$ and $12\text{.}$ The values $x=3$ and $y=2\text{,}$ corresponding to the ordered pair $(x,y)=(3,2)\text{,}$ provide one solution because for those values,

\begin{equation*} 2x+3y = 2(3)+3(2)=12, \end{equation*}

so that the equation is true. On the other hand, $(x,y)=(4,1)$ is not a solution because for that state,

\begin{equation*} 2x+3y = 2(4)+3(1)=11 \end{equation*}

and $11 \ne 12\text{.}$ Some other solutions include the points $(6,0)$ and $(0,4)\text{.}$ The line corresponding to this equation represents the set of all such solutions. The points $(3,2)\text{,}$ $(6,0)$ and $(0,4)$ are on the line, while $(4,1)$ is not.
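The same test can be sketched in Python. The function name `is_solution` is a hypothetical helper introduced for this illustration; it simply evaluates both expressions of the equation and compares them.

```python
def is_solution(x, y):
    """Return True when (x, y) makes 2x + 3y = 12 a true statement."""
    return 2 * x + 3 * y == 12

# The four points discussed in the example.
for point in [(3, 2), (4, 1), (6, 0), (0, 4)]:
    print(point, is_solution(*point))
```

Only $(4,1)$ should be reported as failing the test, matching the observation that it is not on the line.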

###### Example 1.4.6

The equation

\begin{equation*} u^2+v^2=16+6u \end{equation*}

also involves two variables, $u$ and $v\text{.}$ The expressions in the equation are $u^2+v^2$ and $16+6u\text{.}$ Using ordered pairs $(u,v)\text{,}$ the points $(3,5)$ and $(3,-5)$ are solutions. That is, if $(u,v)=(3,5)\text{,}$ the expressions have the same value:

\begin{align*} u^2+v^2 &= 3^2+5^2=9+25 = 34,\\ 16+6u &= 16+6(3) = 16+18 = 34. \end{align*}

It is possible to show that the graph of solutions for this equation is a circle centered at $(3,0)$ with radius 5. Other points on this circle include such points as $(-2,0)$ and $(6,-4)\text{.}$ You should verify that these are also solutions, at least for one or two points to reinforce the idea that a solution makes the statement of the equation true.
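One way to see why the graph is a circle is to complete the square in $u\text{.}$ The following derivation is a sketch of that step using balanced operations:

\begin{align*} u^2+v^2 &= 16+6u\\ u^2-6u+v^2 &= 16\\ (u^2-6u+9)+v^2 &= 16+9\\ (u-3)^2+v^2 &= 5^2. \end{align*}

The last line has the standard form of a circle with center $(3,0)$ and radius 5. For example, $(u,v)=(-2,0)$ gives $(-2-3)^2+0^2=25\text{,}$ so that point is on the circle.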

It is usually difficult to know how to sketch the graph of an arbitrary equation. Computer utilities that support implicit plots can be used. For example, the online graphing calculator at desmos.com allows you to enter an equation involving variables $x$ and $y\text{.}$ We also could use computational systems, such as SageMath shown below, to create an implicit plot.

When an equation is written as a dependent variable being equal to an expression involving an independent variable, we can easily generate points that are in the solution set using a table with the independent variable in the first column and the dependent variable in the second column. We choose convenient values for the independent variable, compute the value of the expression that depends on that variable, and then use that resulting value for the dependent variable. All such points will be solutions to the equation. This is precisely how a graphing calculator works internally; it computes many such points very quickly and connects the points with line segments.

###### Example 1.4.7

Rewrite the equation $2x+3y=12$ so that $y$ is the dependent variable. Use the new equation to find four points in the solution set.

Solution

We need to isolate the variable $y$ using balanced operations.

\begin{gather*} 2x+3y = 12\\ 3y = -2x+12 \\ y = \frac{1}{3}(-2x+12)\\ y = -\frac{2}{3} x + 4 \end{gather*}

The final equation $y = -\frac{2}{3}x+4$ should be recognized as a slope-intercept equation A.2.6 of a line. The slope is $m=-\frac{2}{3}$ while the $y$-intercept value is $b=4\text{.}$ Having solved for $y\text{,}$ we can now use four different values for $x$ to find corresponding values for $y\text{.}$ We do this in a table.

| $x$ | $y=-\frac{2}{3}x+4$ | $(x,y)$ |
|-----|---------------------|---------|
| 0 | $-\frac{2}{3}(0)+4 = 4$ | $(0,4)$ |
| 1 | $-\frac{2}{3}(1)+4 = \frac{10}{3}$ | $(1,\frac{10}{3})$ |
| 2 | $-\frac{2}{3}(2)+4 = \frac{8}{3}$ | $(2,\frac{8}{3})$ |
| 3 | $-\frac{2}{3}(3)+4 = 2$ | $(3,2)$ |
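Tabulating points from a formula is easy to automate. The sketch below uses Python's `fractions` module so that the values stay exact; the names `y_of` and `table` are hypothetical choices for this illustration.

```python
from fractions import Fraction

def y_of(x):
    """Dependent variable from the slope-intercept form y = -(2/3)x + 4."""
    return Fraction(-2, 3) * x + 4

# Choose convenient values of the independent variable: x = 0, 1, 2, 3.
table = [(x, y_of(x)) for x in range(4)]

for x, y in table:
    # Each tabulated point should also satisfy the original equation 2x + 3y = 12.
    print(x, y, 2 * x + 3 * y == 12)
```

Checking each point against $2x+3y=12$ confirms that solving for $y$ did not change the solution set.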

Solving equations has a graphical interpretation. Recall that an equation is a statement that two expressions are equal. We can imagine that each expression defines a dependent variable, say $y_1$ and $y_2\text{.}$ A solution would be where the dependent variables have the same value, $y_1=y_2\text{.}$ Graphically, that would be where the curves defined by the dependent variables cross.

###### Example 1.4.8

The equation $2x^2=3x+5$ can be interpreted as having two dependent variables,

\begin{align*} y_1 &= 2x^2,\\ y_2 &= 3x+5. \end{align*}

We can plot a graph of each equation in the same figure. The first dependent variable $y_1$ corresponds to a parabola and the second dependent variable $y_2$ corresponds to a line. Points where the graphs cross correspond to solutions.

The original equation only involved the independent variable $x\text{.}$ Consequently, the solution set corresponds to the set of the $x$-coordinates of the points of intersection. That is, the solution set is $\{-1, \frac{5}{2}\}\text{.}$
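The crossing points can also be located numerically. The following sketch uses bisection on the difference $y_1-y_2\text{,}$ which is one possible method and not necessarily what any particular graphing tool does. The function name `crossing` and the starting intervals are assumptions chosen by looking at where the graphs appear to cross.

```python
def y1(x):
    return 2 * x**2      # left-hand expression (parabola)

def y2(x):
    return 3 * x + 5     # right-hand expression (line)

def crossing(lo, hi, tol=1e-10):
    """Bisection on y1 - y2; assumes the difference changes sign on [lo, hi]."""
    def diff(x):
        return y1(x) - y2(x)
    if diff(lo) == 0:
        return lo
    if diff(hi) == 0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) == 0:
            return mid
        # Keep the half-interval on which the difference still changes sign.
        if diff(lo) * diff(mid) < 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# One bracket around each apparent crossing (assumed intervals).
print(round(crossing(-3, 0), 6), round(crossing(1, 4), 6))
```

The two values returned should agree with the solution set found from the graph, up to the chosen tolerance.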

When we find an equivalent equation using balanced operations, the new equation involves different expressions. Consequently, the graphs that would be involved will be different. However, the locations where the graphs intersect will be at the same $x$-values.

###### Example 1.4.9

Compare the graphs for the equations $2x^2=3x+5\text{,}$ $2x^2-3x=5\text{,}$ $2x^2-5=3x\text{,}$ and $2x^2-3x-5=0\text{,}$ shown below. Notice how the $x$-values for the points of intersection are always the same.
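A short Python check illustrates the same point without a graph. Each equivalent equation is stored as a pair of expressions, and both sides are evaluated at the two solutions. The dictionary name `forms` is hypothetical.

```python
from fractions import Fraction

# Each equation is stored as (left expression, right expression).
forms = {
    "2x^2 = 3x + 5":     (lambda x: 2 * x**2,             lambda x: 3 * x + 5),
    "2x^2 - 3x = 5":     (lambda x: 2 * x**2 - 3 * x,     lambda x: 5),
    "2x^2 - 5 = 3x":     (lambda x: 2 * x**2 - 5,         lambda x: 3 * x),
    "2x^2 - 3x - 5 = 0": (lambda x: 2 * x**2 - 3 * x - 5, lambda x: 0),
}

solutions = [Fraction(-1), Fraction(5, 2)]

for name, (lhs, rhs) in forms.items():
    # Print the common value of both sides at each solution.
    heights = [lhs(x) for x in solutions]
    agree = all(lhs(x) == rhs(x) for x in solutions)
    print(name, heights, agree)
```

The common value of the two sides differs from one form to the next, so the intersection points sit at different heights, but the $x$-values where the sides agree stay the same.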

### Subsection 1.4.5 Trend Lines and Regression Curves

Mathematical equations are exact. Real data exhibit uncertainty and randomness. Although they do not capture the uncertainty of data, equations can be used to model or approximate the trend or average presented by the data. A trend line or trend curve (if not linear) is a model that captures the general behavior of the data and stays close to the scatter points. Most spreadsheets and graphing calculators have an option to show a line of best fit for a scatter plot, which is an example of a trend line. The process these programs use is called regression. They usually report the equation of the regression curve using the generic variable symbols $x$ and $y\text{,}$ so it is the researcher's responsibility to interpret the equation in terms of the true variables.
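To give a sense of what a line of best fit involves, here is a minimal sketch of ordinary least squares, a common way to compute a linear trend line. It is an assumption that a given spreadsheet or calculator uses exactly this method. The function name `fit_line` and the small data set are hypothetical and made up purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

# Hypothetical data that roughly follow a line (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
print(f"y = {m:.4g} x + {b:.4g}")
```

As with a spreadsheet, the result is reported in the generic symbols $x$ and $y\text{;}$ it is still up to the reader to restate it using the variables of the system.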

###### Example 1.4.14

Consider the population example with the scatter plot of the number of deaths plotted with respect to the total population size, and predict the number of deaths in a year if the population were 300 million.

A spreadsheet reported the trend line of this data set with an equation

\begin{equation*} y=0.0049x+992711. \end{equation*}

Because the scatter plot had $P$ on the horizontal axis and $D$ on the vertical axis, the more appropriate equation would be

\begin{equation*} D=0.0049P+992711. \end{equation*}

A spreadsheet may not give enough precision in the model equation when using the default settings. Note that the equation reported above only has two significant digits in the slope value but an apparent 6 significant digits in the intercept. By changing the settings for the equation of the trend line, we get a more precise model

\begin{equation*} D = 4.86666 \times 10^{-3} P + 9.92711 \times 10^5. \end{equation*}

Depending on the values of the data, the greater precision might make a significant difference.

Let us compare the two models with a population of 300 million, $P=300\times 10^6\text{.}$ The first model, which only has two significant digits in the first coefficient, gives

\begin{equation*} D = 0.0049(3\times 10^8) + 992711 = 2462711. \end{equation*}

Because one coefficient only had two significant digits, we can only expect the first two digits are accurate, thus predicting $D=2.5$ million deaths in the year. Using the second model with six digits of accuracy in both coefficients, we find

\begin{equation*} D = 4.86666 \times 10^{-3} \cdot 3 \times 10^{8} + 992711 = 2452709. \end{equation*}

Rounded to six significant digits, this predicts $D=2{,}452{,}710$ deaths in the year.
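The two calculations can be reproduced with a short Python sketch. The variable names `D_low` and `D_high` are hypothetical labels for the low-precision and high-precision models.

```python
P = 300e6   # population of 300 million

# Model with default spreadsheet precision (two significant digits in the slope).
D_low = 0.0049 * P + 992711

# Model with six significant digits in both coefficients.
D_high = 4.86666e-3 * P + 9.92711e5

print(round(D_low), round(D_high), round(D_low - D_high))
```

The difference between the two predictions is roughly ten thousand deaths, which comes entirely from rounding the slope.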

A trend line or a trend curve provides a model that allows us to predict values where there are not observed data. When the prediction occurs between observed data, such prediction is called interpolation. If the prediction is occurring beyond the extremes of the data, such prediction is called extrapolation. We can use the value for one variable and the regression equation to solve for the predicted value of the related variable. Often, a formula may not describe all of the data but provides a good approximation for a certain range of values. Interpolation is usually safer than extrapolation.

###### Example 1.4.15

Consider the electrophoresis gel scatter plot with the length of the DNA $L$ graphed with respect to the distance traveled in the gel $D\text{.}$ The data appear to follow a nice curve without a lot of uncertainty. Using a polynomial trend, a spreadsheet reports the following equation for the data:

\begin{equation*} y = 0.0573x^2-43.381x+8241.6. \end{equation*}

Using the appropriate variables for the problem, this equation should be rewritten as

\begin{equation*} L = 0.0573D^2-43.381D+8241.6. \end{equation*}

The graph of the data with the trend curve is shown below.

Suppose we had another DNA sample of unknown length that traveled a distance of $D=282$ pixels. Using our value for $D\text{,}$ we can find the value of $L$ using the model,

\begin{equation*} L = 0.0573(282)^2 - 43.381(282)+8241.6 = 564.8832. \end{equation*}

Since our original data had 3 significant digits, we would estimate the length of the DNA in question as $L \approx 565$ nucleotides. In this way, a regression of known electrophoresis data allows us to estimate lengths of other molecules.
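The prediction, together with a check of whether it is an interpolation, can be sketched in Python. The function names `length_model` and `is_interpolation` are hypothetical helpers; the coefficients are those reported by the spreadsheet in this example, and the observed distances are from the table in Example 1.4.3.

```python
def length_model(D):
    """Trend curve from Example 1.4.15: predicted length L for distance D."""
    return 0.0573 * D**2 - 43.381 * D + 8241.6

# Distances (in pixels) observed for the standardized DNA sample.
observed_D = [342, 327, 312, 299, 288, 278, 270, 263, 256, 249]

def is_interpolation(D):
    """True when D lies within the range of observed distances."""
    return min(observed_D) <= D <= max(observed_D)

D = 282
print(round(length_model(D)), is_interpolation(D))
```

Because $D=282$ lies between the smallest and largest observed distances, this prediction is an interpolation; a distance outside that range would make the same formula an extrapolation.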

You should note that the number of significant digits reported is not the same as the uncertainty in the prediction. The degree to which the original data vary around the trend curve leads to uncertainty in the coefficients of the regression model and subsequent uncertainty to the trend curve itself. However, analysis of this uncertainty is a topic for statistics and is outside the scope of this text.

### Subsection 1.4.6 Summary

• Quantities that can be measured correspond to state variables. A system is the collection of all possible variables. The state of the system is the collection of values measured for all of the variables simultaneously. An important part of communication is describing all relevant variables and introducing their names.

• A relation between two variables can often be visualized graphically using a scatter plot. An equation is the mathematical idealization of a relation. The graph of an equation involving two variables, say $x$ and $y\text{,}$ shows all solutions as points $(x,y)\text{.}$

• When the equation is written as a dependent variable equal to an expression of the independent variable, points on the graph can be quickly tabulated using the formula.

• Using regression to find a trend line or regression curve can give an approximate relation corresponding to observed data. Treating the resulting equation as a model equation can give approximate predictions of states of the system.

### Subsection 1.4.7 Exercises

Each problem has an equation involving two variables. Determine whether each of the given states for those variables is in the solution set.

###### 1

$3x-2y=8$

1. $(x,y)=(0,-4)$
2. $(x,y)=(1,-2)$
3. $(x,y)=(4,2)$
###### 2

$2w+5z-3=w^2+z^2$

1. $(w,z)=(-2,2)$
2. $(w,z)=(-1,3)$
3. $(w,z)=(3,2)$

Each problem has an equation involving multiple variables. Solve for the indicated dependent variable.

###### 3 Perimeter of Rectangle

Given $2L + 2W = P\text{,}$ solve for $W\text{.}$

###### 4 Volume of Rectangular Prism

Given $V = LWH\text{,}$ solve for $L\text{.}$

###### 5 Volume of Cylinder

Given $V = \pi r^2 h\text{,}$ solve for $h\text{.}$

###### 6 Ideal Gas Law

Given $PV = nRT\text{,}$ solve for $P\text{.}$

###### 7 Michaelis-Menten Kinetics

Given $R = \frac{AC}{C+K}\text{,}$ solve for $C\text{.}$

Given an equation relating two variables, solve for the indicated dependent variable. Use your resulting expression to calculate the value for that variable given the values of the indicated independent variable. Make note of any values that are not defined. Plot the corresponding points in the solution set of the equation on a graph.

###### 8

Given the equation $4x + 5y = 20\text{,}$ find $y$ for each value $x \in \{1, 2, 3, 4, 5\}\text{.}$

###### 9

Given the equation $n p = 1000\text{,}$ find $n$ for each value $p \in \{10, 20, 25, 40, 50\}\text{.}$

###### 10

Given the equation $\displaystyle \frac{ac}{c+50} = 200\text{,}$ find $c$ for each value $a \in \{100, 200, 300, 400\}\text{.}$

Graph two dependent variables representing the expressions on each side of the equation. Use the points of intersection to identify solutions to the equation. Verify that the values you identify are solutions by testing whether they make the equation true.

###### 11

$3x-5 = x+2$

###### 12

$\displaystyle \frac{20x}{x+4} = x+3$

###### 13

$4x^3-9x^2 = x-6$

Additional DNA samples were run in the same electrophoresis gel as described in Example 1.4.15. Using the data and regression curve from that example, estimate the length of each sample. Indicate whether the approximation is appropriate.

###### 14

Estimate the length of a DNA sample that traveled 200 pixels.

###### 15

Estimate the length of a DNA sample that traveled 335 pixels.

###### 16

Estimate the length of a DNA sample that traveled 350 pixels.

###### A Voltage–Resistance–Current Relationship

A simple electric circuit has an applied voltage $V$ (volts) and a variable load resistance $R$ (kilohms). When the circuit is closed, current flows through the circuit, measured as the current $I$ (amperes). When the voltage was held constant at $V = 9$ V, the resistance and current were measured, with values recorded in the table below. The following group of problems is based on these data.

| $V$ (V) | $R$ (kΩ) | $I$ (A) |
|---------|----------|---------|
| 9.0 | 0.84 | 0.0107 |
| 9.0 | 1.2 | 0.0073 |
| 9.0 | 1.8 | 0.0050 |
| 9.0 | 2.7 | 0.0033 |
| 9.0 | 3.4 | 0.0026 |

###### 17

Create a scatter plot of $(R,I)\text{.}$ Would a trend line make sense for this data? Explain.

###### 18

Conductance $G$ is the reciprocal of resistance, $G=1/R\text{.}$ Create a scatter plot of $(G,I)\text{.}$ Would a trend line make sense for this data? Explain.

###### 19

One of the previous scatter plots should have had a meaningful trend line. State an appropriate regression equation as a model and use it to predict the current $I$ when the resistance is $R=2.1$ kΩ.

###### Population-Growth Relationships

The number of births and of deaths in a population generally depends on the size of the population. The table below gives population data for ten of the twelve most populous cities in the state of Virginia for the year 2012. The data include the population $P$ and the total number of births $B$ and deaths $D$ for the year recorded for each city. The following group of exercises is based on these data.

| City | $P$ | $B$ | $D$ |
|------|-----|-----|-----|
| Virginia Beach | 447021 | 6270 | 2828 |
| Norfolk | 245782 | 3773 | 1827 |
| Chesapeake | 228417 | 2805 | 1582 |
| Richmond | 210309 | 2939 | 1849 |
| Newport News | 180726 | 2905 | 1438 |
| Alexandria | 146294 | 2763 | 686 |
| Roanoke | 97469 | 1492 | 1172 |
| Portsmouth | 96470 | 1534 | 980 |
| Suffolk | 85181 | 1087 | 726 |
| Lynchburg | 77113 | 1062 | 779 |

###### 20

Create a scatter plot of $(P,B)$ and find the equation of the trend line. The cities of Hampton and Harrisonburg were left off the list with populations of $P=136836$ and $P=50981\text{,}$ respectively. Use the trend line regression model to predict the number of births in these cities during 2012. Which calculation is an example of interpolation and which is extrapolation?

###### 21

Create a scatter plot of $(P,D)$ and find the equation of the trend line. Use the trend line regression model to predict the number of deaths in Hampton and Harrisonburg during 2012. (See the previous problem for population values.) Which calculation is an example of interpolation and which is extrapolation?
