First, we will import all the libraries we will need.
- numpy for useful numerical computations
- scipy for useful scientific packages
- scipy.odr, which contains the Orthogonal Distance Regression (ODR) functions we will use
- matplotlib and seaborn, which contain graphing functions
Then we will set up our plotting parameters to make the plots aesthetically pleasing.
import numpy as np
import scipy.optimize  # curve_fit lives here; importing scipy alone does not expose it
import scipy.odr as odr
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rc
# set to use tex, but make sure it is sans-serif fonts only
rc('text', usetex=True)
rc('text.latex', preamble=r'\usepackage{cmbright}')
rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica']})
# Magic function to make matplotlib inline;
# other style specs must come AFTER
%matplotlib inline
# Render inline figures as high-resolution ('retina') PNGs.
%config InlineBackend.figure_formats = {'png', 'retina'}
# JB's favorite Seaborn settings for notebooks
# named rc_params so it does not shadow the rc function imported above
rc_params = {'lines.linewidth': 2,
             'axes.labelsize': 18,
             'axes.titlesize': 18,
             'axes.facecolor': '#DFDFE5'}  # matplotlib hex colors need the leading '#'
sns.set_context('notebook', rc=rc_params)
sns.set_style("dark")
mpl.rcParams['xtick.labelsize'] = 16
mpl.rcParams['ytick.labelsize'] = 16
mpl.rcParams['legend.fontsize'] = 14
Let's go ahead and simulate 15 coordinate points. Each point will have an associated measurement error, but the measurement error is not the same for all points. In particular, we draw each point's error scale from a normal distribution centered at 0 with a standard deviation of 2.5 (taking the absolute value), and then perturb each coordinate by a normal draw with that scale:
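One small optional step, not part of the original analysis: seeding NumPy's random number generator makes the simulated points, and therefore the fits below, reproducible from run to run. The seed value is arbitrary.
np.random.seed(42)  # arbitrary seed; any fixed value makes the simulation reproducible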
n = 15
x = np.linspace(0, 10, n)
xerr = np.abs(np.random.normal(0, 2.5, n))  # per-point error scale for x
x = np.random.normal(x, xerr)               # perturb each x by its own error
y = np.linspace(0, 20, n)
yerr = np.abs(np.random.normal(0, 2.5, n))  # per-point error scale for y
y = np.random.normal(y, yerr)               # perturb each y by its own error
Next, we can fit a line of best fit using the least squares algorithm. To do this, we define our function,
$$ f(x) = a \cdot x, $$
where $a$ is the slope, a parameter to be determined by fitting. We will use the least squares algorithm by calling scipy.optimize.curve_fit with the function to be optimized and the data to perform the fitting. What this function will try to do is find the parameter $a$ such that
$$ \sum_i \left( y_i - f(x_i) \right)^2 $$
is minimized. Here, $y_i$ is the y-coordinate of the $i$th observation, and $x_i$ is the x-coordinate of the $i$th observation.
Let's optimize, find $a, b$ and then plot the results!
def line(x, a):
    """The line of best fit: slope a, zero intercept."""
    y = a * x
    return y
popt, pcov = scipy.optimize.curve_fit(line, x, y)
# popt is a length-1 array holding the fitted slope:
a = popt
plt.errorbar(x, y, yerr=yerr, xerr=xerr, fmt='o')
plt.plot(x, line(x, a), label='fit')
plt.legend(title='Legend')
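As a quick sanity check, and as an addition that is not part of the original analysis: for a line through the origin, minimizing the sum of squares above has a closed-form solution, $a = \sum_i x_i y_i / \sum_i x_i^2$, so we can confirm curve_fit's answer directly.
a_closed = np.sum(x * y) / np.sum(x**2)  # closed-form least-squares slope through the origin
print(a[0], a_closed)                    # the two estimates should agree closely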
We can repeat the above procedure with orthogonal distance regression (ODR). Again, we define the function to be fit, but now we also define a procedure, perform_odr(data_x, data_y, xerr, yerr), to carry out the ODR. It works as follows:
- Wrap the function to fit in an odr.Model object
- Create an odr.Data object and place the data (and error weights) in it
- Create an odr.ODR object from the Data object, the Model object, and an initial parameter guess, then run it
def odr_line(p, x):
    """The line of best fit."""
    # p is the array of fit parameters; its single element is the slope:
    y = p[0] * x
    return y
def perform_odr(x, y, xerr, yerr):
    """Finds the ODR for data {x, y} and returns the result"""
    linear = odr.Model(odr_line)
    # weights should be inverse variances, i.e. 1/sigma**2:
    mydata = odr.Data(x, y, wd=1./xerr**2, we=1./yerr**2)
    myodr = odr.ODR(mydata, linear, beta0=[0])
    output = myodr.run()
    return output
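An aside, not part of the original analysis: scipy.odr also provides a RealData class that accepts the standard deviations themselves (sx, sy) and converts them to weights internally, which reads a little more directly than building the weights by hand. A minimal sketch of an equivalent fit:
def perform_odr_realdata(x, y, xerr, yerr):
    """Same ODR fit, but using odr.RealData, which takes standard deviations directly."""
    linear = odr.Model(odr_line)
    mydata = odr.RealData(x, y, sx=xerr, sy=yerr)  # sx, sy are standard deviations
    myodr = odr.ODR(mydata, linear, beta0=[0])
    return myodr.run()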
Let's do it!
regression = perform_odr(x, y, xerr, yerr)
Our results are now stored in an ODR Output object. We can extract the fitted parameters from regression.beta; since our model has a single parameter, regression.beta[0] is the slope.
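The Output object carries more than point estimates. For example, regression.sd_beta holds the standard errors of the fitted parameters, and regression.pprint() prints a summary of the fit:
print(regression.beta[0])     # fitted slope
print(regression.sd_beta[0])  # standard error of the slope
regression.pprint()           # full summary of the fit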
plt.errorbar(x, y, yerr=yerr, xerr=xerr, fmt='o', alpha=0.7)
plt.plot(x, line(x, a), label='least squares', lw=3, color='blue')
plt.plot(x, line(x, regression.beta[0]), label='ODR', lw=3, color='purple')
# plot the true line:
X = np.linspace(-10, 10, 10)
plt.plot(X, 2*X, label='True', lw=3, ls='--')
plt.legend(title='Legend', loc=(1, .5))
plt.xlim(np.min(x)-1, np.max(x)+1)
plt.ylim(np.min(y)-1, np.max(y)+1)
print(
"""
The true model is y = 2.00x
Least squares returns y = {0:.2g}x
ODR returns y = {1:.2g}x
""".format(a[0], regression.beta[0]))
Because ODR accounts for uncertainty in both the x- and y-coordinates, while least squares models error only in y, we expect ODR to recover the true slope more reliably as the measurement errors grow.
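To probe this claim, here is a rough simulation sketch, an addition that is not part of the original analysis: repeat the experiment many times at several error scales and compare each method's average slope error. The helper name compare_methods and the trial counts are arbitrary choices.
def compare_methods(scale, n=15, trials=200):
    """Average |fitted slope - 2| for least squares and ODR at a given error scale."""
    ls_err = odr_err = 0.0
    for _ in range(trials):
        x0 = np.linspace(0, 10, n)
        xerr = np.abs(np.random.normal(0, scale, n))
        x = np.random.normal(x0, xerr)
        yerr = np.abs(np.random.normal(0, scale, n))
        y = np.random.normal(2 * x0, yerr)  # true model: y = 2x
        popt, _ = scipy.optimize.curve_fit(line, x, y)
        ls_err += abs(popt[0] - 2)
        odr_err += abs(perform_odr(x, y, xerr, yerr).beta[0] - 2)
    return ls_err / trials, odr_err / trials
for scale in (0.5, 1.0, 2.5):
    print(scale, compare_methods(scale))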