Commit d8b233db authored by Lewin Bormann

thesis: move bayes example to appendix

parent a41f2b8b
@@ -30,6 +30,8 @@
\input{mainmatter/bayes}
\input{mainmatter/methodology}
\input{mainmatter/scenario}
\input{mainmatter/conclusion}
\input{mainmatter/appendix}
\backmatter
{
\appendix
\mychapter{Instructional Example of Bayesian Inference}{app:bayes}
To show how the concepts introduced above work in practice, this appendix develops an instructional example using
\pymc. The example does not use any special features of the library and could also be implemented with a hand-written
Metropolis-Hastings sampler (without any framework) in fewer than 100 lines of code.
The scenario is a classic situation in which Bayesian statistics is useful: some basic facts about a system are known,
and additional knowledge is available in the form of individual data points. Specifically, a Nuclear Power Plant (NPP)
of unknown electrical power output $P_{el}$ and efficiency grade $\eta$ is considered. Some of its characteristics can
be observed remotely, but all observations come with uncertainties.
The goal is to \emph{infer} probability distributions for the power output $P$ and the efficiency grade $\eta =
P_{el}/P_{th}$ of the NPP from the observations. These parameters are modeled as \emph{prior distributions}, since
general knowledge about the scenario enters the model through these variables. In this case, the prior is defined
as follows:
\begin{itemize}
\item NPPs of this size are generally expected to generate \SIrange{1000}{2000}{\mega\watt} of electrical power.
\item NPPs of this type usually have an efficiency grade $\eta$ of \SIrange{30}{40}{\percent}, with a tendency
towards the middle of this range.
\end{itemize}
\begin{quotation}
\itshape
The fact that this is a nuclear power plant is orthogonal to the core topic of this work; it merely serves as
inspiration and thus was chosen from the same general area.
\end{quotation}
This scenario requires some simplifying assumptions. The NPP of interest can be observed remotely, and
the following characteristics can be determined to some precision:
\begin{itemize}
\item Number of employees
\item Number of cooling towers
\item Number of transformers
\item Amount of yearly waste per \si{\mega\watt} of thermal power.
\end{itemize}
Further, there is experience with other NPPs of similar type, and certain average values have proven to be good
linear approximations across comparable power plants (such relations do not have to be linear, but for
the sake of simplicity they are chosen as such):
\begin{itemize}
\item \SI{100}{\mega\watt} of \emph{electrical} power require 20 employees on daily duty.
\item \SI{500}{\mega\watt} of \emph{thermal} power require one cooling tower of the observed size.
\item \SI{250}{\mega\watt} of \emph{electrical} power flow through each transformer; this number includes redundancy.
\item \SI{10}{\kilo\gram} of waste is generated per year for every megawatt of \emph{thermal} power.
\end{itemize}
Observations of the NPP facility have shown that
\begin{itemize}
\item every day, \num{250} cars show up on the employee parking lot, and there is no bus connection;
\item there are \num{7} cooling towers;
\item there are \num{7} transformers; and
\item roughly \num{60} containers of \SI{600}{\kilo\gram} each of spent fuel leave the facility each year, for a
total of \SI{36e3}{\kilo\gram} of waste.
\end{itemize}
% \begin{table}
% \centering
% \begin{tabular}{r|ll}
% \toprule
% Quantity & Relation to Parameters & Observed \\
% \midrule
% No. of employees & \SI{100}{\mega\watt} (el.) require 20 employees & 250 cars \\
% No. of cooling towers & \SI{500}{\mega\watt} (th.) require one tower & 7 towers \\
% No. of transformers & \SI{250}{\mega\watt} (el.) require one transformer & 7 transformers \\
% Waste mass & \SI{10}{\kilogram} of waste is generated for every th. Megawatt per year & \SI{36}{\tonne\per\year} \\
% \bottomrule
% \end{tabular}
%
% \end{table}
Such a set of observations is called the \emph{sample} (not to be confused with the \emph{samples} generated at random
during the \gls{inference} process!) and was introduced as $\vec y$ in the first section of this chapter. Note that it
is difficult to directly calculate the parameters of interest from these numbers, for two reasons: First, there are two
unknown parameters (power output and efficiency grade); and second, the observations conflict. For example, 7
transformers translate to $7 \times \SI{250}{\mega\watt} = \SI{1750}{\mega\watt}$ according to the assumed relations,
but $250\mathrm{~employees} \times \SI{5}{\mega\watt}\,\mathrm{(employee)}^{-1} = \SI{1250}{\mega\watt}$ of electrical
power.
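To make the conflict concrete, the following short sketch (plain Python, not part of the model code; it assumes the
linear relations above and an illustrative nominal efficiency of $\eta = 0.35$) computes a naive point estimate of the
electrical power from each observation separately:
\begin{lstlisting}
# Naive point estimates of the electrical power output (MW), one per
# observation; eta_nominal is an assumed placeholder value.
eta_nominal = 0.35

p_from_employees = 250 * (100 / 20)        # 1250 MW(el)
p_from_towers = 7 * 500 * eta_nominal      # 1225 MW(el)
p_from_transformers = 7 * 250              # 1750 MW(el)
p_from_waste = 36_000 / 10 * eta_nominal   # 1260 MW(el)
\end{lstlisting}
The four estimates disagree, and no single pair $(P, \eta)$ reproduces all observations exactly; Bayesian inference
resolves this by weighting each observation according to its assumed uncertainty.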
As a next step, a \pymc~model with random variables for the prior and posterior distributions can be created from the
collected information. It incorporates prior beliefs (the linear dependencies shown above), and is parameterized with
the observations:
\begin{lstlisting}
import pymc3 as pm

# Observed data.
employees = 250
towers = 7
transformers = 7
waste = 36000  # kg per year

with pm.Model() as nppmodel:
    # Parameters to be inferred, and their priors.
    power_e = pm.Uniform("power_e", lower=1000, upper=2000)
    eta = pm.TruncatedNormal("eta", lower=0, mu=0.35, sigma=0.05)
    # Observed variables, linked to the priors via the linear relations.
    o_empl = pm.Normal("employees", mu=power_e * 0.2, sigma=30, observed=employees)
    o_towers = pm.Normal("towers", mu=power_e / (eta*500), sigma=1, observed=towers)
    o_transformers = pm.Normal("transformers", mu=power_e / 250, sigma=1, observed=transformers)
    o_waste = pm.Normal("waste", mu=power_e / eta * 10, sigma=4000, observed=waste)
\end{lstlisting}
The observations (the variable assignments at the top) are each one-dimensional in this case, although they would not
need to be: there could be more than one observation per variable. The model itself is introduced using a
\cd{with} block; this is \pymc's way of grouping multiple statements relating to one and the same model: all
statements within the indented block refer to the newly created \cd{nppmodel}, which stores the random variables and
the information needed for likelihood calculation.
The first two variables, \cd{power\_e} and \cd{eta}, are declared as \emph{prior distributions}, as described
earlier. They translate our beliefs held before any observations into probability distributions. Within the
\pymc~framework, probability distributions are expressed as random variables (RVs); these prior RVs in particular form
the basis of the Bayesian \gls{inference} by encoding the prior beliefs. Encoding such assumptions improves the
quality of the inference result.
\begin{align}
P &\sim \text{\itshape Uniform}(1000, 2000) \\
\eta &\sim \mathcal N(\mu = 0.35, \sigma = 0.05)
\end{align}
Also note that these priors do not translate directly into physical reality: the fact that the prior assigns a small
probability to a \SI{50}{\percent} efficiency grade (\autoref{fig:bayes:prior_pure}) does not mean that such a value is
plausible. These distributions are modeled taking into consideration not only our physical prior knowledge, but also
the computational limitations of the algorithms employed by \pymc. The choice of prior distributions affects
\gls{inference} quality: a prior distribution encoding less specific knowledge about a scenario can be expected to
contribute less to a precise inference result.
\begin{figure}
\centering
\begin{subfigure}{.5\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_prior_pure.pdf}
\caption{Histograms of samples drawn from prior distributions. Power in \si{\mega\watt}.}
\label{fig:bayes:prior_pure}
\end{subfigure}
\begin{subfigure}{1\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_prior_rest.pdf}
\caption{Histograms of samples drawn from the prior distributions parameterized by the prior distributions for
\cd{power\_e} and \cd{eta}. Waste in \si{\kilogram}.}
\label{fig:bayes:prior_rest}
\end{subfigure}
\caption{Prior distributions of the model used in this example.}
\end{figure}
Below the declarations of the distributions for the parameters to be inferred, there are four more distributions.
These are also prior distributions, but two characteristics distinguish them from the first two: their
parameters (here: $\mu$) depend on other random variables, namely the prior random variables $P$ and $\eta$; and they
carry the \cd{observed=} keyword (discussed below). Prior information is expressed in these RVs in two different ways:
first, as the translation from the variables $P, \eta$ into the observed quantities (the fact that a certain number of
employees is needed per megawatt is also prior information); and second, as $\sigma$ encoding the measurement
uncertainty of the estimates. \pymc~provides a way of drawing random samples from the prior distributions alone,
without any measurements; a visualization of this is provided in \autoref{fig:bayes:prior_rest}.
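A minimal sketch of such prior sampling, assuming the \pymc~(version 3) API and illustrative variable names, could
look as follows:
\begin{lstlisting}
with nppmodel:
    # Draw 1000 samples from the priors and the derived distributions;
    # the observed data is ignored at this stage.
    prior_samples = pm.sample_prior_predictive(samples=1000)

prior_samples["power_e"]  # 1000 draws from Uniform(1000, 2000)
prior_samples["eta"]      # 1000 draws from the truncated normal prior
\end{lstlisting}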
The last four RVs look similar to the first two, but the \cd{observed=} keyword changes their
role in the Bayesian inference process: they are used to calculate the likelihood $L_i$ of each randomly sampled set of
parameters $(P_i', \eta_i')$ (the $'$ marks them as a proposed sample rather than an actual observation). First, the
four observed RVs need to be recalculated for every random sample $(P_i', \eta_i')$, as the scale $\sigma_k(P_i',
\eta_i')$ and location $\mu_k(P_i', \eta_i')$ ($k = 1,\dots,4$) of each of the four normal distributions
depend on the sampled RVs. Based on these distributions, \pymc~calculates the likelihood of the observations $\vec y$
for sample $i$:
\begin{align}
(P_i', \eta_i') &\sim (\text{\itshape Uniform}(1000, 2000),~\mathcal{N}(0.35, 0.05)) \\
L_i &= \prod_k p_k(y_k; \mu_k(P_i', \eta_i'), \sigma_k(P_i', \eta_i'))
\end{align}
where $p_k$ is the $k$th normal distribution's density function parameterized by $\mu_k(P_i', \eta_i'),
\sigma_k(P_i', \eta_i')$. For example, according to the definition in the
code above, the ``employees'' density would be evaluated as:
\begin{equation}
p_1(250; \mu(P), \sigma) = \Phi(250~\mathrm{employees}; \mu = 0.2 P ~\si{\per\mega\watt}, \sigma = 30)
\end{equation}
with the normal distribution's probability density function
\begin{equation}
\label{eqn:bayes:normaldist}
\Phi(x; \mu, \sigma) = \frac{\exp(-\frac{(x-\mu)^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}.
\end{equation}
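For illustration, the likelihood $L_i$ can also be written out directly. The following sketch uses \textsc{scipy}
outside of \pymc; it is a simplified stand-in for what \pymc~computes internally (which actually works with
log-likelihoods), and the helper name \cd{likelihood} is purely illustrative:
\begin{lstlisting}
import numpy as np
from scipy.stats import norm

def likelihood(p_e, eta):
    # Product of the four normal densities, each evaluated at its
    # observed value, with mu depending on the proposed (p_e, eta).
    return (norm.pdf(250, loc=0.2 * p_e, scale=30)              # employees
          * norm.pdf(7, loc=p_e / (eta * 500), scale=1)         # towers
          * norm.pdf(7, loc=p_e / 250, scale=1)                 # transformers
          * norm.pdf(36_000, loc=p_e / eta * 10, scale=4000))   # waste

likelihood(1400, 0.38)  # e.g., the likelihood at the ground-truth values
\end{lstlisting}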
Samples are drawn using the \cd{pm.sample()} function, which returns a \emph{trace} containing the samples:
\begin{lstlisting}
with nppmodel:
trace = pm.sample(2_000, return_inferencedata=True)
\end{lstlisting}
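As claimed at the beginning of this appendix, the same inference could also be performed by a hand-written
Metropolis-Hastings sampler. The following is a minimal sketch reusing the \cd{likelihood} helper above; the proposal
widths and the start value are illustrative tuning choices, and this is not how \pymc~samples internally (by default
it uses more advanced gradient-based algorithms such as NUTS for continuous models):
\begin{lstlisting}
def prior_density(p_e, eta):
    # Uniform(1000, 2000) prior for P; normal prior for eta, truncated at 0.
    if not (1000 <= p_e <= 2000) or eta <= 0:
        return 0.0
    return norm.pdf(eta, loc=0.35, scale=0.05)

def metropolis_hastings(n_samples=2000, start=(1500.0, 0.35)):
    rng = np.random.default_rng(42)
    current = np.array(start)
    current_p = likelihood(*current) * prior_density(*current)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal around the current point.
        proposal = current + rng.normal(0.0, [50.0, 0.01])
        proposal_p = likelihood(*proposal) * prior_density(*proposal)
        # Accept with probability min(1, proposal_p / current_p).
        if rng.uniform() < proposal_p / current_p:
            current, current_p = proposal, proposal_p
        samples.append(current.copy())
    return np.array(samples)
\end{lstlisting}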
After many samples have been drawn from a Markov chain based on the specified probability
distributions, the resulting samples are mostly located in areas of high likelihood within the parameter space. This
can also be viewed as a numerical (Monte Carlo) integration of the joint probability distribution of $(P, \eta)$ in
order to recover the underlying function: areas of high likelihood in the parameter space contain more samples,
corresponding to a larger ``area under the curve'' in such areas. An example of this mechanism is shown for a trivial
standard normal distribution in \autoref{fig:bayes:numint}. Different methods can be used to reconstruct the generating
function from the drawn samples.
In this example, the resulting trace is shown as a \emph{trace plot} in \autoref{fig:bayes:trace}, and the random walk
is visualized in two dimensions in \autoref{fig:bayes:walk}.
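Plots like those in \autoref{fig:bayes:exampleresult} can be generated with \textsc{arviz}; for instance, assuming a
recent \textsc{arviz} version, a trace plot of the \cd{trace} object might be produced with:
\begin{lstlisting}
import arviz as az

# One density estimate and one random walk per parameter and chain.
az.plot_trace(trace, var_names=["power_e", "eta"])
\end{lstlisting}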
From the sampled data, the \textsc{arviz} library can calculate the essential statistics, which include the sample mean
$\overline x$, the sample standard deviation $s$, and the lower and upper limit of the HDI (\emph{highest density
interval}, a type of credible interval used in Bayesian statistics), which by default is chosen to be the
\SI{94}{\percent} highest density interval \cite{arviz2019}:
\begin{center}
\begin{tabular}{r|cccc}
~ & $\overline x$ & $s$ & HDI (\SI{3}{\percent}) & HDI (\SI{97}{\percent}) \\
\midrule
\cd{power\_e} & \num{1349.1} & \num{110.9} & \num{1151.4} & \num{1565.7} \\
\cd{eta} & \SI{37.4}{\percent} & \SI{3.5}{\percent} & \SI{31.0}{\percent} & \SI{44.2}{\percent} \\
\end{tabular}
\end{center}
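Statistics like these correspond to the output of \textsc{arviz}'s summary function; a sketch of the call, reusing the
\cd{az} import from the previous snippet (parameter names as defined in the model above):
\begin{lstlisting}
# Posterior summary: mean, sd, and 94% HDI bounds per parameter.
az.summary(trace, var_names=["power_e", "eta"], hdi_prob=0.94)
\end{lstlisting}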
For reference: the artificial observations at the beginning of this
section were derived from the ``true'' values (the ``ground truth''; this concept will also be important in the fuel
cycle simulations later on) of $P_0 = \SI{1400}{\mega\watt}$ and $\eta = \num{0.38}$. The result's main shortcoming is
the relatively large uncertainty of its prediction, which stems from the uncertainty of the observations used in the
inference.
\begin{figure}[h]
\centering
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_trace.pdf}
\caption{A \emph{trace plot} generated by sampling the NPP model. \emph{Left}: the estimated probability distribution
of each parameter. \emph{Right}: the random walk in each parameter dimension. Because multiple Markov chains were used
in the inference process, one distribution and one walk are shown for each chain.}
\label{fig:bayes:trace}
\end{subfigure}
\begin{subfigure}{0.7\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_walk.pdf}
\caption{The random walk, visualized in two dimensions. Consecutive points are
spatially correlated. Increasing efficiency $\eta$ leads to increased electrical power output; therefore, a correlation
can be observed.}
\label{fig:bayes:walk}
\end{subfigure}
\caption{Result of a Markov chain Monte Carlo sampling process.}
\label{fig:bayes:exampleresult}
\end{figure}
@@ -40,7 +40,8 @@ established before any observations have come into play.
\item[Prior Predictive Distribution,] the joint distribution $p(\vec y, \vec \theta)$ marginalized over $\vec \theta$
(i.e., the probability of $\vec y$ without considering $\vec \theta$); it is the expected distribution of
measurements when assuming the prior. Defined as \begin{equation}p(\vec y) = \int p(\vec y, \vec \theta) \dd
\vec \theta
\end{equation}
\item[Likelihood] $p(\vec y \mid \vec \theta)$, proportional to the probability of an observation $\vec y$ if $\vec \theta$
is the true parameter vector of the process. The likelihood is ``proportional'' to the probability, because it is not normalized to
@@ -192,18 +193,18 @@ be estimated. \cite[Ch.\,11]{Gelman2014}
\begin{figure}
\centering
\includegraphics[width=.5\textwidth]{figures/bayes/mh_randomwalk_sinemvar.pdf}
\caption{Metropolis-Hastings random walk over a bivariate distribution ($p(x, y) \propto \sin(x) \sin(y)$) showing
1000 samples: high-likelihood areas (yellow) receive more samples, but transitions between different such areas
are unlikely and can lead to disproportionately few samples drawn from other areas; the start value was located in the
bottom left hotspot. This causes that area to receive the highest number of samples.}
\label{fig:bayes:mhwalk}
\end{figure}
\begin{figure}
\centering
% maybe move towards the algorithm section?
\includegraphics[width=.6\textwidth]{figures/bayes/npp_mc_integration.pdf}
\caption{Numerical reconstruction by integration of a standard normal distribution's \gls{pdf}, using a
trivial Monte Carlo method with 1000 samples. \emph{Top}: the distribution's probability density function in green, and
blue random samples within its area. \emph{Bottom}: reconstructed density function calculated by differentiating the
@@ -225,248 +226,8 @@ Finally, it must be reiterated that the main part of this work does not employ M
algorithm. Even though it works given a sufficient number of samples, it produced subpar results compared to other,
more advanced algorithms. \cite[Ch.\,11]{Gelman2014}, \cite{salvatier2016probabilistic}
\mysection{Instructional Example}{bayes:exampleref}
A simple example of a Bayesian inference problem solved using \pymc~is presented in \autoref{ch:app:bayes}. As
described in \autoref{ch:methodology}, the approach chosen in this work for reconstructing nuclear fuel cycles is
slightly different from typical Bayesian inference problems, as the likelihood function is not known.
\mychapter{Conclusion}{conclusion}