Commit d8b233db authored by Lewin Bormann

thesis: move bayes example to appendix

parent a41f2b8b
@@ -30,6 +30,8 @@
\input{mainmatter/bayes}
\input{mainmatter/methodology}
\input{mainmatter/scenario}
\input{mainmatter/conclusion}
\input{mainmatter/appendix}
\backmatter
{
\appendix
\mychapter{Instructional Example of Bayesian Inference}{app:bayes}
To show how the concepts introduced above work in practice, this appendix develops an instructional example using
\pymc. The example does not use any special features of the library and could also be implemented with a hand-written
Metropolis-Hastings sampler (without any framework) in fewer than 100 lines of code.
The scenario is a classic situation in which Bayesian statistics is useful: some basic facts about a system are known,
and additional knowledge is available in the form of individual data points. Specifically, a Nuclear Power Plant (NPP)
of unknown electrical power output $P_{el}$ and efficiency grade $\eta$ is considered. Some of its characteristics can
be observed remotely, but all observations come with uncertainties.
The goal is to \emph{infer} probability distributions for the power output $P$ and the efficiency grade $\eta =
P_{el}/P_{th}$ of the NPP from the observations. These parameters are modeled as \emph{prior distributions}, since
general knowledge about the scenario enters the model through these variables. In this case, the prior is defined
as follows:
\begin{itemize}
\item NPPs of this size are generally expected to generate \SIrange{1000}{2000}{\mega\watt} of electrical power.
\item NPPs of this type usually have an efficiency grade $\eta$ of \SIrange{30}{40}{\percent}, with a tendency
towards the middle of this range.
\end{itemize}
\begin{quotation}
\itshape
The fact that this is a nuclear power plant is orthogonal to the core topic of this work; it merely serves as
inspiration and thus was chosen from the same general area.
\end{quotation}
This scenario requires some simplifying assumptions. The NPP of interest can be observed remotely, and
the following characteristics can be determined to some precision:
\begin{itemize}
\item Number of employees
\item Number of cooling towers
\item Number of transformers
\item Amount of yearly waste per \si{\mega\watt} of thermal power.
\end{itemize}
Further, there is experience with other NPPs of similar type, and certain average values have proven to be good
linear approximations across comparable power plants (such relations do not have to be linear, but for
the sake of simplicity they are chosen as such):
\begin{itemize}
\item \SI{100}{\mega\watt} of \emph{electrical} power require 20 employees on daily duty.
\item \SI{500}{\mega\watt} of \emph{thermal} power require one cooling tower of the observed size.
\item \SI{250}{\mega\watt} of \emph{electrical} power flow through each transformer; this number includes redundancy.
\item \SI{10}{\kilo\gram} of waste is generated per year for every megawatt of \emph{thermal} power.
\end{itemize}
Observations of the NPP facility have shown that
\begin{itemize}
\item every day, \num{250} cars show up on the employee parking lot, and there is no bus connection;
\item there are \num{7} cooling towers;
\item there are \num{7} transformers; and
\item roughly \num{60} containers of \SI{600}{\kilo\gram} each of spent fuel leave the facility each year, for a
total of \SI{36e3}{\kilo\gram} of waste.
\end{itemize}
% \begin{table}
% \centering
% \begin{tabular}{r|ll}
% \toprule
% Quantity & Relation to Parameters & Observed \\
% \midrule
% No. of employees & \SI{100}{\mega\watt} (el.) require 20 employees & 250 cars \\
% No. of cooling towers & \SI{500}{\mega\watt} (th.) require one tower & 7 towers \\
% No. of transformers & \SI{250}{\mega\watt} (el.) require one transformer & 7 transformers \\
% Waste mass & \SI{10}{\kilogram} of waste is generated for every th. Megawatt per year & \SI{36}{\tonne\per\year} \\
% \bottomrule
% \end{tabular}
%
% \end{table}
Such a set of observations is called the \emph{sample} (not to be confused with the \emph{samples} generated at random
during the \gls{inference} process!) and was introduced as $\vec y$ in the first section of this chapter. Note that it
is difficult to directly calculate the parameters of interest from these numbers, for two reasons: First, there are two
unknown parameters (power output and efficiency grade); and second, the observations conflict. For example, 7
transformers translate to $7 \times \SI{250}{\mega\watt} = \SI{1750}{\mega\watt}$ according to the assumed relations,
but $250\mathrm{~employees} \times \SI{5}{\mega\watt}\,\mathrm{(employee)}^{-1} = \SI{1250}{\mega\watt}$ of electrical
power.
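To make the conflict concrete, the following short sketch (plain Python, not part of the model code; it assumes the
linear relations above and an illustrative nominal efficiency of $\eta = 0.35$) computes a naive point estimate of the
electrical power from each observation separately:
\begin{lstlisting}
# Naive point estimates of the electrical power output (MW), one per
# observation; eta_nominal is an assumed placeholder value.
eta_nominal = 0.35

p_from_employees = 250 * (100 / 20)        # 1250 MW(el)
p_from_towers = 7 * 500 * eta_nominal      # 1225 MW(el)
p_from_transformers = 7 * 250              # 1750 MW(el)
p_from_waste = 36_000 / 10 * eta_nominal   # 1260 MW(el)
\end{lstlisting}
The four estimates disagree, and no single pair $(P, \eta)$ reproduces all observations exactly; Bayesian inference
resolves this by weighting each observation according to its assumed uncertainty.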
As a next step, a \pymc~model with random variables for the prior and posterior distributions can be created from the
collected information. It incorporates prior beliefs (the linear dependencies shown above), and is parameterized with
the observations:
\begin{lstlisting}
import pymc3 as pm

# Observed data.
employees = 250
towers = 7
transformers = 7
waste = 36000  # kg per year

with pm.Model() as nppmodel:
    # Parameters to be inferred, and their priors.
    power_e = pm.Uniform("power_e", lower=1000, upper=2000)
    eta = pm.TruncatedNormal("eta", lower=0, mu=0.35, sigma=0.05)
    # Observed variables, linked to the priors via the linear relations.
    o_empl = pm.Normal("employees", mu=power_e * 0.2, sigma=30, observed=employees)
    o_towers = pm.Normal("towers", mu=power_e / (eta*500), sigma=1, observed=towers)
    o_transformers = pm.Normal("transformers", mu=power_e / 250, sigma=1, observed=transformers)
    o_waste = pm.Normal("waste", mu=power_e / eta * 10, sigma=4000, observed=waste)
\end{lstlisting}
The observations (the variable assignments at the top) are each one-dimensional in this case, although they would not
need to be: there could be more than one observation per variable. The model itself is introduced using a
\cd{with} block; this is \pymc's way of grouping multiple statements relating to one and the same model: all
statements within the indented block refer to the newly created \cd{nppmodel}, which stores the random variables and
the information needed for likelihood calculation.
The first two variables, \cd{power\_e} and \cd{eta}, are declared as \emph{prior distributions}, as described
earlier. They translate our beliefs held before any observations into probability distributions. Within the
\pymc~framework, probability distributions are expressed as random variables (RVs); these prior RVs in particular form
the basis of the Bayesian \gls{inference} by encoding the prior beliefs. Encoding such assumptions improves the
quality of the inference result.
\begin{align}
P &\sim \text{\itshape Uniform}(1000, 2000) \\
\eta &\sim \mathcal N(\mu = 0.35, \sigma = 0.05)
\end{align}
Also note that these priors do not translate directly into physical reality: the fact that the prior assigns a small
probability to a \SI{50}{\percent} efficiency grade (\autoref{fig:bayes:prior_pure}) does not mean that such a value is
plausible. These distributions are modeled taking into consideration not only our physical prior knowledge, but also
the computational limitations of the algorithms employed by \pymc. The choice of prior distributions affects
\gls{inference} quality: a prior distribution encoding less specific knowledge about a scenario can be expected to
contribute less to a precise inference result.
\begin{figure}
\centering
\begin{subfigure}{.5\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_prior_pure.pdf}
\caption{Histograms of samples drawn from prior distributions. Power in \si{\mega\watt}.}
\label{fig:bayes:prior_pure}
\end{subfigure}
\begin{subfigure}{1\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_prior_rest.pdf}
\caption{Histograms of samples drawn from the prior distributions parameterized by the prior distributions for
\cd{power\_e} and \cd{eta}. Waste in \si{\kilogram}.}
\label{fig:bayes:prior_rest}
\end{subfigure}
\caption{Prior distributions of the model used in this example.}
\end{figure}
Below the declarations of the distributions for the parameters to be inferred, there are four more distributions.
These are also prior distributions, but two characteristics distinguish them from the first two: their
parameters (here: $\mu$) depend on other random variables, namely the prior random variables $P$ and $\eta$; and they
carry the \cd{observed=} keyword (discussed below). Prior information is expressed in these RVs in two different ways:
first, as the translation from the variables $P, \eta$ into the observed quantities (the fact that a certain number of
employees is needed per megawatt is also prior information); and second, as $\sigma$ encoding the measurement
uncertainty of the estimates. \pymc~provides a way of drawing random samples from the prior distributions alone,
without any measurements; a visualization of this is provided in \autoref{fig:bayes:prior_rest}.
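A minimal sketch of such prior sampling, assuming the \pymc~(version 3) API and illustrative variable names, could
look as follows:
\begin{lstlisting}
with nppmodel:
    # Draw 1000 samples from the priors and the derived distributions;
    # the observed data is ignored at this stage.
    prior_samples = pm.sample_prior_predictive(samples=1000)

prior_samples["power_e"]  # 1000 draws from Uniform(1000, 2000)
prior_samples["eta"]      # 1000 draws from the truncated normal prior
\end{lstlisting}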
The last four RVs look similar to the first two, but the \cd{observed=} keyword changes their
role in the Bayesian inference process: they are used to calculate the likelihood $L_i$ of each randomly sampled set of
parameters $(P_i', \eta_i')$ (the $'$ marks them as a proposed sample rather than an actual observation). First, the
four observed RVs need to be recalculated for every random sample $(P_i', \eta_i')$, as the scale $\sigma_k(P_i',
\eta_i')$ and location $\mu_k(P_i', \eta_i')$ ($k = 1,\dots,4$) of each of the four normal distributions
depend on the sampled RVs. Based on these distributions, \pymc~calculates the likelihood of the observations $\vec y$
for sample $i$:
\begin{align}
(P_i', \eta_i') &\sim (\text{\itshape Uniform}(1000, 2000),~\mathcal{N}(0.35, 0.05)) \\
L_i &= \prod_k p_k(y_k; \mu_k(P_i', \eta_i'), \sigma_k(P_i', \eta_i'))
\end{align}
where $p_k$ is the $k$th normal distribution's density function parameterized by $\mu_k(P_i', \eta_i'),
\sigma_k(P_i', \eta_i')$. For example, according to the definition in the
code above, the ``employees'' density would be evaluated as:
\begin{equation}
p_1(250; \mu(P), \sigma) = \Phi(250~\mathrm{employees}; \mu = 0.2 P ~\si{\per\mega\watt}, \sigma = 30)
\end{equation}
with the normal distribution's probability density function
\begin{equation}
\label{eqn:bayes:normaldist}
\Phi(x; \mu, \sigma) = \frac{\exp(-\frac{(x-\mu)^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}.
\end{equation}
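For illustration, the likelihood $L_i$ can also be written out directly. The following sketch uses \textsc{scipy}
outside of \pymc; it is a simplified stand-in for what \pymc~computes internally (which actually works with
log-likelihoods), and the helper name \cd{likelihood} is purely illustrative:
\begin{lstlisting}
import numpy as np
from scipy.stats import norm

def likelihood(p_e, eta):
    # Product of the four normal densities, each evaluated at its
    # observed value, with mu depending on the proposed (p_e, eta).
    return (norm.pdf(250, loc=0.2 * p_e, scale=30)              # employees
          * norm.pdf(7, loc=p_e / (eta * 500), scale=1)         # towers
          * norm.pdf(7, loc=p_e / 250, scale=1)                 # transformers
          * norm.pdf(36_000, loc=p_e / eta * 10, scale=4000))   # waste

likelihood(1400, 0.38)  # e.g., the likelihood at the ground-truth values
\end{lstlisting}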
Samples are drawn using the \cd{pm.sample()} function, which returns a \emph{trace} containing the samples:
\begin{lstlisting}
with nppmodel:
trace = pm.sample(2_000, return_inferencedata=True)
\end{lstlisting}
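As claimed at the beginning of this appendix, the same inference could also be performed by a hand-written
Metropolis-Hastings sampler. The following is a minimal sketch reusing the \cd{likelihood} helper above; the proposal
widths and the start value are illustrative tuning choices, and this is not how \pymc~samples internally (by default
it uses more advanced gradient-based algorithms such as NUTS for continuous models):
\begin{lstlisting}
def prior_density(p_e, eta):
    # Uniform(1000, 2000) prior for P; normal prior for eta, truncated at 0.
    if not (1000 <= p_e <= 2000) or eta <= 0:
        return 0.0
    return norm.pdf(eta, loc=0.35, scale=0.05)

def metropolis_hastings(n_samples=2000, start=(1500.0, 0.35)):
    rng = np.random.default_rng(42)
    current = np.array(start)
    current_p = likelihood(*current) * prior_density(*current)
    samples = []
    for _ in range(n_samples):
        # Symmetric random-walk proposal around the current point.
        proposal = current + rng.normal(0.0, [50.0, 0.01])
        proposal_p = likelihood(*proposal) * prior_density(*proposal)
        # Accept with probability min(1, proposal_p / current_p).
        if rng.uniform() < proposal_p / current_p:
            current, current_p = proposal, proposal_p
        samples.append(current.copy())
    return np.array(samples)
\end{lstlisting}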
After many samples have been drawn from a Markov chain based on the specified probability
distributions, the resulting samples are mostly located in areas of high likelihood within the parameter space. This
can also be viewed as a numerical (Monte Carlo) integration of the joint probability distribution of $(P, \eta)$ in
order to recover the underlying function: areas of high likelihood in the parameter space contain more samples,
corresponding to a larger ``area under the curve'' in such areas. An example of this mechanism is shown for a trivial
standard normal distribution in \autoref{fig:bayes:numint}. Different methods can be used to reconstruct the generating
function from the drawn samples.
In this example, the resulting trace is shown as a \emph{trace plot} in \autoref{fig:bayes:trace}, and the random walk
is visualized in two dimensions in \autoref{fig:bayes:walk}.
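Plots like those in \autoref{fig:bayes:exampleresult} can be generated with \textsc{arviz}; for instance, assuming a
recent \textsc{arviz} version, a trace plot of the \cd{trace} object might be produced with:
\begin{lstlisting}
import arviz as az

# One density estimate and one random walk per parameter and chain.
az.plot_trace(trace, var_names=["power_e", "eta"])
\end{lstlisting}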
From the sampled data, the \textsc{arviz} library can calculate the essential statistics, which include the sample mean
$\overline x$, the sample standard deviation $s$, and the lower and upper limit of the HDI (\emph{highest density
interval}, a type of credible interval used in Bayesian statistics), which by default is chosen to be the
\SI{94}{\percent} highest density interval \cite{arviz2019}:
\begin{center}
\begin{tabular}{r|cccc}
~ & $\overline x$ & $s$ & HDI (\SI{3}{\percent}) & HDI (\SI{97}{\percent}) \\
\midrule
\cd{power\_e} & \num{1349.1} & \num{110.9} & \num{1151.4} & \num{1565.7} \\
\cd{eta} & \SI{37.4}{\percent} & \SI{3.5}{\percent} & \SI{31.0}{\percent} & \SI{44.2}{\percent} \\
\end{tabular}
\end{center}
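Statistics like these correspond to the output of \textsc{arviz}'s summary function; a sketch of the call, reusing the
\cd{az} import from the previous snippet (parameter names as defined in the model above):
\begin{lstlisting}
# Posterior summary: mean, sd, and 94% HDI bounds per parameter.
az.summary(trace, var_names=["power_e", "eta"], hdi_prob=0.94)
\end{lstlisting}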
For reference: the artificial observations at the beginning of this
section were derived from the ``true'' values (the ``ground truth''; this concept will also be important in the fuel
cycle simulations later on) of $P_0 = \SI{1400}{\mega\watt}$ and $\eta = \num{0.38}$. The result's main shortcoming is
the relatively large uncertainty of its prediction, which stems from the uncertainty of the observations used in the
inference.
\begin{figure}[h]
\centering
\begin{subfigure}{0.8\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_trace.pdf}
\caption{A \emph{trace plot} generated by sampling the NPP model. \emph{Left}: the estimated probability distribution
of each parameter. \emph{Right}: the random walk in each parameter dimension. Because multiple Markov chains were used
in the inference process, one distribution and one walk are shown for each chain.}
\label{fig:bayes:trace}
\end{subfigure}
\begin{subfigure}{0.7\textwidth}
\includegraphics[width=\textwidth]{figures/bayes/npp_walk.pdf}
\caption{The random walk, visualized in two dimensions. Consecutive points are
spatially correlated. Increasing efficiency $\eta$ leads to increased electrical power output; therefore, a correlation
can be observed.}
\label{fig:bayes:walk}
\end{subfigure}
\caption{Result of a Markov chain Monte Carlo sampling process.}
\label{fig:bayes:exampleresult}
\end{figure}
@@ -40,7 +40,8 @@ established before any observations have come into play.
\item[Prior Predictive Distribution,] the joint distribution $p(\vec y, \vec \theta)$ marginalized over $\vec \theta$
(i.e., the probability of $\vec y$ without considering $\vec \theta$); it is the expected distribution of
measurements when assuming the prior. Defined as \begin{equation}p(\vec y) = \int p(\vec y, \vec \theta) \dd
\vec \theta
\end{equation}
\item[Likelihood] $p(\vec y \mid \vec \theta)$, proportional to the probability of an observation $\vec y$ if $\vec \theta$
is the true parameter vector of the process. The likelihood is ``proportional'' to the probability, because it is not normalized to
@@ -192,18 +193,18 @@ be estimated. \cite[Ch.\,11]{Gelman2014}
\begin{figure}
\centering
\includegraphics[width=.5\textwidth]{figures/bayes/mh_randomwalk_sinemvar.pdf}
\caption{Metropolis-Hastings random walk over a bivariate distribution ($p(x, y) \propto \sin(x) \sin(y)$) showing
1000 samples: high-likelihood areas (yellow) receive more samples, but transitions between different such areas
are unlikely and can lead to disproportionately few samples drawn from other areas; the start value was located in the
bottom left hotspot. This causes that area to receive the highest number of samples.}
\label{fig:bayes:mhwalk}
\end{figure}
\begin{figure}
\centering
% maybe move towards the algorithm section?
\includegraphics[width=.6\textwidth]{figures/bayes/npp_mc_integration.pdf}
\caption{Numerical reconstruction by integration of a standard normal distribution's \gls{pdf}, using a
trivial Monte Carlo method with 1000 samples. \emph{Top}: the distribution's probability density function in green, and
blue random samples within its area. \emph{Bottom}: reconstructed density function calculated by differentiating the
@@ -225,248 +226,8 @@ Finally, it must be reiterated that the main part of this work does not employ M
algorithm. Even though it works given a sufficient number of samples, it produced subpar results compared to other,
more advanced algorithms. \cite[Ch.\,11]{Gelman2014}, \cite{salvatier2016probabilistic}
\mysection{Instructional Example}{bayes:exampleref}
A simple example of a Bayesian inference problem solved using \pymc~is presented in \autoref{ch:app:bayes}. As
described in \autoref{ch:methodology}, the approach chosen in this work for reconstructing nuclear fuel cycles is
slightly different from typical Bayesian inference problems, as the likelihood function is not known.
\mychapter{Conclusion}{conclusion}