diff --git a/thesis/backmatter/fev.xopp b/thesis/backmatter/fev.xopp
index a7c16c0e7e0e94cf31087b9a020e1ae044316cfd..eab8c3a047f85bc6936af22b8fcc48a612014fa0
Binary files a/thesis/backmatter/fev.xopp and b/thesis/backmatter/fev.xopp differ
diff --git a/thesis/backmatter/fev_filled.pdf b/thesis/backmatter/fev_filled.pdf
index b3b64b4d72c0c8686f84005079d075e6c0a9b37a..34f7e8d50ab07bffac41b42023547e5cd74381b3
Binary files a/thesis/backmatter/fev_filled.pdf and b/thesis/backmatter/fev_filled.pdf differ
diff --git a/thesis/main.tex b/thesis/main.tex
index 4308fe27431237c097ec6ea622f7fa117c69dec1..ae0444b66ce9ffa34fa6a729814854356172ad6f 100644
--- a/thesis/main.tex
+++ b/thesis/main.tex
@@ -30,6 +30,8 @@
 \input{mainmatter/bayes}
 \input{mainmatter/methodology}
 \input{mainmatter/scenario}
+\input{mainmatter/conclusion}
+\input{mainmatter/appendix}
 \backmatter
 {
diff --git a/thesis/mainmatter/appendix.tex b/thesis/mainmatter/appendix.tex
new file mode 100644
index 0000000000000000000000000000000000000000..c9f42f16430162dff5fa138a28183cfd5237163c
--- /dev/null
+++ b/thesis/mainmatter/appendix.tex
@@ -0,0 +1,248 @@
+\appendix
+
+\mychapter{Instructional Example of Bayesian Inference}{app:bayes}
+
+In order to show how the concepts introduced above work in practice, an instructional example is developed
+in this appendix, using \pymc. It does not make use of any special features of the library and could also be
+implemented by hand with Metropolis-Hastings (without any framework) in fewer than 100 lines of code.
+
+The scenario is a classic situation in which Bayesian statistics is useful: some basic facts about a system are
+known, and additional knowledge is available in the form of individual data points. Specifically, a Nuclear Power
+Plant (NPP) of unknown power output $P_{el}$ and efficiency grade $\eta$ is considered. Some aspects of it can be
+observed remotely, but all observations come with uncertainties.
+
+The goal is to \emph{infer} probability distributions for the power output $P$ and the efficiency grade
+$\eta = P_{el}/P_{th}$ of the NPP from the observations. General knowledge about the scenario is added to the model
+in the form of \emph{prior distributions} over these variables. In this case, the prior is defined
+as follows:
+\begin{itemize}
+    \item NPPs of this size are generally expected to generate \SIrange{1000}{2000}{\mega\watt} of electrical power.
+    \item NPPs of this type usually have an efficiency grade $\eta$ of \SIrange{30}{40}{\percent}, with a tendency
+towards the middle of this range.
+\end{itemize}
+
+\begin{quotation}
+\itshape
+The fact that this is a nuclear power plant is orthogonal to the core topic of this work; it merely serves as
+inspiration and thus was chosen from the same general area.
+\end{quotation}
+
+This scenario requires some simplifying assumptions. The NPP of interest can be observed remotely, and
+the following set of characteristics can be determined to some precision:
+
+\begin{itemize}
+    \item Number of employees
+    \item Number of cooling towers
+    \item Number of transformers
+    \item Amount of yearly waste per \si{\mega\watt} of thermal power
+\end{itemize}
+
+Further, there is some experience with other NPPs of similar type, and certain average values have been shown to
+yield good linear approximations for similar power plants (such relations do not have to be linear, but for
+the sake of simplicity they are chosen as linear here):
+
+\begin{itemize}
+    \item \SI{100}{\mega\watt} of \emph{electrical} power require 20 employees on daily duty.
+    \item \SI{500}{\mega\watt} of \emph{thermal} power require one cooling tower of the observed size.
+    \item \SI{250}{\mega\watt} of \emph{electrical} power flow through each transformer; this number includes redundancy.
+    \item Every megawatt of \emph{thermal} power generates \SI{10}{\kilo\gram} of waste per year.
+\end{itemize}
+
+Observations of the NPP facility have shown that
+
+\begin{itemize}
+    \item every day, \num{250} cars show up on the employee parking lot, and there is no bus connection;
+    \item there are \num{7} cooling towers;
+    \item there are \num{7} transformers; and
+    \item roughly \num{60} containers of \SI{600}{\kilo\gram} each of spent fuel leave the facility every year, for a
+total of \SI{36e3}{\kilo\gram} of waste.
+\end{itemize}
+
+% \begin{table}
+%     \centering
+%     \begin{tabular}{r|ll}
+%         \toprule
+%         Quantity & Relation to Parameters & Observed \\
+%         \midrule
+%         No. of employees & \SI{100}{\mega\watt} (el.) require 20 employees & 250 cars \\
+%         No. of cooling towers & \SI{500}{\mega\watt} (th.) require one tower & 7 towers \\
+%         No. of transformers & \SI{250}{\mega\watt} (el.) require one transformer & 7 transformers \\
+%         Waste mass & \SI{10}{\kilogram} of waste is generated for every th. Megawatt per year & \SI{36}{\tonne\per\year} \\
+%         \bottomrule
+%     \end{tabular}
+%
+% \end{table}
+
+Such a set of observations is called the \emph{sample} (not to be confused with the \emph{samples} generated at random
+during the \gls{inference} process!) and was introduced as $\vec y$ in the first section of this chapter. Note that it
+is difficult to calculate the parameters of interest directly from these numbers, for two reasons: First, there are two
+unknown parameters (power output and efficiency grade); and second, the observations conflict with each other. For
+example, the \num{7} transformers translate to $7 \times \SI{250}{\mega\watt} = \SI{1750}{\mega\watt}$ according to the
+assumed relations, but the employees imply only $250\mathrm{~employees} \times
+\SI{5}{\mega\watt}\,\mathrm{(employee)}^{-1} = \SI{1250}{\mega\watt}$ of electrical power.
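+
+To make this discrepancy explicit, the naive point estimate implied by each individual observation can be computed
+directly. The following is only an illustrative sketch; \cd{eta\_guess} is an assumed efficiency grade, needed here
+solely to convert the thermal quantities into electrical power:
+
+\begin{lstlisting}
+# Naive per-observation estimates of the electrical power output, using
+# the linear relations above. eta_guess is an assumption for illustration.
+eta_guess = 0.35
+
+print(250 / 20 * 100)          # employees    -> 1250 MW (electrical)
+print(7 * 500 * eta_guess)     # towers       -> 1225 MW (electrical)
+print(7 * 250)                 # transformers -> 1750 MW (electrical)
+print(36000 / 10 * eta_guess)  # waste        -> 1260 MW (electrical)
+\end{lstlisting}
+
+The four observations imply four different power outputs; the inference below reconciles them into a single posterior
+distribution.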
+
+As a next step, a \pymc~model with random variables for the prior and posterior distributions can be created from the
+collected information. It incorporates the prior beliefs (the linear dependencies shown above) and is parameterized
+with the observations:
+
+\begin{lstlisting}
+import pymc3 as pm
+
+employees = 250
+towers = 7
+transformers = 7
+waste = 36000
+
+with pm.Model() as nppmodel:
+    # Parameters to be inferred, and their priors.
+    power_e = pm.Uniform("power_e", lower=1000, upper=2000)
+    eta = pm.TruncatedNormal("eta", lower=0, mu=0.35, sigma=0.05)
+
+    # Observations
+    o_empl = pm.Normal("employees", mu=power_e * 0.2, sigma=30, observed=employees)
+    o_towers = pm.Normal("towers", mu=power_e / (eta*500), sigma=1, observed=towers)
+    o_transformers = pm.Normal("transformers", mu=power_e / 250, sigma=1, observed=transformers)
+    o_waste = pm.Normal("waste", mu=power_e / eta * 10, sigma=4000, observed=waste)
+\end{lstlisting}
+
+The observations (the variable declarations at the top) are each one-dimensional in this case, although they would not
+need to be: there could be more than one observation for each variable. The model itself is introduced using a
+\cd{with} block; this is \pymc's way of grouping multiple statements relating to one and the same model: all
+statements within the indented block refer to the newly created \cd{nppmodel}, which stores the random variables and
+the information needed for the likelihood calculation.
+
+The first two variables, \cd{power\_e} and \cd{eta}, are declared as \emph{prior distributions} and have been described
+earlier. They translate the beliefs held before any observation into probability distributions. These probability
+distributions are expressed as random variables (RVs) within the \pymc~framework; the prior RVs in particular form the
+basis for the Bayesian \gls{inference} by encoding the prior beliefs. A well-chosen prior improves the quality of the
+inference result.
+
+\begin{align}
+    P &\sim \text{\itshape Uniform}(1000, 2000) \\
+    \eta &\sim \mathcal N(\mu = 0.35, \sigma = 0.05) \quad \text{(truncated below at 0)}
+\end{align}
+
+Also note that these priors do not translate directly into physical reality: the fact that a \SI{50}{\percent}
+efficiency grade is assigned a small but non-zero probability (\autoref{fig:bayes:prior_pure}) does not mean that such
+a value is plausible. Such distributions are modeled taking into consideration not only the physical prior knowledge,
+but also the computational limitations of the algorithms employed by \pymc. Using different prior distributions will
+affect \gls{inference} quality: a prior distribution encoding less specific knowledge about a scenario can be expected
+to contribute less to a precise inference result.
+
+\begin{figure}
+    \centering
+    \begin{subfigure}{.5\textwidth}
+    \includegraphics[width=\textwidth]{figures/bayes/npp_prior_pure.pdf}
+    \caption{Histograms of samples drawn from the prior distributions. Power in \si{\mega\watt}.}
+    \label{fig:bayes:prior_pure}
+    \end{subfigure}
+
+    \begin{subfigure}{1\textwidth}
+    \includegraphics[width=\textwidth]{figures/bayes/npp_prior_rest.pdf}
+    \caption{Histograms of samples drawn from the observation distributions, which are parameterized by the prior
+distributions for \cd{power\_e} and \cd{eta}. Waste in \si{\kilogram}.}
+    \label{fig:bayes:prior_rest}
+    \end{subfigure}
+
+    \caption{Prior distributions of the model.}
+\end{figure}
+
+Below the declarations of the distributions for the parameters to be inferred, there are four more distributions.
+These are also prior distributions, but two characteristics distinguish them from the first two. For one, their
+parameters (here: $\mu$) depend on other random variables, namely the prior random variables $P$ and $\eta$.
+The prior information is expressed in these RVs in two different ways: first, as the translation from the parameters
+$P, \eta$ into the observed quantities (the fact that a certain number of employees is needed per megawatt is also
+prior information); and second, as $\sigma$, which encodes the measurement uncertainty of the estimates. \pymc~provides
+a way of drawing random samples from the prior distributions alone, without taking any measurements into account; a
+visualization of this is provided in \autoref{fig:bayes:prior_rest}.
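+
+A minimal sketch of how such prior samples could be drawn is shown below; the sample count is arbitrary, and
+\cd{sample\_prior\_predictive} also draws the observation RVs, which is what \autoref{fig:bayes:prior_rest}
+visualizes:
+
+\begin{lstlisting}
+with nppmodel:
+    # Draw samples from the priors alone; the observed data are ignored
+    # here, so the result reflects only the encoded prior beliefs.
+    prior = pm.sample_prior_predictive(samples=1000)
+
+# 'prior' maps variable names to arrays of drawn samples.
+print(prior["power_e"].mean(), prior["towers"].mean())
+\end{lstlisting}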
+
+The last four RVs seem similar to the first two, but the \cd{observed=} keyword changes their role in the Bayesian
+inference process: they are used to calculate the likelihood $L_i$ of each randomly sampled set of parameters
+$(P_i', \eta_i')$ (the $'$ marks them as a proposed sample instead of an actual observation). First, the four observed
+RVs need to be recalculated for every random sample $(P_i', \eta_i')$, as the scale $\sigma_k(P_i', \eta_i')$ and
+location $\mu_k(P_i', \eta_i')$ ($k = 1, \dots, 4$) of each of the four normal distributions depend on the sampled
+RVs. Based on these distributions, \pymc~calculates the likelihood of the observation $\vec y$ for the sample $i$:
+
+\begin{align}
+    (P_i', \eta_i') &\sim (\text{\itshape Uniform}(1000, 2000),~\mathcal{N}(0.35, 0.05)) \\
+    L_i &= \prod_k p_k(y_k; \mu_k(P_i', \eta_i'), \sigma_k(P_i', \eta_i'))
+\end{align}
+
+where $p_k$ is the $k$th normal distribution's density function, parameterized by $\mu_k(P_i', \eta_i')$ and
+$\sigma_k(P_i', \eta_i')$. For example, according to the definition in the code above, the ``employees'' density
+function would be
+
+\begin{equation}
+    p_1(250; \mu(P), \sigma) = \Phi(250~\mathrm{employees}; \mu = 0.2 P~\si{\per\mega\watt}, \sigma = 30)
+\end{equation}
+
+with the normal distribution's probability density function
+\begin{equation}
+\label{eqn:bayes:normaldist}
+\Phi(x; \mu, \sigma) = \frac{\exp(-\frac{(x-\mu)^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}.
+\end{equation}
+
+Samples are drawn using the \cd{pm.sample()} function, which returns a \emph{trace} containing the samples:
+
+\begin{lstlisting}
+with nppmodel:
+    trace = pm.sample(2_000, return_inferencedata=True)
+\end{lstlisting}
+
+After many samples have been drawn from a Markov chain based on the specified probability distributions, most of them
+will be located in areas of high likelihood within the parameter space. This can also be considered a numerical
+(Monte Carlo) integration of the joint probability distribution of $(P, \eta)$ in order to find the underlying
+function: areas of high likelihood in the parameter space contain more samples, corresponding to a larger ``area under
+the curve'' in such areas. An example of this mechanism is shown for a trivial standard normal distribution in
+\autoref{fig:bayes:numint}. Different methods can be used to reconstruct the generating function from the drawn
+samples.
+
+In this example, the resulting trace is shown as a \emph{trace plot} in \autoref{fig:bayes:trace}, and the random walk
+is visualized in two dimensions in \autoref{fig:bayes:walk}.
+
+From the sampled data, the \textsc{arviz} library can calculate the essential statistics, which include the sample mean
+$\overline x$, the sample standard deviation $s$, and the lower and upper limits of the HDI (\emph{highest density
+interval}, a type of credible interval used in Bayesian statistics), which by default is chosen to be the
+\SI{94}{\percent} highest density interval \cite{arviz2019}:
+
+\begin{center}
+\begin{tabular}{r|cccc}
+    ~ & $\overline x$ & $s$ & HDI (\SI{3}{\percent}) & HDI (\SI{97}{\percent}) \\
+    \midrule
+    \cd{power\_e} [\si{\mega\watt}] & \num{1349.1} & \num{110.9} & \num{1151.4} & \num{1565.7} \\
+    \cd{eta} & \SI{37.4}{\percent} & \SI{3.5}{\percent} & \SI{31.0}{\percent} & \SI{44.2}{\percent} \\
+\end{tabular}
+\end{center}
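+
+A table like this could be produced directly from the trace, for example along the following lines (a sketch; the
+exact column names and output format depend on the \textsc{arviz} version):
+
+\begin{lstlisting}
+import arviz as az
+
+# Mean, standard deviation, and highest density interval bounds
+# (the 94 % HDI by default) for each inferred parameter.
+print(az.summary(trace))
+\end{lstlisting}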
+
+For reference: the artificial observations at the beginning of this section were generated from the ``true'' values
+(the ``ground truth''; this will become important in the fuel cycle simulations later on, too) of
+$P_0 = \SI{1400}{\mega\watt}$ and $\eta = \num{0.38}$. The main weakness of the result is the relatively large
+uncertainty of its prediction, which originates from the uncertainty of the observations used in the inference.
+
+\begin{figure}[h]
+    \centering
+    \begin{subfigure}{0.8\textwidth}
+    \includegraphics[width=\textwidth]{figures/bayes/npp_trace.pdf}
+    \caption{A \emph{trace plot} generated by sampling the NPP model. \emph{Left}: the estimated probability
+distribution of each parameter. \emph{Right}: the random walk in each parameter dimension. Because multiple Markov
+chains were used in the inference process, one distribution and one walk are shown for each chain.}
+    \label{fig:bayes:trace}
+    \end{subfigure}
+
+    \begin{subfigure}{0.7\textwidth}
+    \includegraphics[width=\textwidth]{figures/bayes/npp_walk.pdf}
+    \caption{The random walk, visualized in two dimensions. Consecutive points are spatially correlated. Increasing
+efficiency $\eta$ leads to increased electrical power output; therefore, a correlation can be observed.}
+    \label{fig:bayes:walk}
+    \end{subfigure}
+
+    \caption{Result of a Markov chain Monte Carlo sampling process.}
+    \label{fig:bayes:exampleresult}
+\end{figure}
diff --git a/thesis/mainmatter/bayes.tex b/thesis/mainmatter/bayes.tex
index 13316efc7894b3fcba4e5469f29aee3cbfd446bc..b8c00181cd8ea905798f19de15a85bac6816 100644
--- a/thesis/mainmatter/bayes.tex
+++ b/thesis/mainmatter/bayes.tex
@@ -40,7 +40,8 @@ established before any observations have come into play.
 \item[Prior Predictive Distribution,] the joint distribution $p(\vec y, \vec \theta)$ marginalized over
 $\vec \theta$ (i.e., the probability of $\vec y$ without considering $\vec \theta$); it is the expected distribution of
 measurements when assuming the prior. Defined as \begin{equation}p(\vec y) = \int p(\vec y, \vec \theta) \dd
-\vec \theta\end{equation}
+\vec \theta
+\end{equation}
 \item[Likelihood] $p(\vec y \mid \vec \theta)$, proportional to the probability an observation $\vec y$ if $\vec \theta$
 is the true parameter vector of the process. The likelihood is ``proportional'' to the probability, because it is not
 normalized to
@@ -192,18 +193,18 @@ be estimated.
\cite[Ch.\,11]{Gelman2014} \begin{figure} \centering - \includegraphics[width=1\textwidth]{figures/bayes/mh_randomwalk_sinemvar.pdf} + \includegraphics[width=.5\textwidth]{figures/bayes/mh_randomwalk_sinemvar.pdf} \caption{Metropolis-Hastings random walk over a bivariate distribution ($p(x, y) \propto \sin(x) \sin(y)$) showing 1000 samples: high-likelihood areas (yellow) receive more samples, but transitions between different such areas -are unlikely and can lead to non-proportional samples; the start value was located in the bottom left hotspot. This - causes that area to receive the highest number of samples.} +are unlikely and can lead to disproportionately few samples drawn from other areas; the start value was located in the +bottom left hotspot. This causes that area to receive the highest number of samples.} \label{fig:bayes:mhwalk} \end{figure} \begin{figure} \centering % maybe move towards the algorithm section? - \includegraphics[width=1\textwidth]{figures/bayes/npp_mc_integration.pdf} + \includegraphics[width=.6\textwidth]{figures/bayes/npp_mc_integration.pdf} \caption{Numerical reconstruction by integration of a standard normal distribution's \gls{pdf}, using a trivial Monte Carlo method with 1000 samples. \emph{Top}: the distribution's probability density function in green, and blue random samples within its area. \emph{Bottom}: reconstructed density function calculated by differentiating the @@ -225,248 +226,8 @@ Finally, it must be reiterated that the main part of this work does not employ M algorithm. Even if it works with a sufficient number of samples, it has resulted in subpar results compared to other, more advanced algorithms. \cite[Ch. 11]{Gelman2014}, \cite{salvatier2016probabilistic} -\mysection{Instructional Example}{bayes:example} - -In order to show how the concepts introduced above work in practice, an instructional example will be -developed and shown, using \pymc. It doesn't make use of any special features in the library, and could also be -implemented using Metropolis-Hastings by hand (without any framework) in less than 100 lines of code. - -The scenario is a classic situation where Bayesian statistics is useful: Some very basic facts about a system are known, -and there is knowledge in the form of individual data points. Specifically, a Nuclear Power Plant (NPP) -of unknown power output $P_{el}$ and efficiency grade $\eta$ is considered. Some aspects of it can be observed -remotely, but any observations come with uncertainties. - -The goal is to \emph{infer} probability distributions for the power output $P$ and the efficiency grade $\eta = -P_{electric}/P_{thermal}$ of the NPP from the observations. These are modeled as \emph{prior distributions}, as general -knowledge about the scenario is added to the model using these variables. In this case, the prior is defined -as follows: -\begin{itemize} - \item NPPs of this size are generally expected to generate \SIrange{1000}{2000}{\mega\watt} of electrical power. - \item NPPs of this type usually have an efficiency grade $\eta$ between \SIrange{30}{40}{\percent}, with a tendency -towards the middle of this range. -\end{itemize} - -\begin{quotation} -\itshape -The fact that this is a nuclear power plant is orthogonal to the core topic of this work; it merely serves as -inspiration and thus was chosen from the same general area. -\end{quotation} - -In this scenario, there have to be some simplifying assumptions. 
The NPP of interest can be observed remotely, and -the following set of characteristics can be determined to some precision: - -\begin{itemize} - \item Number of employees - \item Number of cooling towers - \item Number of transformers - \item Amount of yearly waste per \si{\mega\watt}th (megawatts of thermal power). -\end{itemize} - -Further, there is some experience with other NPPs of similar types, and some likely average values have been shown to -generally be good linear approximations with similar power plants (such relations don't have to be linear, but for -the sake of simplicity they are chosen as such): - -\begin{itemize} - \item \SI{100}{\mega\watt} of \emph{electrical} power require 20 employees on daily duty. - \item \SI{500}{\mega\watt} of \emph{thermal} power require one cooling tower of the observed size. - \item \SI{250}{\mega\watt} of \emph{electrical} power flow through each transformer; this number includes redundancy. - \item \SI{10}{\kilo\gram\per\mega\watt} of waste is generated for every \emph{thermal} Megawatt per year. -\end{itemize} - -Observations of the NPP facility have shown that - -\begin{itemize} - \item every day, \num{250} cars show up on the employment parking lot, and there is no bus connection; - \item there are \num{7} cooling towers; - \item \num{7} transformers, and - \item roughly 60 containers of each \SI{600}{\kilo\gram} of spent fuel leave the facility each year, for a -total of \SI{36e3}{\kilo\gram} of waste. -\end{itemize} - -% \begin{table} -% \centering -% \begin{tabular}{r|ll} -% \toprule -% Quantity & Relation to Parameters & Observed \\ -% \midrule -% No. of employees & \SI{100}{\mega\watt} (el.) require 20 employees & 250 cars \\ -% No. of cooling towers & \SI{500}{\mega\watt} (th.) require one tower & 7 towers \\ -% No. of transformers & \SI{250}{\mega\watt} (el.) require one transformer & 7 transformers \\ -% Waste mass & \SI{10}{\kilogram} of waste is generated for every th. Megawatt per year & \SI{36}{\tonne\per\year} \\ -% \bottomrule -% \end{tabular} -% -% \end{table} - - -Such a set of observations is called the \emph{sample} (not to be confused with the \emph{samples} generated at random -during the \gls{inference} process!) and was introduced as $\vec y$ in the first section of this chapter. Note that it -is difficult to directly calculate the parameters of interest from these numbers, for two reasons: First, there are two -unknown parameters (power output and efficiency grade); and second, there are conflicting facts. For example, 6 -transformers translate to $6 \times \SI{250}{\mega\watt} = \SI{1500}{\mega\watt}$ according to the assumed relations, -but $250\mathrm{~employees} \times \SI{5}{\mega\watt}\,\mathrm{(employee)}^{-1} = \SI{1250}{\mega\watt}$ of electric -power. - -As a next step, a \pymc~model with random variables for the prior and posterior distributions can be created from the -collected information. It incorporates prior beliefs (the linear dependencies shown above), and is parameterized with -the observations: - -\begin{lstlisting} -import pymc3 as pm - -employees = 250 -towers = 7 -transformers = 7 -waste = 36000 - -with pm.Model() as nppmodel: - # Parameters to be inferred, and their priors. 
- power_e = pm.Uniform("power_e", lower=1000, upper=2000) - eta = pm.TruncatedNormal("eta", lower=0, mu=0.35, sigma=0.05) - - # Observations - o_empl = pm.Normal("employees", mu=power_e * 0.2, sigma=30, observed=employees) - o_towers = pm.Normal("towers", mu=power_e / (eta*500), sigma=1, observed=towers) - o_transformers = pm.Normal("transformers", mu=power_e / 250, sigma=1, observed=trafos) - o_waste = pm.Normal("waste", mu=power_e / eta * 10, sigma=4000, observed=waste) -\end{lstlisting} - -The observations (the variable declarations at the top) are each one-dimensional in this case, although they wouldn't -need to be: there could be more than one observation for each variable. The model itself is introduced using a -\cd{with} block; this is \pymc's way of grouping multiple statements relating to one and the same model: all following -statements within the indented block refer to the newly created \cd{nppmodel}, which stores the random variables and -information about likelihood calculation.. - -The first two variables \cd{power\_e} and \cd{eta} are declared as \emph{prior distributions}, and have been described -earlier. These translate our belief before observations into probability distributions. The probability distributions -are expressed as random variables (RVs) within the \pymc~framework; these prior RVs in particular form the basis for -the Bayesian \gls{inference}, by encoding the prior beliefs. Such an assumption will improve the quality of the -inference result. - -\begin{align} - P &\sim \text{\itshape Uniform}(1000, 2000) \\ - \eta &\sim \mathcal N(\mu = 0.35, \sigma = 0.05) -\end{align} - -Also note that these priors do not translate directly into physical reality: the fact that there is a small probability -of assuming a \SI{50}{\percent} efficiency grade (\autoref{fig:bayes:prior_pure}) does not mean that it is plausible. -Such distributions are modeled taking into consideration not only our physical prior knowledge, but also the -computational limitations of the algorithms employed by \pymc. Using different prior distributions will affect -\gls{inference} quality: a prior distribution encoding less specific knowledge about a scenario can be expected to -provide worse contributions to a precise inference result. - -\begin{figure} - \centering -\begin{subfigure}{.5\textwidth} -\includegraphics[width=\textwidth]{figures/bayes/npp_prior_pure.pdf} - \caption{Histograms of samples drawn from prior distributions. Power in \si{\mega\watt}.} - \label{fig:bayes:prior_pure} - \end{subfigure} - - \begin{subfigure}{1\textwidth} -\includegraphics[width=\textwidth]{figures/bayes/npp_prior_rest.pdf} - \caption{Histograms of samples drawn from the prior distributions parameterized by the prior distributions for -\cd{power\_e} and \cd{eta}. Waste in \si{\kilogram}.} - \label{fig:bayes:prior_rest} - \end{subfigure} - - \caption{Prior distributions of the used model.} -\end{figure} - -Below the declarations of distributions for parameters to be inferred, there are now four more distributions. -These are also prior distributions, but with two characteristics distinguishing them from the first two: For one, their -parameters (here: $\mu$) depend on other random variables, specifically: the prior random variables $P$ and $\eta$. 
-The prior information is expressed in these RVs in two different ways: first, as the translation from the other -variables $P, \eta$ into the observed ones (the facts how many employees are needed to generate each Megawatt is also -prior information), and as $\sigma$ encoding the measurement uncertainty of the estimates. \pymc~provides -a way of drawing random samples from the prior distributions, which only includes the prior distributions but no -measurements; a visualization of this is provided in \autoref{fig:bayes:prior_rest}. - -The last four RVs seem similar to the first two, but the \cd{observed=} keyword actually changes their -role in the Bayesian inference process: they are used to calculate the likelihood $L_i$ of each randomly sampled set of -parameters $(P_i', \eta_i')$ (the $'$ marks them as a proposed sample instead of an actual observation). First, the -four observed RVs need to be recalculated for every random sample $(P_i', \eta_i')$, as the scale $\sigma_k(P_i', -\eta_i')$ and location $\mu_k(P_i', \eta_i')$ ($k = 1..4$) of each single one of the four normal distributions -depend on the sampled RVs. Based on these distributions, \pymc~calculates the observation $\vec y$'s likelihoods for -the sample $i$: - -\begin{align} - (P_i', \eta_i') &\sim (\text{\itshape Uniform}(1000, 2000),~\mathcal{N}(0.35, 0.05)) \\ - L_i &= \prod_k p_k(y_k; \mu_k(P_i', \eta_i'), \sigma_k(P_i', \eta_i')) -\end{align} - -where $p_k$ is the $k$th normal distribution's density function parameterized by $\mu_k(P_i', \eta_i'), -\sigma_k(P_i', \eta_i')$. For example, according to the definition in the -code above, the ``employees'' density function would be: - -\begin{equation} - p_1(230; \mu(P), \sigma) = \Phi(250~\mathrm{employees}; \mu = 0.2 P ~\si{\per\mega\watt}, \sigma = 30) -\end{equation} - -with the normal distribution's probability density function -\begin{equation} -\label{eqn:bayes:normaldist} -\Phi(x; \mu, \sigma) = \frac{\exp(-\frac{(x-\mu)^2}{2\sigma^2})}{\sqrt{2\pi\sigma^2}}. -\end{equation} - -Samples are drawn using the \cd{pm.sample()} function, which returns a \emph{trace} containing the samples: - -\begin{lstlisting} -with nppmodel: - trace = pm.sample(2_000, return_inferencedata=True) -\end{lstlisting} - -After having drawn many samples from a Markov chain based on the specified probability -distributions, the resulting samples will be mostly located in areas of high likelihood within the parameter space. This -can also be considered a numerical (Monte Carlo) integration of the joint probability distribution of $(P, \eta)$ in -order to find the underlying function: areas of high likelihood in the parameter space contain more samples, -corresponding to a larger ``area under the curve'' for such areas. An example of this mechanism is shown for a trivial -standard normal distribution in \autoref{fig:bayes:numint}. Different methods can be used to reconstruct the generating -function from the drawn samples. - -In this example, the resulting trace is shown as a \emph{trace plot} in \autoref{fig:bayes:trace}, and the random walk -is visualized in two dimensions in \autoref{fig:bayes:walk}. 
- -From the sampled data, the \textsc{arviz} library can calculate the essential statistics, which include the sample mean -$\overline x$, the sample standard deviation $s$, and the lower and upper limit of the HDI (\emph{highest density -interval}, a type of credible interval used in Bayesian statistics), which by default is chosen to be the -\SI{94}{\percent} highest density interval \cite{arviz2019}: - -\begin{center} - -\begin{tabular}{r|ccccc} - ~ & $\overline x$ & $s$ & HDI (\SI{3}{\percent}) & HDI (\SI{97}{\percent}) \\ - \midrule - \cd{power\_e} & \num{1349.1} & \num{110.9} & \num{1151.4} & \num{1565.7} \\ - \cd{eta} & \SI{37.4}{\percent} & \SI{3.5}{\percent}p & \SI{31.0}{\percent} & \SI{44.2}{\percent} \\ -\end{tabular} -\end{center} - -For reference: the artificial observations at the beginning of this -section were chosen from the ``true'' values (or ``ground truth''; this will be important in the fuel cycle simulations -later on, too) of $P_0 = \SI{1400}{\mega\watt}$ and $\eta = \num{0.38}$. The result's main problem is the relatively -large uncertainty of its prediction, which originates from the uncertain observations used by the inference. - -\begin{figure}[h] - \centering - \begin{subfigure}{0.8\textwidth} - \includegraphics[width=\textwidth]{figures/bayes/npp_trace.pdf} - \caption{A \emph{trace plot} generated by sampling the NPP model. \emph{Left}: the estimated probability distribution -of each parameter and multiple chains. \emph{Right}: the random walk in each parameter dimension. Because multiple Markov chains were used -in the inference process, one distribution and one walk is shown for each chain.} - \label{fig:bayes:trace} - \end{subfigure} - - \begin{subfigure}{0.7\textwidth} - \includegraphics[width=\textwidth]{figures/bayes/npp_walk.pdf} - \caption{The random walk, visualized in two dimensions. Consecutive points are -spatially correlated. Increasing efficiency $\eta$ leads to increased electrical power output; therefore, a correlation -can be observed.} - \label{fig:bayes:walk} - \end{subfigure} - - \caption{Result of a Markov chain Monte Carlo sampling process.} - \label{fig:bayes:exampleresult} -\end{figure} - +\mysection{Instructional Example}{bayes:exampleref} +A simple example of a Bayesian inference problem solved using \pymc~is presented in \autoref{ch:app:bayes}. As +described in \autoref{ch:methodology}, the approach chosen in this work for reconstructing nuclear fuel cycles is +slightly different from typical Bayesian inference problems, as the likelihood function is not known. diff --git a/thesis/mainmatter/conclusion.tex b/thesis/mainmatter/conclusion.tex new file mode 100644 index 0000000000000000000000000000000000000000..4c95784e2d3eb09e6d3c131eb5c4abce23b775cf --- /dev/null +++ b/thesis/mainmatter/conclusion.tex @@ -0,0 +1 @@ +\mychapter{Conclusion}{conclusion}