Commit 5f4b22be authored by Ann-Kathrin Margarete Edrich

Add Sphinx documentation

parent c560314e
@@ -6,7 +6,6 @@ utilities/__pycache__/
 archive/
 examples/
 docs/build/
-docs/source/
 # Ignore all pickle files
 *.pkl
image: python:3.7

before_script:
  - pip install sphinx
  - pip install -r requirements.txt

stages:
  - build
  - deploy   # needed so the pages job below references a declared stage

build-docs:
  stage: build
  script:
    - sphinx-build -b html source/ _build/html
  artifacts:
    paths:
      - _build/html
  only:
    - main

pages:
  stage: deploy
  script:
    - mv _build/html public
  artifacts:
    paths:
      - public
  only:
    - main
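The documentation can also be built locally with the same command the CI job runs, executed from the directory containing *source/* (here *docs/*), assuming Sphinx and the requirements are installed:

.. code-block:: console

   (venv) $ sphinx-build -b html source/ _build/html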
New images under docs/source/_static/images/:

bar1.png        47.9 KiB
bar2.png        42.1 KiB
intro.png       91.9 KiB
mapping.png     195 KiB
prediction.png  150 KiB
results.png     873 KiB
training.png    159 KiB
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
project = 'SHIRE'
copyright = '2024, Ann-Kathrin Edrich'
author = 'Ann-Kathrin Edrich'
# The full version, including alpha/beta/rc tags
release = 'October 2024'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'classic'
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
Example - Plain version
=======================

This example illustrates the usage of the Plain version of SHIRE based on the provided datasets.
The foundations of the following example are identical to the one described in :doc:`example`.
In contrast to the GUI version example, where the focus was on the initialisation of the process
rather than on the underlying principles, the following concentrates on the Python script *settings.py*.
Datasets
---------
| All necessary datasets can be found in the Gitlab repository in the examples folder.
| **Geospatial datasets**:
| *European Union's Copernicus Land Monitoring Service information:*
| Imperviousness Density 2018 (https://doi.org/10.2909/3bf542bd-eebd-4d73-b53c-a0243f2ed862)
| Dominant Leaf Type 2018 (https://doi.org/10.2909/7b28d3c1-b363-4579-9141-bdd09d073fd8)
| CORINE Land Cover 2018 (https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac)
| All datasets were edited. Imperviousness Density 2018 and Dominant Leaf Type 2018 were merged from smaller tiles and then stored in a netCDF4 file.
| **Landslide database**:
| Datenquelle Hangmuren-Datenbank, Eidg. Forschungsanstalt WSL, Forschungseinheit Gebirgshydrologie & Massenbewegungen (status October 2024).
| The spatial coordinates of the landslide locations were transformed into WGS84 coordinates using QGIS.
| **Absence locations database**:
| Randomly sampled locations outside of a buffer zone around the entries in the landslide database. The database contains more absence locations than will be integrated into the example. This is intentional, as both landslide and absence locations are removed during training dataset generation if one of their features contains a no-data value. Having additional absence locations available allows SHIRE to integrate the number of absence locations intended by the user.
| **Metadata files**:
| keys_to_include_examples.csv
| data_summary_examples.csv
Launching SHIRE
---------------

As for the GUI version, it is recommended to launch the Plain version of SHIRE from the command line:

.. code-block:: console

   (venv) $ python shire.py

However, as the Plain version does not use the Python package *tkinter*, it can also be run from any Python editor.
Before launching SHIRE, the information prompted for in the four GUIs (see :doc:`example`) needs to be entered in *settings.py*.
Settings file
-------------

In *data/* there is a Python script *settings_template.py* which needs to be prepared before running SHIRE.

**Beware!:** When launching SHIRE, the *settings.py* file (rename the template after filling it in) must be in the same folder as the script *shire.py*, i.e. in the current folder structure *src/plain_version/*.

Each parameter declared in *settings.py* comes with a short description to make filling out the script easier. In the following, the content of the *settings.py* file in the context of the example is illustrated. The method *export_variables* does not need to be adapted. It prints the settings to the logging file to make the premises under which the resulting map was produced traceable and the whole process reproducible.

The parameters in *settings.py* are summarised in the following table:
.. raw:: html
<style>
.my-table {
width: 100%;
}
.my-table th {
width: 20%;
}
.my-table td {
width: 20%;
}
.my-table .col-50 {
width: 50%;
}
.my-table .col-10 {
width: 10%;
}
</style>
<table class="my-table">
<caption>Parameters in settings.py</caption>
<thead>
<tr>
<th>Parameter</th>
<th class="col-50">Description</th>
<th class="col-10">Value Type</th>
<th>Example Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>training_dataset</td>
<td class="col-50">True if training dataset shall be created, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>preprocessing</td>
<td class="col-50">Defines preprocessing approach: 'cluster', 'interpolation', 'no_interpolation'</td>
<td class="col-10">String</td>
<td>'no_interpolation'</td>
</tr>
<tr>
<td>train_from_scratch</td>
<td class="col-50">True for generation of training dataset from scratch, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>train_delete</td>
<td class="col-50">True if feature(s) shall be removed from existing training dataset, else False. train_from_scratch=False and train_delete=False adds feature(s) to existing training dataset</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>prediction_dataset</td>
<td class="col-50">True if prediction dataset shall be created, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>pred_from_scratch</td>
<td class="col-50">True for generation of prediction dataset from scratch, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>pred_delete</td>
<td class="col-50">True if feature(s) shall be removed from existing prediction dataset, else False. pred_from_scratch=False and pred_delete=False adds feature(s) to existing prediction dataset</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>map_generation</td>
<td class="col-50">True if Random Forest model shall be trained and/or landslide susceptibility/hazard map shall be created</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>crs</td>
<td class="col-50">Coordinate Reference System, important metadata information</td>
<td class="col-10">String</td>
<td>'wgs84'</td>
</tr>
<tr>
<td>no_value</td>
<td class="col-50">No data value that indicates in the final map where prediction was not possible</td>
<td class="col-10">Integer</td>
<td>-999</td>
</tr>
<tr>
<td>random_seed</td>
<td class="col-50">Random seed</td>
<td class="col-10">Integer</td>
<td>42</td>
</tr>
<tr>
<td>resolution</td>
<td class="col-50">Goal resolution of the final map</td>
<td class="col-10">Integer</td>
<td>250</td>
</tr>
<tr>
<td>path_ml</td>
<td class="col-50">Reference path on the local machine or external hard drive to the base directory for storing SHIRE's products</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>data_summary_path</td>
<td class="col-50">Path to the *data_summary.csv* file</td>
<td class="col-10">String</td>
<td>'../../examples/data_summary.csv'</td>
</tr>
<tr>
<td>key_to_include_path</td>
<td class="col-50">Path to the *keys_to_include.csv* file</td>
<td class="col-10">String</td>
<td>'../../examples/keys_to_include.csv'</td>
</tr>
<tr>
<td>size</td>
<td class="col-50">Fraction of the training dataset to be used as test dataset</td>
<td class="col-10">float</td>
<td>0.25</td>
</tr>
<tr>
<td>path_train</td>
<td class="col-50">Path/directory to store created training dataset in</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>ohe</td>
<td class="col-50">True, if categorical variables shall be one-hot encoded, False for ordinal encoding</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>path_landslide_database</td>
<td class="col-50">Path a landslide database stored locally or on an external hard drive</td>
<td class="col-10">String</td>
<td>'../../examples/<br>landslide_coordinates_wgs84.csv'</td>
</tr>
<tr>
<td>ID</td>
<td class="col-50">Name of the column in the landslide database that contains the ID of the instances</td>
<td class="col-10">String</td>
<td>'Ereignis-Nr'</td>
</tr>
<tr>
<td>landslide_database_x</td>
<td class="col-50">Name of the column in the landslide database that contains the longitude coordinates</td>
<td class="col-10">String</td>
<td>'X'</td>
</tr>
<tr>
<td>landslide_database_y</td>
<td class="col-50">Name of the column in the landslide database that contains the latitude coordinates</td>
<td class="col-10">String</td>
<td>'Y'</td>
</tr>
<tr>
<td>bounding_box</td>
<td class="col-50">Bounding box of map to be created ([ymax, ymin, xmin, xmax])</td>
<td class="col-10">List</td>
<td>[47.8, 45.8, 5.9, 10.5]</td>
</tr>
<tr>
<td>path_pred</td>
<td class="col-50">Path/directory to store created prediction dataset in</td>
<td class="col-10">String</td>
<td>path_ml</td>
</tr>
<tr>
<td>RF_training</td>
<td class="col-50">True, if Random Forest model shall be trained from scratch, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>RF_prediction</td>
<td class="col-50">True, if final map shall be generated, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>not_included_pred_data</td>
<td class="col-50">Feature(s) to drop from the prediction dataset before applying the trained model to the prediction dataset</td>
<td class="col-10">List</td>
<td>['xcoord', 'ycoord']</td>
</tr>
<tr>
<td>not_included_train_data</td>
<td class="col-50">Feature(s) to drop from the training dataset before training the Random Forest model</td>
<td class="col-10">List</td>
<td>[]</td>
</tr>
<tr>
<td>num_trees</td>
<td class="col-50">Number of trees in the Random Forest</td>
<td class="col-10">Integer</td>
<td>100</td>
</tr>
<tr>
<td>criterion</td>
<td class="col-50">Evaluation criterion</td>
<td class="col-10">String</td>
<td>'gini'</td>
</tr>
<tr>
<td>depth</td>
<td class="col-50">Depth of a Random Forest tree</td>
<td class="col-10">Integer</td>
<td>20</td>
</tr>
<tr>
<td>model_to_save</td>
<td class="col-50">Name of model folder in model_database_dir to store the trained Random Forest model in. Will be created in model_database_dir</td>
<td class="col-10">String</td>
<td>'Switzerland_Map'</td>
</tr>
<tr>
<td>model_to_load</td>
<td class="col-50">Name of model folder that contains the model to be loaded for map production. Typically identical to model_to_save</td>
<td class="col-10">String</td>
<td>'Switzerland_Map'</td>
</tr>
<tr>
<td>model_database_dir</td>
<td class="col-50">Path on the local machine or external hard drive to the directory for storing the model folders</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>parallel</td>
<td class="col-50">If susceptibility/hazard shall be predicted in parallel</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>keep_cat_features</td>
<td class="col-50">True if instances in the input dataset without categorical class information shall be kept and proceeded as intended, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>remove_instances</td>
<td class="col-50">True if instances in the input dataset without categorical class information shall be removed and marked with the no data value in the final map, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
</tbody>
</table>
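Putting the example values from the table together, a minimal *settings.py* excerpt could look as follows. This is a sketch assembled from the table above; the authoritative list of variables and the *export_variables* method are given in *settings_template.py*:

.. code-block:: python

   # Sketch of a settings.py excerpt with the example values from the table.
   training_dataset = False                 # training dataset already exists
   preprocessing = 'no_interpolation'       # compilation strategy
   prediction_dataset = True                # create the prediction dataset
   map_generation = False                   # no model training/mapping in this run
   crs = 'wgs84'                            # coordinate reference system
   no_value = -999                          # no data value in the final map
   random_seed = 42
   resolution = 250                         # goal resolution of the final map
   bounding_box = [47.8, 45.8, 5.9, 10.5]   # [ymax, ymin, xmin, xmax]
   path_ml = '../../examples/'
   data_summary_path = '../../examples/data_summary.csv'
   key_to_include_path = '../../examples/keys_to_include.csv'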
Example - GUI version
=====================

This example illustrates the usage of the GUI version of SHIRE based on the provided datasets.
In the following, the generation of a training dataset and a prediction dataset
is shown step by step. The generated input datasets are then used for creating
a landslide susceptibility map.

**This example is designed for illustration purposes only! The produced map is not
intended for any analysis purpose. Caution is advised.**

The example illustrates how to use SHIRE for a binary susceptibility assessment for the occurrence of shallow
landslides in Switzerland.

Preliminary considerations
--------------------------

The first decision to be made is whether the GUI version or the Plain version
shall be used. The following example shows the GUI version. The Plain version
is introduced in :doc:`example-plain`.
Datasets
---------
| All necessary datasets can be found in the Gitlab repository in the examples folder.
| **Geospatial datasets**:
| *European Union's Copernicus Land Monitoring Service information:*
| Imperviousness Density 2018 (https://doi.org/10.2909/3bf542bd-eebd-4d73-b53c-a0243f2ed862)
| Dominant Leaf Type 2018 (https://doi.org/10.2909/7b28d3c1-b363-4579-9141-bdd09d073fd8)
| CORINE Land Cover 2018 (https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac)
| All datasets were edited. Imperviousness Density 2018 and Dominant Leaf Type 2018 were merged from smaller tiles and then stored in a netCDF4 file.
| **Landslide database**:
| Datenquelle Hangmuren-Datenbank, Eidg. Forschungsanstalt WSL, Forschungseinheit Gebirgshydrologie & Massenbewegungen (status October 2024).
| The spatial coordinates of the landslide locations were transformed into WGS84 coordinates using QGIS.
| **Absence locations database**:
| Randomly sampled locations outside of a buffer zone around the entries in the landslide database. The database contains more absence locations than will be integrated into the example. This is intentional, as both landslide and absence locations are removed during training dataset generation if one of their features contains a no-data value. Having additional absence locations available allows SHIRE to integrate the number of absence locations intended by the user.
| **Metadata files**:
| keys_to_include_examples.csv
| data_summary_examples.csv
Launching SHIRE
---------------

It is recommended to launch the GUI version of SHIRE from the command line:

.. code-block:: console

   (venv) $ python shire.py

.. figure:: _static/images/intro.png
   :scale: 80%
   :align: center
Then the window on the left opens. In this window all basic settings are defined
which are relevant for training and prediction dataset generation as well as model training and map generation.
As described in the user manual in the git repository, the desired option(s) need to be ticked at the top of the window.
Under **General settings**, you must provide:

- The desired resolution of the final susceptibility map. In this example we want to generate a map with 100 m resolution.
- The general no-data value to indicate locations where a susceptibility assessment is not possible; here -999 is chosen.
- The coordinate reference system (CRS), which is important metadata. Coordinates in the geospatial datasets and in the landslide and absence locations databases are given in WGS84 coordinates.
- The random seed to make the process reproducible, here 42.

**Save the settings for later use** can be ticked if the process needs to be repeated several times, or if you want to save settings for later comparison. When rerunning the process, you can use the **Import settings** button to introduce only changes to the mask.
The settings will be saved in a pickle file called *settings.pkl* in the folder that you specify when you click
**Submit** to proceed to the next step. Depending on the option(s) you ticked at the top, a different window opens.
Please be aware that SHIRE sticks to a specific order if several steps are initialised at once.
.. raw:: html

   <div style="clear: both;"></div>
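If you want to check later which settings a run used, the saved pickle file can be inspected directly. A minimal sketch (the stored keys depend on the GUI fields):

.. code-block:: python

   # Minimal sketch: inspect the settings saved by the GUI.
   import pickle

   with open('settings.pkl', 'rb') as f:   # path chosen on Submit
       settings = pickle.load(f)
   print(settings)                         # keys depend on the GUI fields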
Training dataset generation
---------------------------

.. figure:: _static/images/training.png
   :scale: 70%
   :align: center

If a training dataset shall be generated, the window above opens.
If you click on the buttons given under **Provide path to:**,
a window opens to manually navigate to the individual files. Please
refer to the user manual for the necessary structure of the *dataset_summary.csv*
and the landslide and absence locations databases. You can also check the
provided datasets for this example.
The *keys_to_include.csv* file contains the feature keys specified in *data_summary.csv*
and defines which of the available datasets shall actually be used in training dataset generation.
For this example, all three datasets are used.
Choose the directory where you want to save the generated training dataset to.
The dataset is automatically named *training.csv*. Beware that existing files with the same name are overwritten!
Under **Number of absence locations in the training dataset**, provide the number of absence locations you want
to integrate into your training dataset. If the same number of absence locations is provided as there are entries in the
landslide inventory, SHIRE assumes that this is intentional and that the 1:1 ratio shall be kept even if some landslide
instances need to be removed due to missing geospatial information; the total number of absence locations is then reduced accordingly.
Here, the landslide inventory contains 762 entries; therefore, 762 absence locations shall also be integrated into the training dataset.
Tick **One-hot encoding of categorical variables** if the categorical features in the training dataset shall be
one-hot encoded. This means that a separate column is introduced into the training dataset for each class, with the naming convention
*<feature name>_<class number>*. As currently only numeric raster data are supported by SHIRE, this is sufficient.
In this example, the box is not ticked, which means that the training dataset is ordinal encoded.
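As an illustration of the naming convention, a pandas-based sketch with a hypothetical feature *landcover* (SHIRE's internal encoding may differ in detail):

.. code-block:: python

   # One-hot encoding illustration: <feature name>_<class number> columns.
   import pandas as pd

   df = pd.DataFrame({'landcover': [1, 3, 1]})   # hypothetical feature
   print(pd.get_dummies(df, columns=['landcover']))
   # -> columns landcover_1 and landcover_3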
Furthermore, the naming information of the landslide inventory and absence locations database needs to be provided.
For the provided exemplary landslide inventory, the longitude values are contained in a column called *X*, the
latitude values in *Y*, and as it is a Swiss dataset, the ID column is called *Ereignis-Nr*. The absence locations
database contains variables called *Longitude* and *Latitude* which contain the respective coordinates.
**Save the settings for later use** can be ticked if the process needs to be repeated several times, or if you want to save settings for later comparison. When rerunning the process, you can use the **Import settings** button to introduce only changes to the mask.
The settings will be saved in a pickle file called *settings_train.pkl* in the folder that you specify when you click
**Submit**. This starts the training dataset generation if that is the only intention of the run, or opens the next settings window if you want to proceed with prediction dataset generation.
Under **Choose from:** three different options are available. For this example, we assume that we want to generate a training dataset from scratch
and that there is no existing training dataset that we want to supplement or reduce. Therefore, we choose
**Generate training dataset from scratch**. There are three different compilation strategies implemented in SHIRE.
Here we choose the simplest and most cost- and time-effective **No interpolation** option. With this option the geospatial
datasets are not interpolated to the final resolution of the map before the properties of the landslide sites and
absence locations are extracted. After pressing **Submit**, when only generating a training dataset in this run, a separate window
opens that provides a progress report.
*Adding feature(s) to an existing training dataset:*
The *keys_to_include.csv* file then only contains the feature key, as given in *data_summary.csv*,
of the feature which shall be added to the existing training dataset. The **Path to directory for storing the training dataset**
then needs to lead to the directory of the existing training dataset, which needs to be called *training.csv*.
Under **Choose from:**, **Add feature(s) to existing training dataset** now needs to be chosen. It is recommended to use the same
**Compilation using:** option as for the other features in the existing training dataset.

*Deleting feature(s) from an existing training dataset:*
Similarly to adding features, for deleting them the *keys_to_include.csv* file only contains the
feature keys, as given in *data_summary.csv*, that shall be removed from the existing training dataset.
The **Path to directory for storing the training dataset** then needs to lead to the directory of the existing training dataset, which needs to be called *training.csv*.
Under **Choose from:**, **Delete feature(s) from existing training dataset** now needs to be chosen.
*Choosing One-hot encoding instead of ordinal encoding:*
Tick **One-hot encoding of categorical variables** for one-hot encoding of the categorical variables in the training dataset
instead of the default ordinal encoding.
*Choosing a different compilation approach:*
In contrast to the **No interpolation** option chosen in the example, the geospatial datasets can also be interpolated before
extracting the geospatial characteristics of the absence locations and landslide sites, in two different ways.
When choosing **Interpolation**, all geospatial datasets are cropped to the minimum spatial extent that contains all landslide sites and absence locations.
Then the cropped datasets are each interpolated to the same coordinate grid with the same spatial resolution as the final map.
Finally, the values are extracted and introduced as features into the training dataset.
This can be quite cost- and time-intensive depending on the size of the area in which landslides and absence locations
were collected. Therefore, **Clustering** can be used instead. The absence locations and landslide sites are spatially clustered and
consequently, for each cluster, the original dataset is cropped and individually interpolated to the desired resolution
before the feature values are extracted. As the spatial extents of the clusters are much smaller, this requires less computational
power and time for large areas.
The interpolation when choosing **Interpolation** is automatically performed in one of three different ways, depending
on the original size of the geospatial dataset and its size after interpolation. For details, please see the associated
publications found in the git repository.
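Conceptually, the **Interpolation** strategy corresponds to something like the following sketch (illustrative only, not SHIRE's actual implementation):

.. code-block:: python

   # Conceptual sketch of the "Interpolation" strategy: bring a cropped
   # geospatial raster onto the target map grid before extracting features.
   import numpy as np
   from scipy.interpolate import RegularGridInterpolator

   def interpolate_to_grid(lats, lons, values, target_lats, target_lons):
       """Interpolate values (on ascending lats x lons) onto the target grid."""
       interp = RegularGridInterpolator(
           (lats, lons), values, bounds_error=False, fill_value=np.nan)
       grid_y, grid_x = np.meshgrid(target_lats, target_lons, indexing='ij')
       points = np.column_stack([grid_y.ravel(), grid_x.ravel()])
       return interp(points).reshape(grid_y.shape)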
.. raw:: html

   <div style="clear: both;"></div>
Prediction dataset generation
-----------------------------

.. figure:: _static/images/prediction.png
   :scale: 70%
   :align: center

If a prediction dataset shall be generated, the window above opens.
If you click on the buttons given under **Path to summary of geospatial data** and **Features include**,
a window opens to manually navigate to the *data_summary.csv* and *keys_to_include.csv* files. Provide the **Path to directory
for storing the prediction dataset** similarly to how it was done for the training dataset.
Tick **One-hot encoding of categorical variables** if the categorical features in the prediction dataset shall be
one-hot encoded. Careful: make sure that this decision is consistent with the training dataset.
As we are using ordinal encoding in this example, the box is not ticked.
Then provide the bounds of the area of interest: the top two fields give the east and west coordinates and the bottom
two the north and south coordinates. Here, the extent of Switzerland was chosen in accordance with the test case
outlined above.
**Save the settings for later use** can be ticked if the process needs to be repeated several times, or if you want to save settings for later comparison. When rerunning the process, you can use the **Import settings** button to introduce only changes to the mask.
The settings will be saved in a pickle file called *settings_pred.pkl* in the folder that you specify when you click
**Submit**. This starts the prediction dataset generation if that is the only intention of the run, or opens the next settings window if you want to proceed with model training and map generation.
The **Submit** button only appears after choosing one of the options given under **Choose from:**, similar to
the process described for training dataset generation. Here, as we want to generate a prediction dataset from scratch,
we choose the first option.
For more information on **Delete feature(s) from existing prediction dataset** and **Add feature(s) to existing prediction dataset**
see the same options described above for the training dataset generation. The process is identical.
.. raw:: html

   <div style="clear: both;"></div>
Susceptibility map generation
-----------------------------

.. figure:: _static/images/mapping.png
   :scale: 70%
   :align: center

Under **Path to training and prediction dataset**, the locations of the training and prediction
datasets on the local machine or external hard drive need to be chosen in separate windows. Similarly to previous steps, under **Where
do you want the models to be stored**, choose the directory where a folder shall be created which in the end contains
the mapping results. The **Folder name** can then be specified. The mask distinguishes between the model to save and the
model to load. This is because model training and mapping are separate processes which are conducted independently. If
you conduct both at the same time, both fields need to contain the same folder name. However, if you are mapping using
a pretrained model, or only train without prediction, you can just provide the respective information.
Not all features that are contained in the training and prediction dataset might be needed for model training and mapping.
It is possible to drop features from the training and prediction dataset under **Features not to consider**. Here, we
don't want to remove any features from the training dataset; however, the prediction dataset still contains the spatial coordinates
within the area of interest for which an individual prediction will be made. As this information should not be part of
the mapping, it needs to be removed. The feature names need to be provided comma-separated without any spaces.
Then we also need to provide the **Name of the label column in the training dataset**.
An important decision is made under **How to treat mismatching categories**. If one-hot encoding is used for training
and prediction dataset generation, the categories may mismatch between the two input datasets.
If not all features that are contained in the training dataset are contained in the prediction dataset, the
mapping process is aborted and the model is automatically retrained. Before retraining, the mismatching features are
removed from the training dataset. The results are stored in a separate folder named *<old folder name>_retrain*.
If there are more features contained in the prediction dataset than in the training dataset, the mismatching
features in the prediction dataset are automatically removed before mapping. Furthermore, an identical order of the features
between the input datasets is ensured. If one-hot encoded feature classes are removed from the prediction dataset, there will
be instances in the prediction dataset, i.e. individual locations within the area of interest, which are described by
no feature class still contained in the prediction dataset. This means that the value of this feature for all classes
still contained in the prediction dataset is 0. Under **How to treat mismatching categories**, when choosing **Keep instances of mismatching classes**, these instances
are kept in the prediction dataset and are included in the mapping in the same way as all the other locations. When
choosing **Remove instances of mismatching classes**, these instances are handled in the same way as locations where
at least one feature contains a no data value, and they will not be included in the mapping. As we are using ordinal encoding
in this example, this decision is of reduced importance for the moment.
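For one-hot encoded inputs, the column alignment described above corresponds conceptually to the following sketch (illustrative; SHIRE performs these checks internally):

.. code-block:: python

   # Sketch: align prediction features to the training feature columns.
   import pandas as pd

   def align_to_training(train_features: pd.DataFrame,
                         pred: pd.DataFrame) -> pd.DataFrame:
       # Drop prediction columns unknown to training, add missing ones as 0,
       # and enforce the training column order.
       return pred.reindex(columns=train_features.columns, fill_value=0)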
Finally, the Random Forest needs to be defined regarding the number of trees, the depth of the trees and the evaluation criterion.
For more information, see the documentation for scikit-learn's `Random Forest Classifier <https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.RandomForestClassifier.html>`_.
Provide the **Size of the test dataset (0...1)** as well. For the values chosen in this example, refer to the image above.
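In scikit-learn terms, the model definition corresponds roughly to the following sketch, using the values from the Plain version table (*label* is a placeholder for the user-defined label column name):

.. code-block:: python

   # Rough scikit-learn equivalent of the Random Forest settings used here.
   import pandas as pd
   from sklearn.ensemble import RandomForestClassifier
   from sklearn.model_selection import train_test_split

   df = pd.read_csv('training.csv')
   X = df.drop(columns=['label'])      # 'label' is a placeholder name
   y = df['label']
   X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.25, random_state=42)  # size of the test dataset
   model = RandomForestClassifier(
       n_estimators=100, max_depth=20, criterion='gini', random_state=42)
   model.fit(X_train, y_train)
   print('Test accuracy:', model.score(X_test, y_test))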
**Save the settings for later use** can be ticked if the process needs to be repeated several times, or if you want to save settings for later comparison. When rerunning the process, you can use the **Import settings** button to introduce only changes to the mask.
The settings will be saved in a pickle file called *settings_map.pkl* in the folder that you specify when you click
**Submit**. This starts the mapping process if that is the only intention of the run, or launches training and/or prediction dataset
generation if several processes were initialised at the same time.
Before clicking **Submit**, it is necessary to choose **What do you want to do?**. It is possible to only train, only map, or
do both at the same time. When **Mapping** is ticked, it is possible to **Predict in parallel** to speed up map generation.
.. raw:: html

   <div style="clear: both;"></div>
Final map, output files and validation information
--------------------------------------------------

.. figure:: _static/images/results.png
   :scale: 50%
   :align: center

The figure above shows the susceptibility map as returned by SHIRE (left) and the susceptibility map combined with the Swiss boundaries (right). The yellow areas are predicted as susceptible to landslide occurrence
and blue shows stable areas. In the top right corner of the image on the left, the value is -999, i.e. the no data value set in the initial GUI. This shows that in this area no prediction of the landslide
susceptibility was possible. The reason for this is that the geospatial datasets have no information available in this area either.
Each of the previously described steps has its own input files, which have been discussed and are described in the user manual.
When checking the folders where the training and prediction dataset were generated, as well as the folder where the training and prediction results are stored, it can be seen that
several new files were created.
**Beware!:** The files produced in each run also depend on the chosen options, e.g. regarding the compilation strategy of the training dataset.
Most of the files are intended to support transparency and reusability.
**Pickle files:**
The pickle files created after the initialisation of each step contain the properties chosen for the run. This provides documentation, transparency and reproducibility.
**Training and prediction dataset generation:**
Of course, the main products of these steps are the training dataset as a csv file and the prediction dataset as a netCDF4 file.
Interpolation of the geospatial datasets, either for training or for prediction dataset generation, results in the generation of a pickle and a netCDF4 file called
*data_combined_<training/prediction>_<resolution>.<nc/pkl>*. The netCDF4 file contains the interpolated geospatial datasets; it can be used for quality checking the interpolation result.
The pickle file contains the interpolation information.
**Model training and map generation:**
Inside the model folder, in this example called *Switzerland_Map*, there are several files:
- *prediction.nc* contains the binary prediction result with the value 1 indicating landslide susceptibility, 0 no landslide susceptibility and -999 no prediction possible
- *prediction_results.csv* contains the prediction result for each individual location within the area of interest before it was reshaped into the final map
- *pos_prediction_results.csv* contains only the locations with landslide susceptibility predicted
- *neg_prediction_results.csv* contains only the locations without predicted susceptibility to landslide occurrence
- *saved_model.pkl* contains the trained Random Forest model
- *model_params.pkl* contains metadata and model quality information
- *feature_importance.csv* contains the feature importance ranking as determined by the Random Forest algorithm
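For a first look at these outputs, a sketch like the following can be used (folder name from this example; the internal structure of *prediction.nc* is not assumed):

.. code-block:: python

   # Sketch: first look at the outputs listed above.
   import pickle
   import pandas as pd
   import xarray as xr

   folder = 'Switzerland_Map'
   with open(f'{folder}/saved_model.pkl', 'rb') as f:
       model = pickle.load(f)                       # trained Random Forest

   print(xr.open_dataset(f'{folder}/prediction.nc'))             # variables/dims
   print(pd.read_csv(f'{folder}/feature_importance.csv').head())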
.. SHIRE documentation master file, created by
   sphinx-quickstart on Mon Oct 7 13:16:46 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Susceptibility and Hazard mappIng fRamEwork SHIRE
=================================================

.. toctree::

   example
   example-plain

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
@@ -31,7 +31,10 @@ class prepare_data:
         self.row = 0
         self.retrain = retrain
         self.import_parameters()
-        self.logger.info("Susceptibility/hazard map generation started")
+        if self.retrain:
+            self.logger.info("Model is retrained")
+        else:
+            self.logger.info("Susceptibility/hazard map generation started")
         self.master.geometry()
         self.master.winfo_toplevel().title("Map generation")