Example - Plain version
=======================

This example illustrates the usage of the Plain version of SHIRE based on the provided datasets.
The foundations of this example are identical to those of the one described
in :doc:`example`.

As in the GUI version example, the focus here is on initialising the process
rather than on the underlying principles. For the Plain version, this means filling in
the Python script *settings.py*, which is discussed in the following.

Datasets
---------

| All necessary datasets can be found in the *examples* folder of the GitLab repository.

| **Geospatial datasets**:
| *European Union's Copernicus Land Monitoring Service information:*
| Imperviousness Density 2018 (https://doi.org/10.2909/3bf542bd-eebd-4d73-b53c-a0243f2ed862)
| Dominant Leaf Type 2018 (https://doi.org/10.2909/7b28d3c1-b363-4579-9141-bdd09d073fd8)
| CORINE Land Cover 2018 (https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac)

| All datasets were edited. Imperviousness Density 2018 and Dominant Leaf Type 2018 were merged from smaller tiles and then stored in a netCDF4 file.

| **Landslide database**:
| Datenquelle Hangmuren-Datenbank, Eidg. Forschungsanstalt WSL, Forschungseinheit Gebirgshydrologie & Massenbewegungen (status October 2024).
| The spatial coordinates of the landslide locations were transformed into WGS84 coordinates using QGIS.

| **Absence locations database**:
| Randomly sampled locations outside of a buffer zone around the entries in the landslide database. The database contains more absence locations than will be integrated into the example. This is intentional, as
  both landslide and absence locations are removed during the training dataset generation process if any of their features
  contains a no-data value. Having additional absence locations available allows SHIRE to integrate the number of absence locations
  intended by the user.
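
The effect of this surplus can be sketched in a few lines of plain Python. This is only an
illustration and not SHIRE's implementation: the function name ``draw_absence_locations``, the
record layout, the feature names and the no-data marker ``-999`` are assumptions made for this sketch.

.. code-block:: python

   import random

   NO_DATA = -999  # assumed no-data marker, for illustration only


   def draw_absence_locations(candidates, n_wanted, seed=42):
       """Drop candidates with a no-data feature, then sample n_wanted of the rest."""
       valid = [
           c for c in candidates
           if NO_DATA not in c["features"].values()
       ]
       if len(valid) < n_wanted:
           raise ValueError(
               f"only {len(valid)} valid absence locations, {n_wanted} requested"
           )
       return random.Random(seed).sample(valid, n_wanted)


   # Six candidate locations, two of which fall on a no-data cell.
   candidates = [
       {"id": 1, "features": {"imperviousness": 12, "leaf_type": 1}},
       {"id": 2, "features": {"imperviousness": NO_DATA, "leaf_type": 2}},
       {"id": 3, "features": {"imperviousness": 0, "leaf_type": 1}},
       {"id": 4, "features": {"imperviousness": 55, "leaf_type": NO_DATA}},
       {"id": 5, "features": {"imperviousness": 3, "leaf_type": 2}},
       {"id": 6, "features": {"imperviousness": 80, "leaf_type": 0}},
   ]

   selected = draw_absence_locations(candidates, n_wanted=3)
   print(sorted(c["id"] for c in selected))

Had the candidate list contained exactly three locations, the two no-data entries would have
left only one usable location and the requested number could not have been reached.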

| **Metadata files**:
| keys_to_include_examples.csv
| data_summary_examples.csv

Launching SHIRE
---------------

As with the GUI version, it is recommended to launch the Plain version of SHIRE from the command line:

.. code-block:: console

   (venv) $ python shire.py
   
However, as the Plain version doesn't use the Python package *tkinter*, it can also be run from any Python editor.
Before launching SHIRE, it is necessary to enter the information that the four GUIs prompt for (see :doc:`example`) in *settings.py*.

Settings file
-------------

The folder *data/* contains the Python script *settings_template.py*, which needs to be filled in before running SHIRE.

**Beware:** When launching SHIRE, the *settings.py* file (the template, renamed after it has been filled in) must be in the same folder as the script *shire.py*, i.e.
*src/plain_version/* in the current folder structure.
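
Placing the file can also be scripted. The following is a minimal sketch, assuming that *data/* and
*src/plain_version/* sit directly below the repository root; the helper name ``prepare_settings``
is hypothetical and not part of SHIRE. The demonstration runs on a throwaway directory that mimics
this layout, so nothing in a real checkout is touched.

.. code-block:: python

   import shutil
   import tempfile
   from pathlib import Path


   def prepare_settings(repo_root):
       """Copy the template to src/plain_version/settings.py unless it already exists."""
       repo_root = Path(repo_root)
       template = repo_root / "data" / "settings_template.py"
       target = repo_root / "src" / "plain_version" / "settings.py"
       if not target.exists():
           shutil.copyfile(template, target)
       return target


   # Demonstration on a temporary directory with the assumed folder layout.
   with tempfile.TemporaryDirectory() as tmp:
       root = Path(tmp)
       (root / "data").mkdir()
       (root / "src" / "plain_version").mkdir(parents=True)
       (root / "data" / "settings_template.py").write_text("random_seed = 42\n")
       target = prepare_settings(root)
       copied = target.read_text()
       print(target.relative_to(root).as_posix(), "->", copied.strip())

An existing *settings.py* is left untouched, so values that were already filled in are not overwritten.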

Each parameter declared in *settings.py* comes with a short description to make filling out the script easier. The following illustrates the content of the *settings.py* file
in the context of the example. The method *export_variables* does not need to be adapted. It prints the settings to the log
file so that the premises under which the resulting map was produced are traceable and the whole process is reproducible.
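
The idea of writing all settings to a log can be sketched with the standard library. This is not the
code of *export_variables*, whose signature is not shown here: the function ``log_settings``, the
logger name and the in-memory stream (used instead of a log file) are assumptions for illustration.

.. code-block:: python

   import io
   import logging


   def log_settings(settings, logger):
       """Write every setting as 'name = value' to the given logger."""
       for name in sorted(settings):
           logger.info("%s = %r", name, settings[name])


   # Demonstration: log to an in-memory stream instead of a file.
   stream = io.StringIO()
   logger = logging.getLogger("shire_settings_demo")
   logger.setLevel(logging.INFO)
   logger.addHandler(logging.StreamHandler(stream))
   logger.propagate = False

   log_settings({"random_seed": 42, "resolution": 250, "ohe": False}, logger)
   logged = stream.getvalue()
   print(logged, end="")

Storing the values next to the results in this way makes it possible to rerun SHIRE later with the same configuration.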

.. raw:: html

    <style>
        .my-table {
            width: 100%;
        }
        .my-table th {
            width: 20%;
        }
        .my-table td {
            width: 20%;
        }
        .my-table .col-50 {
            width: 50%;
        }
        .my-table .col-10 {
            width: 10%;
        }
    </style>
    
    <table class="my-table">
        <caption>Parameters in settings.py with the values used in this example</caption>
        <thead>
            <tr>
                <th>Parameter</th>
                <th class="col-50">Description</th>
                <th class="col-10">Value Type</th>
                <th>Example Values</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>training_dataset</td>
                <td class="col-50">True if training dataset shall be created, else False</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>preprocessing</td>
                <td class="col-50">Defines preprocessing approach: 'cluster', 'interpolation', 'no_interpolation'</td>
                <td class="col-10">String</td>
                <td>'no_interpolation'</td>
            </tr>
            <tr>
                <td>train_from_scratch</td>
                <td class="col-50">True for generation of training dataset from scratch, else False</td>
                <td class="col-10">Bool</td>
                <td>True</td>
            </tr>
            <tr>
                <td>train_delete</td>
                <td class="col-50">True if feature(s) shall be removed from existing training dataset, else False. train_from_scratch=False and train_delete=False adds feature(s) to existing training dataset</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>prediction_dataset</td>
                <td class="col-50">True if prediction dataset shall be created, else False</td>
                <td class="col-10">Bool</td>
                <td>True</td>
            </tr>
            <tr>
                <td>pred_from_scratch</td>
                <td class="col-50">True for generation of prediction dataset from scratch, else False</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>pred_delete</td>
                <td class="col-50">True if feature(s) shall be removed from existing prediction dataset, else False. pred_from_scratch=False and pred_delete=False adds feature(s) to existing prediction dataset</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>map_generation</td>
                <td class="col-50">True if Random Forest model shall be trained and/or landslide susceptibility/hazard map shall be created</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>crs</td>
                <td class="col-50">Coordinate Reference System, important metadata information</td>
                <td class="col-10">String</td>
                <td>'wgs84'</td>
            </tr>
            <tr>
                <td>no_value</td>
                <td class="col-50">No data value that indicates in the final map where prediction was not possible</td>
                <td class="col-10">Integer</td>
                <td>-999</td>
            </tr>
            <tr>
                <td>random_seed</td>
                <td class="col-50">Random seed</td>
                <td class="col-10">Integer</td>
                <td>42</td>
            </tr>
            <tr>
                <td>resolution</td>
                <td class="col-50">Target resolution of the final map</td>
                <td class="col-10">Integer</td>
                <td>250</td>
            </tr>
            <tr>
                <td>path_ml</td>
                <td class="col-50">Reference path on the local machine or external hard drive to the base directory for storing SHIRE's products</td>
                <td class="col-10">String</td>
                <td>'../../examples/'</td>
            </tr>
            <tr>
                <td>data_summary_path</td>
                <td class="col-50">Path to the <em>data_summary.csv</em> file</td>
                <td class="col-10">String</td>
                <td>'../../examples/data_summary.csv'</td>
            </tr>
            <tr>
                <td>key_to_include_path</td>
                <td class="col-50">Path to the <em>keys_to_include.csv</em> file</td>
                <td class="col-10">String</td>
                <td>'../../examples/keys_to_include.csv'</td>
            </tr>
            <tr>
                <td>size</td>
                <td class="col-50">Fraction of the training dataset to be used as test dataset</td>
                <td class="col-10">Float</td>
                <td>0.25</td>
            </tr>
            <tr>
                <td>path_train</td>
                <td class="col-50">Path/directory to store created training dataset in</td>
                <td class="col-10">String</td>
                <td>'../../examples/'</td>
            </tr>
            <tr>
                <td>ohe</td>
                <td class="col-50">True, if categorical variables shall be one-hot encoded, False for ordinal encoding</td>
                <td class="col-10">Bool</td>
                <td>False</td>
            </tr>
            <tr>
                <td>path_landslide_database</td>
                <td class="col-50">Path to a landslide database stored locally or on an external hard drive</td>
                <td class="col-10">String</td>
                <td>'../../examples/<br>landslide_coordinates_wgs84.csv'</td>
            </tr>
            <tr>
                <td>ID</td>
                <td class="col-50">Name of the column in the landslide database that contains the ID of the instances</td>
                <td class="col-10">String</td>
                <td>'Ereignis-Nr'</td>
            </tr>
            <tr>
                <td>landslide_database_x</td>
                <td class="col-50">Name of the column in the landslide database that contains the longitude coordinates</td>
                <td class="col-10">String</td>
                <td>'X'</td>
            </tr>
            <tr>
                <td>landslide_database_y</td>
                <td class="col-50">Name of the column in the landslide database that contains the latitude coordinates</td>
                <td class="col-10">String</td>
                <td>'Y'</td>
            </tr>
            <tr>
                <td>bounding_box</td>
                <td class="col-50">Bounding box of map to be created ([ymax, ymin, xmin, xmax])</td>
                <td class="col-10">List</td>
                <td>[47.8, 45.8, 5.9, 10.5]</td>
            </tr>
            <tr>
                <td>path_pred</td>
                <td class="col-50">Path/directory to store created prediction dataset in</td>
                <td class="col-10">String</td>
                <td>path_ml</td>
            </tr>
            <tr>
                <td>RF_training</td>
                <td class="col-50">True, if Random Forest model shall be trained from scratch, else False</td>
                <td class="col-10">Bool</td>
                <td>True</td>
            </tr>
            <tr>
                <td>RF_prediction</td>
                <td class="col-50">True, if final map shall be generated, else False</td>
                <td class="col-10">Bool</td>
                <td>True</td>
            </tr>
            <tr>
                <td>not_included_pred_data</td>
                <td class="col-50">Feature(s) to drop from the prediction dataset before applying the trained model to the prediction dataset</td>
                <td class="col-10">List</td>
                <td>['xcoord', 'ycoord']</td>
            </tr>
            <tr>
                <td>not_included_train_data</td>
                <td class="col-50">Feature(s) to drop from the training dataset before training the Random Forest model</td>
                <td class="col-10">List</td>
                <td>[]</td>
            </tr>
            <tr>
                <td>num_trees</td>
                <td class="col-50">Number of trees in the Random Forest</td>
                <td class="col-10">Integer</td>
                <td>100</td>
            </tr>
            <tr>
                <td>criterion</td>
                <td class="col-50">Evaluation criterion</td>
                <td class="col-10">String</td>
                <td>'gini'</td>
            </tr>
            <tr>
                <td>depth</td>
                <td class="col-50">Maximum depth of each tree in the Random Forest</td>
                <td class="col-10">Integer</td>
                <td>20</td>
            </tr>
            <tr>
                <td>model_to_save</td>
                <td class="col-50">Name of model folder in model_database_dir to store the trained Random Forest model in. Will be created in model_database_dir</td>
                <td class="col-10">String</td>
                <td>'Switzerland_Map'</td>
            </tr>
            <tr>
                <td>model_to_load</td>
                <td class="col-50">Name of model folder that contains the model to be loaded for map production. Typically identical to model_to_save</td>
                <td class="col-10">String</td>
                <td>'Switzerland_Map'</td>
            </tr>
            <tr>
                <td>model_database_dir</td>
                <td class="col-50">Path on the local machine or external hard drive to the directory for storing the model folders</td>
                <td class="col-10">String</td>
                <td>'../../examples/'</td>
            </tr>
            <tr>
                <td>parallel</td>
                <td class="col-50">True if susceptibility/hazard shall be predicted in parallel, else False</td>
                <td class="col-10">Bool</td>
                <td>True</td>
            </tr>
        </tbody>
    </table>
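
A few of the entries from the table above interact with each other or follow a fixed layout. The
following sketch repeats some of the example values and adds two simple plausibility checks. The
helper functions ``dataset_mode`` and ``check_bounding_box`` are hypothetical and not part of SHIRE.
In particular, letting ``*_from_scratch`` take precedence when both flags are ``True`` is an
assumption, as the table does not cover that combination.

.. code-block:: python

   # Excerpt of the example values from the table above.
   training_dataset = False
   train_from_scratch = True
   train_delete = False
   prediction_dataset = True
   pred_from_scratch = False
   pred_delete = False
   bounding_box = [47.8, 45.8, 5.9, 10.5]  # [ymax, ymin, xmin, xmax]
   size = 0.25


   def dataset_mode(from_scratch, delete):
       """Translate the two flags into the action described in the table."""
       if from_scratch:  # assumed to take precedence over delete
           return "create from scratch"
       if delete:
           return "remove feature(s)"
       return "add feature(s)"


   def check_bounding_box(box):
       """Raise ValueError unless box follows [ymax, ymin, xmin, xmax]."""
       ymax, ymin, xmin, xmax = box
       if ymax <= ymin or xmax <= xmin:
           raise ValueError(
               f"bounding box not in [ymax, ymin, xmin, xmax] order: {box}"
           )


   check_bounding_box(bounding_box)
   if not 0 < size < 1:
       raise ValueError("size must be a fraction between 0 and 1")

   pred_mode = dataset_mode(pred_from_scratch, pred_delete)
   print(pred_mode)

With the example values, the prediction dataset is neither built from scratch nor reduced, so
feature(s) are added to an existing prediction dataset.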