Example - Plain version
=======================
This example illustrates the usage of the Plain version of SHIRE based on the provided datasets.
The foundations of the example are identical to those of the one described
in :doc:`example`.
As in the GUI version example, the focus is on initialising the process rather than
on the underlying principles. For the Plain version, this means discussing
the Python script *settings.py*.
Datasets
---------
| All necessary datasets can be found in the examples folder of the GitLab repository.
| **Geospatial datasets**:
| *European Union's Copernicus Land Monitoring Service information:*
| Imperviousness Density 2018 (https://doi.org/10.2909/3bf542bd-eebd-4d73-b53c-a0243f2ed862)
| Dominant Leaf Type 2018 (https://doi.org/10.2909/7b28d3c1-b363-4579-9141-bdd09d073fd8)
| CORINE Land Cover 2018 (https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac)
| All datasets were edited. Imperviousness Density 2018 and Dominant Leaf Type 2018 were merged from smaller tiles and then stored in a netCDF4 file.
| **Landslide database**:
| Datenquelle Hangmuren-Datenbank, Eidg. Forschungsanstalt WSL, Forschungseinheit Gebirgshydrologie & Massenbewegungen (status October 2024).
| The spatial coordinates of the landslide locations were transformed into WGS84 coordinates using QGIS.
| **Absence locations database**:
| Randomly sampled locations outside of a buffer zone around the entries in the landslide database. The database contains more absence locations than will be integrated into the example. This is intentional, as
  both landslide and absence locations are removed during the training dataset generation process if one of their features
  contains a no data value. Having additional absence locations available allows SHIRE to integrate the number of absence locations
  intended by the user.
| **Metadata files**:
| keys_to_include_examples.csv
| data_summary_examples.csv
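
The reason for providing surplus absence locations can be illustrated with a minimal sketch. It is not SHIRE's implementation: the no data marker ``-999``, the record layout and the helper names ``drop_incomplete`` and ``sample_absence`` are assumptions made only for this illustration.

```python
import random

NO_DATA = -999  # hypothetical no data marker, not taken from SHIRE


def drop_incomplete(locations, no_data=NO_DATA):
    """Keep only locations whose features contain no no-data value."""
    return [loc for loc in locations if no_data not in loc["features"].values()]


def sample_absence(candidates, n_wanted, seed=42):
    """Drop incomplete candidates, then draw n_wanted of the remainder."""
    complete = drop_incomplete(candidates)
    if len(complete) < n_wanted:
        raise ValueError("not enough complete absence locations")
    return random.Random(seed).sample(complete, n_wanted)


# Five candidate absence locations, two of which have a no-data feature.
candidates = [
    {"id": 1, "features": {"imperviousness": 12, "leaf_type": 1}},
    {"id": 2, "features": {"imperviousness": NO_DATA, "leaf_type": 2}},
    {"id": 3, "features": {"imperviousness": 0, "leaf_type": NO_DATA}},
    {"id": 4, "features": {"imperviousness": 40, "leaf_type": 1}},
    {"id": 5, "features": {"imperviousness": 7, "leaf_type": 2}},
]

# Two locations are wanted; the surplus candidates make this possible
# even though two of the five are discarded.
selected = sample_absence(candidates, n_wanted=2)
print([loc["id"] for loc in selected])
```

Had the database contained exactly two candidates and one of them were incomplete, the requested number could not have been reached.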
Launching SHIRE
---------------
As with the GUI version, it is recommended to launch the Plain version of SHIRE from the command line:

.. code-block:: console

   (venv) $ python shire.py

However, because the Plain version does not use the Python package *tkinter*, it can also be run from any Python editor.
Before launching SHIRE, the information that the four GUIs prompt for (see :doc:`example`) must be entered in *settings.py*.
Settings file
-------------
In *data/* there is a Python script *settings_template.py*, which needs to be filled in before running SHIRE.
**Beware!** When launching SHIRE, the *settings.py* file (the template, renamed after it has been filled in) must be in the same folder as the script *shire.py*, i.e. in
*src/plain_version/* in the current folder structure.
Each parameter declared in *settings.py* comes with a short description to make filling in the script easier. The following table lists the content of the *settings.py* file
for this example. The method *export_variables* does not need to be adapted. It prints the settings to the logging
file so that the premises under which the resulting map was produced are traceable and the whole process is reproducible.
.. raw:: html
<style>
.my-table {
width: 100%;
}
.my-table th {
width: 20%;
}
.my-table td {
width: 20%;
}
.my-table .col-50 {
width: 50%;
}
.my-table .col-10 {
width: 10%;
}
</style>
<table class="my-table">
<caption>Parameters in settings.py and their values in this example</caption>
<thead>
<tr>
<th>Parameter</th>
<th class="col-50">Description</th>
<th class="col-10">Value Type</th>
<th>Example Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>training_dataset</td>
<td class="col-50">True if training dataset shall be created, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>preprocessing</td>
<td class="col-50">Defines preprocessing approach: 'cluster', 'interpolation', 'no_interpolation'</td>
<td class="col-10">String</td>
<td>'no_interpolation'</td>
</tr>
<tr>
<td>train_from_scratch</td>
<td class="col-50">True for generation of training dataset from scratch, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>train_delete</td>
<td class="col-50">True if feature(s) shall be removed from existing training dataset, else False. train_from_scratch=False and train_delete=False adds feature(s) to existing training dataset</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>prediction_dataset</td>
<td class="col-50">True if prediction dataset shall be created, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>pred_from_scratch</td>
<td class="col-50">True for generation of prediction dataset from scratch, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>pred_delete</td>
<td class="col-50">True if feature(s) shall be removed from existing prediction dataset, else False. pred_from_scratch=False and pred_delete=False adds feature(s) to existing prediction dataset</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>map_generation</td>
<td class="col-50">True if Random Forest model shall be trained and/or landslide susceptibility/hazard map shall be created</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>crs</td>
<td class="col-50">Coordinate Reference System, important metadata information</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>no_value</td>
<td class="col-50">No data value that indicates in the final map where prediction was not possible</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>random_seed</td>
<td class="col-50">Random seed</td>
<td class="col-10">Integer</td>
<td>42</td>
</tr>
<tr>
<td>resolution</td>
<td class="col-50">Goal resolution of the final map</td>
<td class="col-10">Integer</td>
<td>250</td>
</tr>
<tr>
<td>path_ml</td>
<td class="col-50">Reference path on the local machine or external hard drive to the base directory for storing SHIRE's products</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>data_summary_path</td>
<td class="col-50">Path to the <em>data_summary.csv</em> file</td>
<td class="col-10">String</td>
<td>'../../examples/data_summary.csv'</td>
</tr>
<tr>
<td>key_to_include_path</td>
<td class="col-50">Path to the <em>keys_to_include.csv</em> file</td>
<td class="col-10">String</td>
<td>'../../examples/keys_to_include.csv'</td>
</tr>
<tr>
<td>size</td>
<td class="col-50">Fraction of the training dataset to be used as test dataset</td>
<td class="col-10">Float</td>
<td>0.25</td>
</tr>
<tr>
<td>path_train</td>
<td class="col-50">Path/directory to store created training dataset in</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>ohe</td>
<td class="col-50">True, if categorical variables shall be one-hot encoded, False for ordinal encoding</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
<tr>
<td>path_landslide_database</td>
<td class="col-50">Path to a landslide database stored locally or on an external hard drive</td>
<td class="col-10">String</td>
<td>'../../examples/<br>landslide_coordinates_wgs84.csv'</td>
</tr>
<tr>
<td>ID</td>
<td class="col-50">Name of the column in the landslide database that contains the ID of the instances</td>
<td class="col-10">String</td>
<td>'Ereignis-Nr'</td>
</tr>
<tr>
<td>landslide_database_x</td>
<td class="col-50">Name of the column in the landslide database that contains the longitude coordinates</td>
<td class="col-10">String</td>
<td>'X'</td>
</tr>
<tr>
<td>landslide_database_y</td>
<td class="col-50">Name of the column in the landslide database that contains the latitude coordinates</td>
<td class="col-10">String</td>
<td>'Y'</td>
</tr>
<tr>
<td>bounding_box</td>
<td class="col-50">Bounding box of map to be created ([ymax, ymin, xmin, xmax])</td>
<td class="col-10">List</td>
<td>[47.8, 45.8, 5.9, 10.5]</td>
</tr>
<tr>
<td>path_pred</td>
<td class="col-50">Path/directory to store created prediction dataset in</td>
<td class="col-10">String</td>
<td>path_ml</td>
</tr>
<tr>
<td>RF_training</td>
<td class="col-50">True, if Random Forest model shall be trained from scratch, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>RF_prediction</td>
<td class="col-50">True, if final map shall be generated, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>not_included_pred_data</td>
<td class="col-50">Feature(s) to drop from the prediction dataset before applying the trained model to the prediction dataset</td>
<td class="col-10">List</td>
<td>['xcoord', 'ycoord']</td>
</tr>
<tr>
<td>not_included_train_data</td>
<td class="col-50">Feature(s) to drop from the training dataset before training the Random Forest model</td>
<td class="col-10">List</td>
<td>[]</td>
</tr>
<tr>
<td>num_trees</td>
<td class="col-50">Number of trees in the Random Forest</td>
<td class="col-10">Integer</td>
<td>100</td>
</tr>
<tr>
<td>criterion</td>
<td class="col-50">Evaluation criterion</td>
<td class="col-10">String</td>
<td>'gini'</td>
</tr>
<tr>
<td>depth</td>
<td class="col-50">Depth of a Random Forest tree</td>
<td class="col-10">Integer</td>
<td>20</td>
</tr>
<tr>
<td>model_to_save</td>
<td class="col-50">Name of model folder in model_database_dir to store the trained Random Forest model in. Will be created in model_database_dir</td>
<td class="col-10">String</td>
<td>'Switzerland_Map'</td>
</tr>
<tr>
<td>model_to_load</td>
<td class="col-50">Name of model folder that contains the model to be loaded for map production. Typically identical to model_to_save</td>
<td class="col-10">String</td>
<td>'Switzerland_Map'</td>
</tr>
<tr>
<td>model_database_dir</td>
<td class="col-50">Path on the local machine or external hard drive to the directory for storing the model folders</td>
<td class="col-10">String</td>
<td>'../../examples/'</td>
</tr>
<tr>
<td>parallel</td>
<td class="col-50">If susceptibility/hazard shall be predicted in parallel</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
</tbody>
</table>
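
As a rough illustration, some of the example values from the table could be written as plain Python assignments, as sketched below. The flat layout is an assumption: the actual template also contains the *export_variables* method, so the structure of *settings_template.py* takes precedence. The helpers ``dataset_mode`` and ``check_settings`` are hypothetical and only spell out how the table describes the flags. That ``from_scratch`` takes precedence over ``delete`` is likewise an assumption.

```python
# Illustrative excerpt; the values are the example values from the table.
training_dataset = False
train_from_scratch = True
train_delete = False
prediction_dataset = True
pred_from_scratch = False
pred_delete = False
map_generation = False

random_seed = 42
resolution = 250
path_ml = "../../examples/"
size = 0.25
bounding_box = [47.8, 45.8, 5.9, 10.5]  # [ymax, ymin, xmin, xmax]
path_pred = path_ml
num_trees = 100
criterion = "gini"
depth = 20


def dataset_mode(create, from_scratch, delete):
    """Hypothetical helper: name the action implied by the three flags."""
    if not create:
        return "skip"
    if from_scratch:
        return "generate from scratch"
    return "remove features" if delete else "add features"


def check_settings():
    """Hypothetical sanity checks, not part of SHIRE."""
    ymax, ymin, xmin, xmax = bounding_box
    assert ymax > ymin and xmax > xmin, "order is [ymax, ymin, xmin, xmax]"
    assert 0 < size < 1, "size is a fraction of the training dataset"
    assert num_trees > 0 and depth > 0


check_settings()
print("training:", dataset_mode(training_dataset, train_from_scratch, train_delete))
print("prediction:", dataset_mode(prediction_dataset, pred_from_scratch, pred_delete))
```

With the example values, no training dataset is created, and features are added to an existing prediction dataset.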