Commit 698fa096 authored by Ann-Kathrin Margarete Edrich

Changes in Example

parent be2d3d83
Pipeline #1620091 passed
......@@ -303,17 +303,5 @@ Sure! Here’s your long table converted into an HTML table format with left-ali
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>keep_cat_features</td>
<td class="col-50">True if instances in the input dataset without categorical class information shall be kept and processed as intended, else False</td>
<td class="col-10">Bool</td>
<td>True</td>
</tr>
<tr>
<td>remove_instances</td>
<td class="col-50">True if instances in the input dataset without categorical class information shall be removed and marked with the no data value in the final map, else False</td>
<td class="col-10">Bool</td>
<td>False</td>
</tr>
</tbody>
</table>
......@@ -225,23 +225,6 @@ Susceptibility map generation
the mapping, they need to be removed. The feature names need to be provided in a comma-separated way without any spaces.
Then we also need to provide the **Name of the label column in the training dataset**.
An important decision is made under **How to treat mismatching categories**. If one-hot encoding is used for training
and prediction dataset generation, the categories may not match between the two input datasets.
If not all features contained in the training dataset are contained in the prediction dataset, the
mapping process is aborted and the model is automatically retrained. Before retraining, the mismatching features are
removed from the training dataset. The results are stored in a separate folder named *<old folder name>_retrain*.
If the prediction dataset contains more features than the training dataset, the mismatching
features in the prediction dataset are automatically removed before mapping. Furthermore, an identical order of the features
between the input datasets is ensured. If one-hot encoded feature classes are removed from the prediction dataset, there will
be instances in the prediction dataset, i.e. individual locations within the area of interest, which are described by
no feature class still contained in the prediction dataset. This means that the value of this feature is 0 for all classes
still contained in the prediction dataset.
Under **How to treat mismatching categories**, when choosing **Keep instances of mismatching classes**, these instances
are kept in the prediction dataset and are included in the mapping in the same way as all the other locations. When
choosing **Remove instances of mismatching classes**, these instances are handled in the same way as the locations where
at least one feature contains a no data value and they will not be included in the mapping. As we are using ordinal encoding
in this example, this decision is of reduced importance for the moment.
Finally, the Random Forest needs to be defined regarding the number of trees, depth of the trees and the evaluation criterion.
For more information, see the documentation for scikit-learn's `Random Forest Classifier <https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.RandomForestClassifier.html>`_.
Provide the **Size of the test dataset (0...1)** as well. For the values chosen in this example, refer to the image above.
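The handling of mismatching one-hot categories described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names ``align_features`` and ``filter_instances``, the ``lithology_`` column prefix and the sample values are assumptions made purely for illustration, and removed instances are represented by ``None`` instead of the no data value.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]


def align_features(train_cols: List[str], pred_cols: List[str]) -> Tuple[List[str], bool]:
    """Return the shared features in training order and a retrain flag.

    The flag is True when the training dataset contains features that are
    missing from the prediction dataset (the case that triggers retraining).
    """
    pred_set = set(pred_cols)
    shared = [c for c in train_cols if c in pred_set]
    needs_retrain = len(shared) < len(train_cols)
    return shared, needs_retrain


def filter_instances(
    rows: List[Row], shared: List[str], onehot_prefix: str, keep_instances: bool
) -> List[Optional[Row]]:
    """Drop extra features and handle instances without a remaining class.

    An instance is 'uncovered' if all remaining one-hot columns of the
    categorical feature are 0. Uncovered instances are kept as they are or
    replaced by None (standing in for the no data value here).
    """
    onehot_cols = [c for c in shared if c.startswith(onehot_prefix)]
    result: List[Optional[Row]] = []
    for row in rows:
        aligned = {c: row[c] for c in shared}  # same feature order as training
        uncovered = bool(onehot_cols) and all(aligned[c] == 0 for c in onehot_cols)
        result.append(None if uncovered and not keep_instances else aligned)
    return result


# Hypothetical example: the prediction dataset contains an extra class "lithology_c".
train_cols = ["slope", "lithology_a", "lithology_b"]
pred_cols = ["lithology_b", "lithology_c", "slope", "lithology_a"]
pred_rows = [
    {"slope": 12.0, "lithology_a": 1, "lithology_b": 0, "lithology_c": 0},
    {"slope": 30.0, "lithology_a": 0, "lithology_b": 0, "lithology_c": 1},
]

shared, needs_retrain = align_features(train_cols, pred_cols)
kept = filter_instances(pred_rows, shared, "lithology_", keep_instances=True)
removed = filter_instances(pred_rows, shared, "lithology_", keep_instances=False)
```

In this sketch the second location belongs only to the removed class ``lithology_c``. With ``keep_instances=True`` it stays in the dataset with all remaining class columns set to 0, and with ``keep_instances=False`` it is excluded from the mapping.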
......@@ -275,10 +258,10 @@ Final map, output files and validation information
Each of the previously described steps has its own input files, which have been discussed and are described in the user manual.
-When checking the folder of the training and prediction dataset were generated as well as the folder where training and prediction results are stored, it can be seen that
+When checking the folder of the training and prediction dataset as well as the folder where training and prediction results are stored, it can be seen that
several new files were created.
-**Beware!:** The files produced in each run depend also on the chosen options, e.g. regarding compilation strategy of the training dataset.
+**Beware!:** The files produced in each run depend also on the chosen options, e.g., regarding compilation strategy of the training dataset.
Most of the files are intended to support transparency and reusability.
......