Commit 40609548 authored by Evgeny Kusmenko

Merge branch 'ML_clustering' into 'master'

Ml clustering

See merge request !34
parents 17d1c533 399ec45c
Pipeline #117559 passed with stages

# Java Maven CircleCI 2.0 configuration file
# Check for more details
version: 2
- gh-pages
# specify the version you desire here
- image: circleci/openjdk:8-jdk
cmd: ["/bin/bash"]
# Specify service dependencies here if necessary
# CircleCI maintains a library of pre-built images
# documented at
# - image: circleci/postgres:9.4
working_directory: ~/repo
# Customize the JVM maximum heap limit
- MAVEN_OPTS: -Xmx3200m
- ES_JAVA_OPTS=-Xms3200m -Xmx3200m
- checkout
# run tests!
- run: mvn -B clean install --settings "settings.xml"
version: 2
- build
- schedule:
cron: "30 1 * * *"
only: master
- build
......@@ -7,7 +7,8 @@ stages:
stage: windows
- mvn -B clean install --settings settings.xml
- call mvn -B dependency:purge-local-repository -DactTransitively=false --settings settings.xml
- call mvn -B clean install --settings settings.xml
- Windows10
- mvn -B clean install cobertura:cobertura org.eluder.coveralls:coveralls-maven-plugin:report --settings "settings.xml"
- if [ "${TRAVIS_BRANCH}" == "master" ]; then mvn -B deploy --debug --settings "./settings.xml"; fi
......@@ -5,6 +5,7 @@
This generator takes an EMAM or EMADL model and connects it to a middleware library. If all Ports of two connected Components are marked as middleware Ports, the generator will create 2 executables that can be deployed on different machines.
All communication between these 2 Components will then be tunneled through the specified middleware.
It also supports automatic clustering of the subcomponents to deploy on different machines.
## Other important documents
### Quickstart
......@@ -21,29 +22,109 @@ See [](
## Usage
### CLI
Maven generates the jar `embedded-montiarc-math-middleware-generator-{Version}-jar-with-dependencies.jar`
and the cli is located in `de.monticore.lang.monticar.generator.middleware.DistributedTargetGeneratorCli`.
and the cli is located in `de.monticore.lang.monticar.generator.middleware.cli.DistributedTargetGeneratorCli`.
Parameters: `${file path to config json}` OR `-r ${raw json config string}`
Schema of config json:
'modelsDir':'<path to directory with EMAM models>',
'outputDir':'<path to output directory for generated files>',
'rootModel':'<fully qualified name of the root model>',
'generators':['<identifier for first generator>', '<identifier for second generator>',...],
'emadlBackend':'<deep-learning-framework backend. Options: MXNET, CAFFE2>'
Example: [](src/test/resources/
An example config file with all clustering algorithms: [config](src/test/resources/config/parameterTest/clusterParamsAllAlgos.json)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| modelsDir | String | ✅ | path to directory with EMAM models |
| outputDir | String | ✅ | path to output directory for generated files |
| rootModel | String | ✅ | fully qualified name of the root model |
| generators | List | ✅ | List of generator identifiers<br> 'cpp', 'emadlcpp', 'roscpp', 'rclcpp' |
| emadlBackend | String | ❓ | deep-learning-framework backend<br> 'MXNET'(Default), 'CAFFE2' |
| writeTagFile | Bool | ❓ | Writes a .tag file with all Middleware tags into the generated code<br> Defaults to false |
| clusteringParameters | Object | ❓ | Options to cluster the component before generating<br> See below |

Clustering Parameters:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| numberOfClusters | int | ❓ | Number of clusters the subcomponents should be divided into<br> Overrides numberOfClusters in algorithmParameters |
| flatten | bool | ❓ | Replace all components with their subcomponents, except when a component is atomic or the flatten level is reached |
| flattenLevel | int | ❓ | Maximal level of component flattening |
| metric | String | ❓ | Metric to evaluate the quality of the resulting clusters. Available: "CommunicationCost"(Default), "Silhouette" |
| chooseBy | String | ❓ | Strategy to choose from the resulting clusterings<br> bestWithFittingN(Default): if numberOfClusters is set, all results with a different number of clusters are ignored<br> bestOverall: ignore numberOfClusters, choose result with best score |
| algorithmParameters | List<Object> | ❓ | Used to specify which algorithms (and their parameters) are used for clustering |
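
The two `chooseBy` strategies can be illustrated with a minimal, self-contained sketch. It does not use the generator's own classes: `ChooseBySketch` and `Result` are hypothetical stand-ins, and it assumes that a lower score is better (as one would expect for a communication cost); for a metric where higher is better the comparison would have to be reversed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ChooseBySketch {
    // Hypothetical stand-in for one clustering result: number of clusters plus its score.
    static final class Result {
        final int clusters;
        final double score;

        Result(int clusters, double score) {
            this.clusters = clusters;
            this.score = score;
        }
    }

    // bestOverall: ignore the requested cluster count, take the best score.
    // Assumption: lower scores are better.
    static Optional<Result> bestOverall(List<Result> results) {
        return results.stream().min(Comparator.comparingDouble(r -> r.score));
    }

    // bestWithFittingN: results with a different number of clusters are ignored.
    static Optional<Result> bestWithFittingN(List<Result> results, int n) {
        return results.stream()
                .filter(r -> r.clusters == n)
                .min(Comparator.comparingDouble(r -> r.score));
    }

    public static void main(String[] args) {
        List<Result> results = Arrays.asList(
                new Result(2, 7.0), new Result(3, 4.0), new Result(2, 5.5));
        System.out.println("bestOverall picks clusters=" + bestOverall(results).get().clusters);
        System.out.println("bestWithFittingN(2) picks score=" + bestWithFittingN(results, 2).get().score);
    }
}
```

Both methods return an empty `Optional` when no result qualifies, so a caller has to decide what to do if no clustering has the requested number of clusters.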
There are 4 different clustering algorithms, each with distinct parameters.
Every parameter of the clustering algorithms can be dynamic, enabling automatic search for the best values. Available are lists and generators as seen in the example below:
Generator Options:
- Behaviour generators:
- 'cpp': EMAM2CPP
- 'emadlcpp': EMADL2CPP
- Middleware generators:
- 'roscpp': EMAM2Roscpp
Example: [](
### Defining the connection between a component and the middleware
Also see [clusterDynamic.json](src/test/resources/config/parameterTest/clusterDynamic.json) and [clusterDynamicList.json](src/test/resources/config/parameterTest/clusterDynamicList.json)
Spectral Clustering:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| name | String | ✅️ | must equal "SpectralClustering" |
| numberOfClusters | int | ✅️ | Number of clusters that are created<br> Overwritten by global numberOfClusters |
| l | int | ❓ | |
| sigma | double | ❓ | |

DBScan:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| name | String | ✔️ | must equal "DBScan" |
| min_pts | int | ✔️ | |
| radius | double | ✔️ | |

Markov:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| name | String | ✔️ | must equal "Markov" |
| max_residual | double | ❓ | |
| gamma_exp | double | ❓ | |
| loop_gain | double | ❓ | |
| zero_max | double | ❓ | |

Affinity Propagation:

| Name | Type | Required | Description |
|------|------|----------|-------------|
| name | String | ✔️ | must equal "AffinityPropagation" |

### Visualisation of clustering results
There are 3 scripts available to visualise the results of the clustering process. They all create graphs for each of the 4 evaluation models:
1. [](src/test/resources/ bar graphs that compare the size of clusters, distance score, and time taken in ms
2. [](src/test/resources/ line graph visualising the average distance cost for random clustering(with Monte Carlo)
3. [](src/test/resources/ point graph visualising the silhouette score of different clusterings sorted by cluster size
Before using them, install Python 3 and the packages `matplotlib` and `numpy`.
After running `EvaluationTest` (warning: very long runtime), you can visualise the results by calling (from the project root):
python3 src/test/resources/ target/evaluation/autopilot/emam/clusteringResults.json target/evaluation/pacman/emam/clusteringResults.json target/evaluation/supermario/emam/clusteringResults.json target/evaluation/daimler/emam/clusteringResults.json
python3 src/test/resources/ target/evaluation/autopilotMC/monteCarloResults.json target/evaluation/pacmanMC/monteCarloResults.json target/evaluation/supermarioMC/monteCarloResults.json target/evaluation/daimlerMC/monteCarloResults.json
python3 src/test/resources/ target/evaluation/autopilotSilhouette/emam/clusteringResults.json target/evaluation/pacmanSilhouette/emam/clusteringResults.json target/evaluation/supermarioSilhouette/emam/clusteringResults.json target/evaluation/daimlerSilhouette/emam/clusteringResults.json
## Defining the connection between a component and the middleware
The connection between middleware and the component is defined as tags on Ports in .tag files.
### Example with ROS Middleware:
Tags of the type RosConnection can either be simple tags(see Example 3) or define a topic( with name, type and optional msgField( , 2.)
EmbeddedMontiArc automated component clustering
Bundle interconnected top-level components of the model into clusters. The aim is to reduce connection and communication overhead by grouping closely connected (affine) components into the same cluster; the resulting clusters are then connected to each other using ROS.
1) Convert the symbol table of a component into an adjacency matrix
o Order all subcomponents by name (necessary for the adjacency matrix).
o Create an adjacency matrix to use with a clustering algorithm, with subcomponents as nodes and connectors between subcomponents as edges. Sift out all connectors to the super component.
2) Feed adjacency matrix into the selected clustering algorithm
o We are using the machine learning library "smile ml", which provides a broad range of different clustering and partitioning approaches. As a prime example we are using "spectral clustering" here. For a closer look at this approach, see the section below.
o The clustering algorithm yields multiple cluster labels with the clustered entries of the adjacency matrix assigned to them. We have to convert them back to a set of symbol tables of components representing the clusters.
3) Generate middleware tags separating the clusters
o This will build the cluster-to-ROS connections.
o We do not take ports of the super component into account and only consider connected top-level components.
o A connection will be established if the target cluster label is different from the source cluster label, thus connecting different clusters with each other.
4) Feed result into existing manual clustering architecture
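
Steps 1 and 3 above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the project's implementation: subcomponents are represented by their names, connectors by `{source, target}` name pairs, and the class and method names (`AdjacencySketch`, `adjacency`, `crossClusterConnectors`) are hypothetical. The cluster labels are passed in directly instead of being computed by a clustering algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AdjacencySketch {
    // Step 1: build a symmetric 0/1 adjacency matrix.
    // sortedNames must be sorted so that each subcomponent has a stable index.
    static double[][] adjacency(String[] sortedNames, String[][] connectors) {
        int n = sortedNames.length;
        double[][] a = new double[n][n];
        for (String[] c : connectors) {
            int i = Arrays.binarySearch(sortedNames, c[0]);
            int j = Arrays.binarySearch(sortedNames, c[1]);
            if (i < 0 || j < 0) {
                continue; // endpoint is not a subcomponent (e.g. the super component): sifted out
            }
            a[i][j] = 1.0;
            a[j][i] = 1.0;
        }
        return a;
    }

    // Step 3: keep only connectors whose source and target carry different cluster labels.
    // labels[i] is the cluster label of sortedNames[i].
    static List<String[]> crossClusterConnectors(String[] sortedNames, String[][] connectors, int[] labels) {
        List<String[]> result = new ArrayList<>();
        for (String[] c : connectors) {
            int i = Arrays.binarySearch(sortedNames, c[0]);
            int j = Arrays.binarySearch(sortedNames, c[1]);
            if (i < 0 || j < 0) {
                continue; // ports of the super component are not considered
            }
            if (labels[i] != labels[j]) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"sensor", "controller", "motor"};
        Arrays.sort(names); // order subcomponents by name
        String[][] connectors = {
                {"sensor", "controller"}, {"controller", "motor"}, {"parent", "sensor"}};
        System.out.println(Arrays.deepToString(adjacency(names, connectors)));
        int[] labels = {0, 1, 0}; // illustrative labels: controller=0, motor=1, sensor=0
        for (String[] c : crossClusterConnectors(names, connectors, labels)) {
            System.out.println(c[0] + " -> " + c[1] + " crosses a cluster boundary");
        }
    }
}
```

Only the connectors returned by `crossClusterConnectors` would need a middleware tag in this sketch; connectors inside one cluster stay local.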
Spectral Clustering in a nutshell
The goal of spectral clustering is to cluster data which is connected but not compact or not clustered within convex boundaries. Data is basically seen as a connected graph and clustering is the process of finding partitions in the graph based on the affinity (similarity or adjacency) of vertices.
The general approach is to perform dimensionality reduction before clustering in fewer dimensions using a standard clustering method (like k-means) on relevant eigenvectors (the "spectrum") of the matrix representation of a graph (Laplacian matrix).
Basically we follow three steps in spectral clustering:
(1) Pre-processing
* Construct a matrix representation of a graph
(2) Decomposition
* Compute eigenvalues and eigenvectors of the matrix
* Map each point to a lower-dimensional representation based on one or more eigenvectors
(3) Grouping
* Assign points to two or more clusters, based on the new representation
Pre Pre-processing: How to define the affinity of data points and decide upon the connectivity of a similarity graph?
We have to define both a way to calculate affinity (a similarity function) and a corresponding graph representation (from which the similarity matrix is then derived). Typically a similarity function evaluates distance; this can be either the classic Euclidean distance or a Gaussian kernel similarity function.
The most widespread approach to a graph representation is kNN, the k nearest neighbors of a vertex. In this approach the k nearest neighbors of a vertex v determine which vertices v should be connected to. The goal is to connect vertex vi with vertex vj if vj is among the k nearest neighbors of vi.
Because this leads to a directed graph, we need an approach to convert it to an undirected one. This can be done in two ways: Either there's an edge if p is NN of q OR q is NN of p. Or there's an edge if p is NN of q AND q is NN of p (this is called mutual kNN, which is in practice a good choice).
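
The difference between the two ways of making the kNN graph undirected can be shown with a small sketch. It assumes Euclidean distance on plain `double[]` points; the class `KnnGraphSketch` and its methods are hypothetical and are not taken from the clustering library.

```java
import java.util.Arrays;
import java.util.Comparator;

final class KnnGraphSketch {
    static double dist(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            sum += (p[d] - q[d]) * (p[d] - q[d]);
        }
        return Math.sqrt(sum);
    }

    // knn[i][j] is true if j is among the k nearest neighbors of i (directed relation).
    static boolean[][] directedKnn(double[][] pts, int k) {
        int n = pts.length;
        boolean[][] knn = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            final int src = i;
            Integer[] others = new Integer[n - 1];
            for (int j = 0, m = 0; j < n; j++) {
                if (j != i) {
                    others[m++] = j;
                }
            }
            // sort all other points by their distance to point i
            Arrays.sort(others, Comparator.comparingDouble(j -> dist(pts[src], pts[j])));
            for (int m = 0; m < Math.min(k, others.length); m++) {
                knn[i][others[m]] = true;
            }
        }
        return knn;
    }

    // mutual == false: edge if p is NN of q OR q is NN of p.
    // mutual == true:  edge if p is NN of q AND q is NN of p (mutual kNN).
    static boolean[][] symmetrize(boolean[][] knn, boolean mutual) {
        int n = knn.length;
        boolean[][] edge = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                edge[i][j] = mutual ? (knn[i][j] && knn[j][i]) : (knn[i][j] || knn[j][i]);
            }
        }
        return edge;
    }

    public static void main(String[] args) {
        double[][] pts = {{0.0}, {1.0}, {10.0}};
        boolean[][] knn = directedKnn(pts, 1);
        System.out.println("OR:  " + Arrays.deepToString(symmetrize(knn, false)));
        System.out.println("AND: " + Arrays.deepToString(symmetrize(knn, true)));
    }
}
```

With the three points above and k = 1, the outlier at 10.0 is still attached to the graph under the OR rule, because 1.0 is its nearest neighbor, but it is isolated under the mutual (AND) rule.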
Other possible approaches to decide on the connectivity of a similarity graph are "epsilon neighborhood" (a threshold based approach to construct a binary adjacency matrix) or a fully connected graph in combination with a Gaussian similarity function.
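
The two alternatives can be sketched as follows. The Gaussian similarity is written in its common form exp(-||p - q||^2 / (2 * sigma^2)), and the epsilon neighborhood connects two distinct points when their Euclidean distance is at most eps. `SimilaritySketch` and its method names are hypothetical, and `sigma` here is simply the kernel width of this sketch.

```java
final class SimilaritySketch {
    static double squaredDistance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            sum += (p[d] - q[d]) * (p[d] - q[d]);
        }
        return sum;
    }

    // Gaussian kernel similarity: 1.0 for identical points, decaying towards 0 with distance.
    static double gaussian(double[] p, double[] q, double sigma) {
        return Math.exp(-squaredDistance(p, q) / (2.0 * sigma * sigma));
    }

    // Epsilon neighborhood: binary adjacency matrix, 1 if two distinct points are within eps.
    static int[][] epsilonGraph(double[][] pts, double eps) {
        int n = pts.length;
        int[][] a = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j && squaredDistance(pts[i], pts[j]) <= eps * eps) {
                    a[i][j] = 1;
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        double[][] pts = {{0.0}, {1.0}, {10.0}};
        System.out.println(gaussian(pts[0], pts[1], 1.0));
        System.out.println(java.util.Arrays.deepToString(epsilonGraph(pts, 2.0)));
    }
}
```

A fully connected graph would use `gaussian` as the weight of every pair of points, whereas the epsilon neighborhood discards all pairs beyond the threshold.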
Affinity evaluation: Principal Component Analysis (PCA)
It is the affinity of data points, which defines clusters, rather than the absolute (spatial) location or spatial proximity.
Within an affinity matrix, data points belonging to the same cluster have a very similar affinity vector to all other data points (eigenvector). Each eigenvector has an eigenvalue which states how prevalent its vector is in the affinity matrix. So those eigenvectors act like a fingerprint for different clusters, representing all datapoints belonging to a specific cluster, in a lower dimensional space.
The Laplacian matrix
The Laplacian matrix L is defined as L = D - A, where D is the degree matrix (a diagonal matrix containing the number of direct neighbors of each vertex) and A is the (binary) adjacency matrix (A_ij = 1 if vertices i and j are connected by an edge, 0 otherwise).
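
As a small worked example, the definition L = D - A can be computed directly from an adjacency matrix. The sketch below is self-contained; `LaplacianSketch` is a hypothetical name and the input is assumed to be a symmetric matrix with a zero diagonal.

```java
import java.util.Arrays;

final class LaplacianSketch {
    // Computes L = D - A, where D is the diagonal matrix of row sums (vertex degrees) of A.
    static double[][] laplacian(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            double degree = 0.0;
            for (int j = 0; j < n; j++) {
                degree += a[i][j];
            }
            for (int j = 0; j < n; j++) {
                l[i][j] = (i == j ? degree : 0.0) - a[i][j];
            }
        }
        return l;
    }

    public static void main(String[] args) {
        // Path graph 0 - 1 - 2: the middle vertex has degree 2, the end vertices degree 1.
        double[][] a = {
                {0, 1, 0},
                {1, 0, 1},
                {0, 1, 0}};
        System.out.println(Arrays.deepToString(laplacian(a)));
    }
}
```

Every row of L sums to zero, which is a quick sanity check for a matrix built this way.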
Further sources of reading
A detailed discussion of different approaches to evaluating affinity and deeper information on spectral clustering in general can be found here:
......@@ -9,7 +9,7 @@
<!-- == PROJECT DEPENDENCIES ============================================= -->
......@@ -20,10 +20,11 @@
<!-- .. Libraries .................................................. -->
......@@ -51,6 +52,7 @@
......@@ -64,6 +66,12 @@
......@@ -75,7 +83,11 @@
......@@ -94,6 +106,13 @@
<!-- EMADL Dependencies -->
......@@ -106,6 +125,12 @@
<!-- MontiCore Dependencies -->
......@@ -135,7 +160,6 @@
<!-- == PROJECT BUILD SETTINGS =========================================== -->
......@@ -169,7 +193,7 @@
......@@ -26,6 +26,11 @@
......@@ -57,6 +62,12 @@
<releases><enabled /></releases>
<snapshots><enabled /></snapshots>
<releases><enabled /></releases>
<snapshots><enabled /></snapshots>
package de.monticore.lang.monticar.generator.middleware;
import de.monticore.lang.embeddedmontiarc.embeddedmontiarc._symboltable.instanceStructure.EMAComponentInstanceSymbol;
import de.monticore.lang.monticar.clustering.AutomaticClusteringHelper;
import de.monticore.lang.monticar.clustering.ClusteringResult;
import de.monticore.lang.monticar.clustering.ClusteringResultList;
import de.monticore.lang.monticar.clustering.FlattenArchitecture;
import de.monticore.lang.monticar.generator.FileContent;
import de.monticore.lang.monticar.generator.middleware.cli.ClusteringParameters;
import de.monticore.lang.monticar.generator.middleware.cli.ResultChoosingStrategy;
import de.monticore.lang.monticar.generator.middleware.compile.CompilationGenerator;
import de.monticore.lang.monticar.generator.middleware.helpers.*;
import de.monticore.lang.monticar.generator.middleware.helpers.ClusterFromTagsHelper;
import de.monticore.lang.monticar.generator.middleware.helpers.FileHelper;
import de.monticore.lang.monticar.generator.middleware.helpers.NameHelper;
import de.monticore.lang.monticar.generator.middleware.helpers.RosHelper;
import de.monticore.lang.monticar.generator.middleware.impls.GeneratorImpl;
import de.monticore.lang.monticar.generator.middleware.impls.MiddlewareTagGenImpl;
import de.monticore.lang.monticar.generator.middleware.impls.RclCppGenImpl;
import de.monticore.lang.monticar.generator.middleware.impls.RosCppGenImpl;
import de.monticore.lang.tagging._symboltable.TaggingResolver;
......@@ -15,12 +25,32 @@ import;
import java.util.*;
public class DistributedTargetGenerator extends CMakeGenerator {
private boolean generateMiddlewareTags = false;
private ClusteringResultList clusteringResults = new ClusteringResultList();
public boolean isGenerateMiddlewareTags() {
return generateMiddlewareTags;
public void setGenerateMiddlewareTags(boolean generateMiddlewareTags) {
this.generateMiddlewareTags = generateMiddlewareTags;
private Set<String> subDirs = new HashSet<>();
private ClusteringParameters clusteringParameters;
public DistributedTargetGenerator() {
public ClusteringParameters getClusteringParameters() {
return clusteringParameters;
public void setClusteringParameters(ClusteringParameters clusteringParameters) {
this.clusteringParameters = clusteringParameters;
public void setGenerationTargetPath(String path) {
String res = path;
......@@ -34,12 +64,12 @@ public class DistributedTargetGenerator extends CMakeGenerator {
public List<File> generate(EMAComponentInstanceSymbol componentInstanceSymbol, TaggingResolver taggingResolver) throws IOException {
public List<File> generate(EMAComponentInstanceSymbol genComp, TaggingResolver taggingResolver) throws IOException {
Map<EMAComponentInstanceSymbol, GeneratorImpl> generatorMap = new HashMap<>();
EMAComponentInstanceSymbol componentInstanceSymbol = preprocessing(genComp);
List<EMAComponentInstanceSymbol> clusterSubcomponents = ClusterHelper.getClusterSubcomponents(componentInstanceSymbol);
List<EMAComponentInstanceSymbol> clusterSubcomponents = ClusterFromTagsHelper.getClusterSubcomponents(componentInstanceSymbol);
if (clusterSubcomponents.size() > 0) {
clusterSubcomponents.forEach(clusterECIS -> {
String nameTargetLanguage = NameHelper.getNameTargetLanguage(clusterECIS.getFullName());
......@@ -58,11 +88,61 @@ public class DistributedTargetGenerator extends CMakeGenerator {
MiddlewareTagGenImpl middlewareTagGen = new MiddlewareTagGenImpl();
middlewareTagGen.setGenerationTargetPath(generationTargetPath + "emam/");
return files;
private EMAComponentInstanceSymbol preprocessing(EMAComponentInstanceSymbol genComp) {
EMAComponentInstanceSymbol componentInstanceSymbol = genComp;
if(clusteringParameters != null){
Integer level = clusteringParameters.getFlattenLevel().get();
componentInstanceSymbol = FlattenArchitecture.flattenArchitecture(genComp, new HashMap<>(), level);
}else {
componentInstanceSymbol = FlattenArchitecture.flattenArchitecture(genComp);
System.out.println("Subcomponents after flatten: " + componentInstanceSymbol.getSubComponents().size());
if(clusteringParameters.getAlgorithmParameters().size() > 0) {
clusteringResults = ClusteringResultList.fromParametersList(componentInstanceSymbol, clusteringParameters.getAlgorithmParameters(), clusteringParameters.getMetric());
Optional<Integer> nOpt = clusteringParameters.getNumberOfClusters();
for(ClusteringResult c : clusteringResults){
String prefix = nOpt.isPresent() && !c.hasNumberOfClusters(nOpt.get()) ? "[IGNORED]" : "";
c.saveAsJson(generationTargetPath +"emam/", "clusteringResults.json");
System.out.println(prefix + "Score was " + c.getScore() + " for " + c.getParameters().toString());
Optional<ClusteringResult> clusteringOpt;
if(nOpt.isPresent() && clusteringParameters.getChooseBy().equals(ResultChoosingStrategy.bestWithFittingN)){
clusteringOpt = clusteringResults.getBestResultWithFittingN(nOpt.get());
clusteringOpt = clusteringResults.getBestResultOverall();
ClusteringResult clusteringResult = clusteringOpt.get();
System.out.println("Best score was " + clusteringResult.getScore() + " for " + clusteringResult.getParameters().toString());
AutomaticClusteringHelper.annotateComponentWithRosTagsForClusters(componentInstanceSymbol, clusteringResult.getClustering());
return componentInstanceSymbol;
private GeneratorImpl createFullGenerator(String subdir) {
MiddlewareGenerator res = new MiddlewareGenerator();
res.setGenerationTargetPath(generationTargetPath + "src/" + (subdir.endsWith("/") ? subdir : subdir + "/"));
......@@ -82,7 +162,6 @@ public class DistributedTargetGenerator extends CMakeGenerator {
StringBuilder content = new StringBuilder();
content.append("cmake_minimum_required(VERSION 3.5)\n");
//TODO setProjectName?
content.append("project (default)\n");
content.append("set (CMAKE_CXX_STANDARD 11)\n");
package de.monticore.lang.monticar.generator.middleware;
package de.monticore.lang.monticar.generator.middleware.cli;
import java.util.Optional;
import java.util.Set;
public class CliParameters {
private static final boolean DEFAULT_WRITE_TAG_FILE = false;
private static final String DEFAULT_EMADL_BACKEND = "MXNET";
private String modelsDir;
private String outputDir;
private String rootModel;
private Set<String> generators;
private String emadlBackend;
private Boolean writeTagFile;
private ClusteringParameters clusteringParameters;
public CliParameters() {
public CliParameters(String modelsDir, String outputDir, String rootModel, Set<String> generators) {
this(modelsDir, outputDir, rootModel, generators, "MXNET");
public CliParameters(String modelsDir, String outputDir, String rootModel, Set<String> generators, String emadlBackend) {
public CliParameters(String modelsDir, String outputDir, String rootModel, Set<String> generators, String emadlBackend, Boolean writeTagFile, ClusteringParameters clusteringParameters) {
this.modelsDir = modelsDir;
this.outputDir = outputDir;
this.rootModel = rootModel;
this.generators = generators;
this.emadlBackend = emadlBackend;
this.writeTagFile = writeTagFile;
this.clusteringParameters = clusteringParameters;
public String getModelsDir() {
......@@ -41,6 +45,15 @@ public class CliParameters {
public String getEmadlBackend() {
return emadlBackend;
return emadlBackend == null ? DEFAULT_EMADL_BACKEND : emadlBackend;