Studying the Effect of Data in Commit-Based Static Analysis

Introduction
Software security has become more important as industries grow increasingly dependent on software.
One of the major challenges in ensuring software security is identifying and addressing vulnerabilities in the code,
because vulnerabilities that go undetected can cause significant harm.
Static analysis remains a prevalent tool for detecting vulnerabilities in software.
It analyzes code without executing it, which allows vulnerabilities to be detected early in the development process.
In this project, we explore commit-based static analysis, which focuses on individual code changes, to improve the
accuracy and effectiveness of vulnerability detection.
We evaluate the performance of various machine learning models trained on vulnerable code commits to identify
vulnerabilities.
The goal of this research is to enhance commit-based static analysis and improve software security by developing more
effective methods for identifying and addressing vulnerabilities in software.

Contents


Data/: contains the datasets used in the project and the scripts to process them.

Docker/: contains the Docker configuration for easy setup of the project.

README.md: this file provides an overview of the repository.


Dataset
The dataset can be found here.

Dependencies

Python 3.11.2
Docker >= 20.10.5


Usage
Make sure to install the dependencies before running the project.

Setup with Docker

1. In the root directory of the project, start the databases: docker compose up -d mongo redis mysql
2. Start populating the CVE-Search database: docker compose up -d mongo_python
3. Build and start the main container: docker compose up -d --build commit_analysis


NOTE: Check the README.md in the Docker/ directory for more information!


Rebuilding the commit_analysis container
The current code is copied into the commit_analysis container when it is built.
If you change anything in the init_scripts, the container needs to be rebuilt.
To rebuild it, run docker compose up -d --build commit_analysis.

Use of the pre-populated database
The databases expose their ports to the host machine.
The port mappings are listed in the docker-compose.yml file.
You can connect to the databases through those ports and pre-populate them with the data provided,
so that you do not need to run all mappings yourself.
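
Before loading data into the databases, it can help to confirm that their ports are reachable from the host. The following is a minimal sketch using only the Python standard library. The port numbers are assumptions (the upstream defaults for MongoDB, Redis, and MySQL), not values taken from this repository; replace them with the host-side ports from docker-compose.yml. The names ASSUMED_PORTS, port_is_open, and check_databases are hypothetical helpers and are not part of the project.

```python
import socket

# Assumed host ports: these are the upstream defaults for each database,
# not values read from this repository. Replace them with the host-side
# ports listed in docker-compose.yml.
ASSUMED_PORTS = {"mongo": 27017, "redis": 6379, "mysql": 3306}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_databases(ports: dict[str, int], host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_databases(ASSUMED_PORTS).items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

A reachable port only shows that something is listening on it. It does not verify credentials or that the database has finished initializing.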

License
Distributed under the GPLv3 License. See LICENSE for more information.

Authors

Rawel Ahmad - rawel.ahmad@stud.tu-darmstadt.de

Nikolaos Alexopoulos - coordination <GitHub>


Acknowledgements
This project adapts work from the following projects and authors:


BIC-Tracker by JayJayJay1: The crawler for syzkaller crash reports was adapted from this project.
(See Data/DatasetSources/syzkaller/)

What Happens When We Fuzz? Investigating OSS-Fuzz Bug History by Keller et al.:
The extraction method for the OSS-Fuzz dataset was adapted from this paper.
(See Data/DatasetSources/oss-fuzz/processor.py)

VulnerabilityLifetimes by manuelbrack: The heuristic was adapted from this project.
(See Data/Heuristic/)


Thesis information
This project is part of a bachelor's thesis at Technische Universität Darmstadt.