Extracting and Analyzing Object-Centric Game Data
This repository contains the full code base for a bachelor's thesis.
It extracts, transforms, and analyzes object-centric event data from StarCraft II replay files and converts it into the OCEL 2.0 (Object-Centric Event Log) format for process mining and data science.
✨ Goal
The goal of this thesis is to bridge video game telemetry and process mining by transforming rich in-game interactions (units, players, abilities, locations) into structured, object-aware event logs.
These logs can be used with tools like PM4PY to perform advanced analysis of game strategies, behaviors, and object lifecycles.
📁 Repository Structure
.
├── src/
│ ├── Pipeline/
│ │ ├── raw2structured.py # Extracts and groups structured events from SC2Replay files
│ │ ├── structured2file.py # Writes structured events to JSON
│ │ ├── json2sql.py # Converts grouped JSON to OCEL 2.0-compliant SQLite
│ │ ├── constants.py # Paths to replays/output folders
│ │ └── __init__.py # Entry point for end-to-end pipeline
│ └── analysis/ # (optional) downstream analysis and visualization
├── data/
│ ├── replays/ # Raw .SC2Replay files
│ └── output/ # JSON and SQLite OCEL outputs
└── README.md
🧪 Technologies Used
- sc2reader: to parse .SC2Replay files
- Python 3.11 and standard libraries (e.g., sqlite3, json, collections)
- OCEL 2.0 Schema: relational model for object-centric event logs
- PM4PY: for OCEL importing and process mining analysis
🚀 Pipeline Overview
1. Extract Events from Replays (raw2structured.py)
- Parses every event in a StarCraft II replay
- Converts it into a flat Python dictionary
- Adds metadata: timestamps, player, unit info, location, etc.
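As a rough illustration of the flattening step above, the sketch below turns an event-like object into a flat dictionary. It does not import sc2reader: the UnitBornEvent class here is a stand-in, and the field names in CANDIDATE_FIELDS as well as the flatten_event helper are assumptions made for this example, not the actual contents of raw2structured.py.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical attribute names; real parsed events expose different
# fields depending on the event class.
CANDIDATE_FIELDS = ("second", "frame", "pid", "unit_id", "x", "y")


def flatten_event(event: Any) -> dict:
    """Copy the known, non-empty attributes of an event into a flat dict."""
    record = {"event_type": type(event).__name__}
    for name in CANDIDATE_FIELDS:
        value = getattr(event, name, None)
        if value is not None:
            record[name] = value
    return record


class UnitBornEvent(SimpleNamespace):
    """Stand-in for a parsed replay event (not the sc2reader class)."""


# Toy event with made-up values, used only to exercise the helper.
example = UnitBornEvent(second=12, frame=192, pid=1, unit_id=4201, x=35, y=60)
flat = flatten_event(example)
print(flat)
```

Using getattr with a default keeps the helper tolerant of event classes that lack some fields, which is one plausible way to handle the heterogeneous events found in a replay.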
2. Write Structured Events to JSON (structured2file.py)
- Processes all replays in a folder
- Saves each grouped event log to *_events.json
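The writing step above could look roughly like the following sketch. Grouping by event type is an assumption (the README does not state the grouping key), and group_by_type, write_events, and the toy events are hypothetical names and data introduced only for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_type(events: list[dict]) -> dict[str, list[dict]]:
    """Group flat event dicts by their event type (assumed grouping key)."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["event_type"]].append(event)
    return dict(grouped)


def write_events(replay_path: Path, events: list[dict], out_dir: Path) -> Path:
    """Write the grouped events of one replay to <replay name>_events.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{replay_path.stem}_events.json"
    with target.open("w", encoding="utf-8") as handle:
        json.dump(group_by_type(events), handle, indent=2)
    return target


# Toy events with made-up values; no replay file is read here.
toy_events = [
    {"event_type": "UnitBornEvent", "second": 12, "unit_id": 4201},
    {"event_type": "UnitDiedEvent", "second": 95, "unit_id": 4201},
    {"event_type": "UnitBornEvent", "second": 30, "unit_id": 4202},
]

with tempfile.TemporaryDirectory() as tmp:
    written = write_events(Path("match01.SC2Replay"), toy_events, Path(tmp))
    written_name = written.name
    loaded = json.loads(written.read_text(encoding="utf-8"))
```

To process a whole folder, the same function could be called for every path returned by Path("data/replays").glob("*.SC2Replay").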
3. Convert to OCEL SQLite (json2sql.py)
- Transforms grouped JSON into an OCEL 2.0-compliant relational database
- Creates:
  - event, object, event_object, object_object tables
  - Event-specific and object-type-specific tables (event_UnitBornEvent, object_Unit, etc.)
- Tracks metadata such as:
  - Unit type, owner, position
  - Player identity
  - Control group relationships
  - Complex inter-object links: attacks, kills, casts, gathering, healing
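To make the table layout above more concrete, here is a simplified sketch of the core tables using sqlite3. The ocel_* column names follow the OCEL 2.0 relational format as far as I understand it; the specification defines further tables (such as type-mapping tables) that are omitted here. The attribute columns x, y, and unit_type, the qualifiers, and all inserted rows are assumptions for illustration, not the output of json2sql.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (ocel_id TEXT PRIMARY KEY, ocel_type TEXT);
    CREATE TABLE object (ocel_id TEXT PRIMARY KEY, ocel_type TEXT);
    CREATE TABLE event_object (
        ocel_event_id TEXT REFERENCES event (ocel_id),
        ocel_object_id TEXT REFERENCES object (ocel_id),
        ocel_qualifier TEXT,
        PRIMARY KEY (ocel_event_id, ocel_object_id, ocel_qualifier)
    );
    CREATE TABLE object_object (
        ocel_source_id TEXT REFERENCES object (ocel_id),
        ocel_target_id TEXT REFERENCES object (ocel_id),
        ocel_qualifier TEXT,
        PRIMARY KEY (ocel_source_id, ocel_target_id, ocel_qualifier)
    );
    -- Per-type tables hold the timestamp and type-specific attributes.
    CREATE TABLE event_UnitBornEvent (
        ocel_id TEXT PRIMARY KEY, ocel_time TEXT, x INTEGER, y INTEGER
    );
    CREATE TABLE object_Unit (
        ocel_id TEXT, ocel_time TEXT, ocel_changed_field TEXT, unit_type TEXT
    );
    """
)

# One toy event that involves two objects of different types.
conn.execute("INSERT INTO event VALUES ('e1', 'UnitBornEvent')")
conn.execute(
    "INSERT INTO event_UnitBornEvent VALUES ('e1', '2024-01-01T00:00:12', 35, 60)"
)
conn.executemany(
    "INSERT INTO object VALUES (?, ?)",
    [("unit_4201", "Unit"), ("player_1", "Player")],
)
conn.executemany(
    "INSERT INTO event_object VALUES (?, ?, ?)",
    [("e1", "unit_4201", "born"), ("e1", "player_1", "owner")],
)
conn.execute("INSERT INTO object_object VALUES ('unit_4201', 'player_1', 'owned_by')")

related = conn.execute(
    "SELECT ocel_object_id, ocel_qualifier FROM event_object "
    "WHERE ocel_event_id = 'e1' ORDER BY ocel_object_id"
).fetchall()
```

The event_object table is what lets a single event point at several objects, while object_object stores links that hold independently of any one event.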
🤔 What Makes This Pipeline Object-Centric?
- Multiple object types: Player, Unit, ControlGroup, Location, etc.
- Multi-object relations: each event can involve several objects
- Lifecycles & interactions: Unit creation, movement, killing, casting, gathering, etc.
- Richer semantics: not just activities, but structured interactions
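One way to see why multi-object relations matter is to flatten an object-centric log onto a single object type, as a classical case-based log would require. The sketch below uses a hypothetical in-memory OcelEvent class and toy events; it is not the pipeline's data model.

```python
from dataclasses import dataclass, field


@dataclass
class OcelEvent:
    event_id: str
    activity: str
    second: int
    # Maps object type -> ids of the objects involved in this event.
    objects: dict[str, list[str]] = field(default_factory=dict)


def flatten(events: list[OcelEvent], case_type: str) -> list[tuple[str, str, int]]:
    """Emit one (case id, activity, time) row per object of the chosen type."""
    rows = []
    for event in events:
        for object_id in event.objects.get(case_type, []):
            rows.append((object_id, event.activity, event.second))
    return rows


# Toy log: the second event involves two units and two players.
log = [
    OcelEvent("e1", "UnitBorn", 12, {"Unit": ["marine_1"], "Player": ["p1"]}),
    OcelEvent(
        "e2",
        "Attack",
        40,
        {"Unit": ["marine_1", "zergling_7"], "Player": ["p1", "p2"]},
    ),
]

unit_view = flatten(log, "Unit")
```

Here two events become three rows because the attack is duplicated for each unit involved. Keeping the events object-centric avoids this duplication and preserves the link between the two units.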
⚙️ How to Use
- Place .SC2Replay files into the data/replays/ directory.
- Run the structured extraction:
python src/Pipeline/structured2file.py
- Convert JSON to OCEL SQLite:
python src/Pipeline/json2sql.py # Or call convert_to_ocel_sqlite from __init__.py
- Analyze using PM4PY or other tools:
import pm4py
ocel = pm4py.read_ocel2_sqlite('data/output/example.sqlite')
🔹 Example Use Cases
- Compare strategies of different players across replays
- Mine object lifecycles (e.g., when are Zerglings most commonly killed?)
- Detect behavior patterns across units (e.g., movement -> cast -> attack)
- Train predictive models using OCEL data
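As a sketch of the lifecycle use case above (when Zerglings are killed), the following query buckets unit deaths by game minute. The tables are a cut-down version of the OCEL layout, and the game_second and unit_type columns, the 'died' qualifier, and all rows are toy assumptions; they are not results from real replays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event_UnitDiedEvent (ocel_id TEXT PRIMARY KEY, game_second INTEGER);
    CREATE TABLE object_Unit (ocel_id TEXT PRIMARY KEY, unit_type TEXT);
    CREATE TABLE event_object (
        ocel_event_id TEXT, ocel_object_id TEXT, ocel_qualifier TEXT
    );
    """
)

# Toy data: three Zergling deaths and one Marine death.
conn.executemany(
    "INSERT INTO event_UnitDiedEvent VALUES (?, ?)",
    [("d1", 130), ("d2", 170), ("d3", 310), ("d4", 150)],
)
conn.executemany(
    "INSERT INTO object_Unit VALUES (?, ?)",
    [("z1", "Zergling"), ("z2", "Zergling"), ("z3", "Zergling"), ("m1", "Marine")],
)
conn.executemany(
    "INSERT INTO event_object VALUES (?, ?, ?)",
    [("d1", "z1", "died"), ("d2", "z2", "died"), ("d3", "z3", "died"), ("d4", "m1", "died")],
)

# Integer division turns seconds into whole game minutes.
deaths_per_minute = conn.execute(
    """
    SELECT d.game_second / 60 AS minute, COUNT(*) AS deaths
    FROM event_UnitDiedEvent AS d
    JOIN event_object AS eo ON eo.ocel_event_id = d.ocel_id
    JOIN object_Unit AS u ON u.ocel_id = eo.ocel_object_id
    WHERE u.unit_type = 'Zergling' AND eo.ocel_qualifier = 'died'
    GROUP BY minute
    ORDER BY deaths DESC, minute
    """
).fetchall()
```

The same join pattern (event table, event_object, object table) should carry over to other questions about object lifecycles, with the filter and grouping columns swapped out.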
📝 Thesis Context
This repository supports a bachelor's thesis on object-centric process mining from video game data, using StarCraft II as a case study. The approach is guided by academic sources such as:
- A Framework for Extracting Real-World Object-Centric Event Logs from Game Data
- Object-centric process mining dealing with divergence and convergence in event data
🚫 Limitations & Warnings
- PM4PY may raise warnings if foreign keys or OCEL constraints are not perfectly satisfied. Some are harmless, but others may indicate schema issues.
- Some metadata may be missing for destroyed units or legacy replays.
- .xes export is not supported yet.
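For the foreign-key warnings mentioned in the first limitation, one possible check is SQLite's PRAGMA foreign_key_check, which reports rows whose references point at missing parents. The sketch below assumes the tables declare their foreign keys with REFERENCES clauses; whether the generated database does so is not stated here, and the rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (ocel_id TEXT PRIMARY KEY, ocel_type TEXT);
    CREATE TABLE object (ocel_id TEXT PRIMARY KEY, ocel_type TEXT);
    CREATE TABLE event_object (
        ocel_event_id TEXT REFERENCES event (ocel_id),
        ocel_object_id TEXT REFERENCES object (ocel_id),
        ocel_qualifier TEXT
    );
    """
)

conn.execute("INSERT INTO event VALUES ('e1', 'UnitBornEvent')")
conn.execute("INSERT INTO object VALUES ('unit_1', 'Unit')")
# The second relation points at an object that was never inserted.
conn.executemany(
    "INSERT INTO event_object VALUES (?, ?, ?)",
    [("e1", "unit_1", "born"), ("e1", "unit_404", "target")],
)

# Each result row is (child table, rowid, parent table, foreign key index).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
for table, rowid, parent, _fk_index in violations:
    print(f"{table} row {rowid} references a missing row in {parent}")
```

The dangling insert succeeds because Python's sqlite3 connections do not enforce foreign keys unless PRAGMA foreign_keys = ON is set, which is why an explicit check after the conversion can be useful.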
✏️ Future Work
- Add XOCEL or XES export
- Integrate lifecycle transitions (e.g., created, moved, destroyed)
- Add process visualizations
- Explore transformer-based models on OCEL event logs
🎓 Author
Fabian Gries
B.Sc. in Computer Science
RWTH Aachen University
For questions or academic references, feel free to open an issue or reach out.
❤️ Acknowledgements
- Lukas Liß (Supervisor)
- Prof. Dr. Wil van der Aalst (Object-Centric Process Mining)
- sc2reader developers
- OCEL standardization community
- PM4PY project