A stream processing simulator developed as part of a thesis at NTUA.
- Introduction
- Features
- Installation
- Running the simulator
- Example Usage
- Configuration
- Contributing
- License
- Contact Information
The Stream Processing Simulator is a tool designed to simulate stream processing systems for research and educational purposes. Developed as part of a thesis project at the National Technical University of Athens (NTUA), it aims to provide insights into the performance and behavior of stream processing architectures.
- Simulate various stream processing scenarios
- Modular and extensible architecture
- Support for different data scenarios
- Support for custom operators and processing elements
- Fully configurable topology
- Performance metrics
- Scalability testing
- Partition strategy testing
- Python 3.8 or higher
- Required Python packages listed in requirements.txt
- Clone the repository:
  git clone https://github.com/emsquared2/Stream-Processing-Simulator-NTUA-Thesis.git
- Navigate to the project directory:
  cd Stream-Processing-Simulator-NTUA-Thesis
- Install the required packages:
  pip install -r requirements.txt
To run a simulation, use the following command:
python main.py --config <path/to/config.json> [--key_gen <path/to/generated_key.json>] [--stream <path/to/pre-existing_key.json>] [--logs <path/to/logs_directory>]
- --config CONFIG: Path to the configuration file. (Required)
- --key_gen KEY_GEN: Path to the generated key stream file. (Either --key_gen or --stream must be provided.)
- --stream STREAM: Path to the pre-existing key stream file. (Either --key_gen or --stream must be provided.)
- --logs LOGS: Path to the directory for storing generated logs. (Optional)
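The option rules above could be expressed with `argparse` roughly as follows. This is a minimal sketch, not the project's actual `main.py`: the helper names `build_parser` and `parse_args` are hypothetical, and the check only enforces that at least one of `--key_gen` or `--stream` is present.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--config", required=True,
                        help="Path to the configuration file.")
    parser.add_argument("--key_gen",
                        help="Path to the generated key stream file.")
    parser.add_argument("--stream",
                        help="Path to the pre-existing key stream file.")
    parser.add_argument("--logs",
                        help="Directory for storing generated logs.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv and reject runs that name no key stream source."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: at least one of the two key stream options must be given.
    if args.key_gen is None and args.stream is None:
        parser.error("either --key_gen or --stream must be provided")
    return args
```

Calling `parse_args(["--config", "c.json"])` exits with a usage error, while adding `--key_gen` or `--stream` returns the parsed namespace.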
For example, to run the simulator with a given configuration file:
python main.py --config config/example_config.json --key_gen input/stream.txt --logs logs/
The configuration file is a JSON file that defines the topology of the stream processing system. Below is an example configuration:
{
  "topology": {
    "stages": [
      {
        "id": 0,
        "type": "stateless",
        "nodes": [
          {
            "id": 0,
            "type": "key_partitioner",
            "throughput": 1000,
            "operation_type": "StatelessOperation",
            "strategy": {
              "name": "hashing"
            }
          }
        ]
      },
      {
        "id": 1,
        "type": "stateful",
        "nodes": [
          {
            "id": 1,
            "type": "stateful",
            "throughput": 1000,
            "operation_type": "Sorting",
            "window_size": 5,
            "slide": 2
          },
          {
            "id": 2,
            "type": "stateful",
            "throughput": 1000,
            "operation_type": "Sorting",
            "window_size": 5,
            "slide": 2
          }
        ]
      },
      {
        "id": 2,
        "type": "stateless",
        "nodes": [
          {
            "id": 3,
            "type": "key_partitioner",
            "throughput": 1000,
            "operation_type": "StatelessOperation",
            "strategy": {
              "name": "hashing"
            }
          },
          {
            "id": 4,
            "type": "key_partitioner",
            "throughput": 1000,
            "operation_type": "StatelessOperation",
            "strategy": {
              "name": "hashing"
            }
          }
        ]
      },
      {
        "id": 3,
        "type": "stateful",
        "nodes": [
          {
            "id": 5,
            "type": "stateful",
            "throughput": 1000,
            "operation_type": "Aggregation",
            "window_size": 5,
            "slide": 2
          }
        ]
      }
    ]
  }
}
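To inspect a configuration of this shape before running a simulation, something like the following could be used. It is only an illustrative sketch: `summarize_topology` is a hypothetical helper, not part of the simulator, and the embedded JSON is a shortened copy of the example above.

```python
import json


def summarize_topology(config: dict) -> list:
    """Return one (stage id, stage type, node count) tuple per stage."""
    stages = config["topology"]["stages"]
    return [(s["id"], s["type"], len(s["nodes"])) for s in stages]


# A shortened configuration in the same shape as the example above.
example = json.loads("""
{
  "topology": {
    "stages": [
      {"id": 0, "type": "stateless", "nodes": [
        {"id": 0, "type": "key_partitioner", "throughput": 1000,
         "operation_type": "StatelessOperation",
         "strategy": {"name": "hashing"}}
      ]},
      {"id": 1, "type": "stateful", "nodes": [
        {"id": 1, "type": "stateful", "throughput": 1000,
         "operation_type": "Sorting", "window_size": 5, "slide": 2},
        {"id": 2, "type": "stateful", "throughput": 1000,
         "operation_type": "Sorting", "window_size": 5, "slide": 2}
      ]}
    ]
  }
}
""")

for stage_id, stage_type, node_count in summarize_topology(example):
    print(f"stage {stage_id}: {stage_type}, {node_count} node(s)")
```

For a file on disk, `json.load(open(path))` would replace the inline `json.loads` call.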
The configuration file describes the system topology with multiple stages and nodes, specifying node types, throughput, operations, partitioning strategies, and window configurations.
- Stages: Each stage contains one or more nodes of the same type.
- Nodes: Nodes can be either stateless or stateful, and each has a specific role, such as key partitioner, worker (computational) node, or aggregator node.
- Operation: The operation that each worker node implements, for example Sorting or Aggregation.
- Partition Strategies: Strategies like hashing can be used to partition keys across nodes.
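As a rough illustration of what a hashing partition strategy does, the sketch below maps each key to one of the downstream nodes by hashing it. It is not taken from the simulator's code: the function name `hash_partition` and the choice of `zlib.crc32` as the hash are assumptions made for this example.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_nodes: int) -> int:
    """Map a key to a node index in [0, num_nodes) using a stable hash."""
    # zlib.crc32 is used here (an assumption) because Python's built-in
    # hash() for strings is randomized between interpreter runs.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


keys = ["a", "b", "a", "c", "a", "b"]
assignments = [hash_partition(k, num_nodes=2) for k in keys]
load = Counter(assignments)  # number of keys routed to each node index
print(assignments, dict(load))
```

Because a given key always maps to the same node, a skewed key distribution translates directly into uneven load across nodes, which is the kind of effect partition strategy testing is meant to expose.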
Contributions are welcome! Here are some ways you can contribute:
- Report Issues: If you find any bugs or have suggestions for improvements, please open an issue.
- Submit Pull Requests: If you want to contribute code, fork the repository, create a new branch, make your changes, and submit a pull request.
- Documentation: Help improve the documentation by adding more examples, clarifying existing sections, or translating content.
- Feature Requests: Suggest new features that can make the simulator more useful.
Before submitting a pull request, please ensure that:
- Your code follows the existing style and conventions.
- You have tested your changes thoroughly.
- You include a description of the changes made and the purpose of the modification.
Feel free to reach out if you have questions or need help getting started.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or inquiries, please contact:
Name | Email | GitHub |
---|---|---|
Alexandros Ionitsa | [email protected] | alexion |
Emmanouil Emmanouilidis | [email protected] | emsquared2 |
Nikolaos Chalvantzis | [email protected] | nchalv |