A Python project that trains a Random Forest classifier on historical match data and predicts soccer match outcomes.
- Calculates team-level features (goals scored/conceded averages, form, points) over the last 5 matches
- Supports training on historical CSV datasets
- Uses `RandomForestClassifier` for prediction
- Simple API:
  - `train(...)` to build the model
  - `predict_match(home, away)` to get win/draw/lose probabilities
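The rolling-window features listed above could be computed along these lines. This is a minimal sketch in plain Python, not the project's actual implementation: the function name `team_features`, the tuple layout of `matches`, and the 3/1/0 points scheme are assumptions made for illustration.

```python
from collections import deque

WINDOW = 5  # the feature list above refers to the last 5 matches


def team_features(matches, team, window=WINDOW):
    """Average goals scored/conceded and total points for `team`
    over its most recent `window` matches.

    `matches` is an iterable of (home, away, home_goals, away_goals)
    tuples in chronological order (a hypothetical layout).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` results
    for home, away, home_goals, away_goals in matches:
        if team == home:
            recent.append((home_goals, away_goals))
        elif team == away:
            recent.append((away_goals, home_goals))
    if not recent:
        return {"avg_scored": 0.0, "avg_conceded": 0.0, "points": 0}
    scored = sum(s for s, _ in recent)
    conceded = sum(c for _, c in recent)
    # Assumed scoring: 3 points for a win, 1 for a draw, 0 for a loss.
    points = sum(3 if s > c else 1 if s == c else 0 for s, c in recent)
    return {
        "avg_scored": scored / len(recent),
        "avg_conceded": conceded / len(recent),
        "points": points,
    }


# Made-up results, used only to exercise the function.
history = [
    ("Arsenal", "Chelsea", 2, 0),
    ("Liverpool", "Arsenal", 1, 1),
    ("Arsenal", "Everton", 0, 1),
]
features = team_features(history, "Arsenal")
```

A `deque` with `maxlen` drops the oldest result automatically, so changing the window size only requires a different `window` argument.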
- Python 3.7+
- pandas
- numpy
- scikit-learn
Install required packages:
    pip install pandas numpy scikit-learn

- Clone this repository:
    git clone https://github.com/your-username/FootballMatchPredictor.git
    cd FootballMatchPredictor
- (Optional) Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # on macOS/Linux
    venv\Scripts\activate     # on Windows
- Install dependencies:
    pip install -r requirements.txt
- Prepare a dataset CSV with columns:
    Date,HomeTeam,AwayTeam,FTHG,FTAG
- Update the dataset path in the code, or pass a DataFrame or CSV path to `train()`.
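Before training, it can help to confirm that a dataset has the expected columns. The sketch below uses only the standard library and is not part of the project: `load_matches` is a hypothetical helper, the `H`/`D`/`A` result labels are an assumed encoding, and it assumes `FTHG` and `FTAG` hold full-time home and away goals. The sample rows are made up.

```python
import csv
import io

REQUIRED_COLUMNS = {"Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG"}


def load_matches(handle):
    """Read match rows from an open CSV file and attach a result label."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        home_goals, away_goals = int(row["FTHG"]), int(row["FTAG"])
        # Assumed labels: H = home win, D = draw, A = away win.
        if home_goals > away_goals:
            result = "H"
        elif home_goals < away_goals:
            result = "A"
        else:
            result = "D"
        rows.append(
            {**row, "FTHG": home_goals, "FTAG": away_goals, "Result": result}
        )
    return rows


# Made-up rows in the column layout described above.
sample = io.StringIO(
    "Date,HomeTeam,AwayTeam,FTHG,FTAG\n"
    "2024-08-17,Team A,Team B,2,1\n"
    "2024-08-18,Team C,Team D,0,0\n"
    "2024-08-19,Team B,Team C,1,3\n"
)
matches = load_matches(sample)
```

For a file on disk, the same function could be called as `load_matches(open(path, newline=""))`, ideally inside a `with` block.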
from football_predictor import FootballMatchPredictor
# Initialize for Premier League
predictor = FootballMatchPredictor('Premier League')
# Train on 2024-25 season data
predictor.train("Datasets/season-2425.csv")
# Predict a fixture
result = predictor.predict_match("Arsenal", "Liverpool")
print(result)
# Example output: {'home_win': 0.65, 'draw': 0.20, 'away_win': 0.15}
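The returned dictionary maps each outcome to a probability, so choosing the most likely result is a small step. A minimal sketch, assuming the key names shown in the example output above; `most_likely_outcome` is a hypothetical helper, not part of the project's API.

```python
def most_likely_outcome(probabilities):
    """Return the (outcome, probability) pair with the highest probability."""
    outcome = max(probabilities, key=probabilities.get)
    return outcome, probabilities[outcome]


# Values copied from the example output above, not from a real model run.
result = {"home_win": 0.65, "draw": 0.20, "away_win": 0.15}
outcome, probability = most_likely_outcome(result)
print(f"{outcome}: {probability:.0%}")
```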
├── src/
│ └── football_predictor.py # FootballMatchPredictor class
├── Datasets/
│ └── season-2425.csv # Sample training data
├── requirements.txt # Python dependencies
└── README.md # This file
- Feature window: currently uses last 5 matches for form/goals averages
- Model: Random Forest with 100 trees; adjust hyperparameters in `__init__`
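As a rough illustration of the model setting above, the following sketch builds a 100-tree Random Forest with scikit-learn and reads class probabilities from it. It is not the project's code: the features and labels are synthetic, the `H`/`D`/`A` label encoding is assumed, and `random_state=0` is added only to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in features, e.g. [home avg goals scored, away avg goals scored].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 3.0, size=(60, 2))

# Synthetic labels derived from the feature difference (illustration only).
diff = X[:, 0] - X[:, 1]
y = np.where(diff > 0.5, "H", np.where(diff < -0.5, "A", "D"))

# n_estimators is the number of trees in the forest.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba([[2.0, 1.0]])[0]
prediction = {str(label): float(p) for label, p in zip(model.classes_, proba)}
```

Other hyperparameters such as `max_depth` or `min_samples_leaf` are passed to the same constructor.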
Contributions are welcome! Please open issues or pull requests to:
- Improve feature engineering
- Add support for other leagues
- Experiment with different models
This project is licensed under the MIT License – see LICENSE for details.