I'm a data science professional and open-source developer with expertise in healthcare, economics and statistical inference. I have experience with data science projects in sectors such as e-commerce, education & healthcare. 👛📚🩺 I have a quantitative background, having studied MPhil Population Health Sciences (Health Data Science stream) at the University of Cambridge and BSc Economics at the LSE. 🎓
I'm an evangelist for data - I like speaking about data (see talks below), writing about data projects and sharing resources about data! ⚗
I have experience working in public companies, start-ups and consultancies. I tackle projects from multiple perspectives, enabled by the breadth of my experience: I have worked on projects for the public, private & third sectors; developed proprietary and open-source software; gained experience of both traditional and alternative education; and worked in both full-time employment and contracting.
These are data science packages and apps I've developed:
- appelpy: Python package for easier regression modelling
- obsidiantools: Python package for analysing Obsidian.md knowledge vaults. I gave talks at PyCon UK and Portugal in 2022 about the package. I also developed NLP solutions that automated some of my knowledge management workflows, which were applied to all my MPhil study notes (150k+ word corpus).
- Statistical inference: e.g. A/B testing and experimentation; causal inference
- Machine learning
- Product analytics on a variety of data: e.g. user journey optimisation; B2B data; advertising data; textual data
- Business intelligence: developing analytics strategy at enterprise level and supporting analysts' skills development
- Healthcare: e.g. epidemiology, genetics, genomics, public health
- Economics and econometrics
- Python: analytics & statistics packages (e.g. Pandas, Statsmodels, PyMC3), applied machine learning, package development
- R, including R Shiny
- Comfortable with the major OSs: my main commercial experience is with Mac, but I use Linux and Windows for personal projects
- Business intelligence: Looker; Tableau
- Product analytics: Mixpanel
- Databases: primarily BigQuery (standard SQL)
- Reverse ETL and marketing data activation: Hightouch
- App deployment: e.g. Heroku
- Statistical software: Stata; SPSS
- Jupyter and JupyterLab, on local and virtual machines
- Tech for reproducible data science: e.g. Binder; GNU Make
I've worked in agile teams, with philosophies such as Extreme Programming and Test-driven Development (TDD).
I also have exposure to: Airflow; Docker; MLflow; Kubernetes; Terraform
Here is content for my data science talks:
- PyCon 2022 talks :: Connecting those thoughts: Personal knowledge management with Python
- PyData London Meetup (March 2020 - postponed) :: Publishing Your First Project on PyPI
- PyData London 2019 :: On the Path to Causal Inference