O'Reilly Book

Soccer Analytics with Machine Learning

A practical guide to learning machine learning through real soccer data.

From the Book

Preface

A note to the reader

Dear reader,

Soccer has always been a game driven by passion, intuition, and endless debate. For decades, fans have loved it not only for its results, but for the stories, emotions, memories, and uncertainty that come with every match.

In recent years, that traditional way of enjoying the game has been joined by something new: data. Advances in artificial intelligence, machine learning, modern computing, and the growing availability of high-quality sports data have made rigorous analysis accessible far beyond professional clubs and research labs.

What excites us most is not only the technology itself, but what it enables: clearer thinking, more disciplined reasoning, and a deeper appreciation of soccer's complexity. Data-driven methods do not replace passion or intuition. They sharpen them.

We wrote this book for soccer fans, students, educators, and professionals who want to learn machine learning through real, meaningful problems. It is meant to be practical, honest about limitations, and welcoming to readers who are curious enough to work with code and data.

Its publication in 2026, just ahead of the FIFA World Cup, feels especially meaningful. At a moment when global attention turns toward matches, tactics, and prediction, we hope this book offers a way to engage with the game more deeply.

With curiosity and appreciation for the game,

The Authors

  • What you'll gain from this book

    Learn how to turn soccer data into insights, predictions, and practical machine learning workflows.

  • Python Foundations

    Build the programming and data analysis skills needed to work confidently with soccer analytics projects.

    Chapter 2
  • Exploratory Analysis

    Understand match, event, and player data through visualization, summary statistics, and soccer-specific examples.

    Chapter 3
  • Predictive Modeling

    Learn classification and regression techniques for match outcomes, scoring, and performance analysis.

    Chapters 4-6
  • Deep Learning

    Move from intuition to implementation with neural networks and modern modeling workflows for soccer problems.

    Chapter 7
  • Feature Engineering

    Discover how domain knowledge and thoughtful data representation can unlock stronger model performance.

    Chapter 8
  • Decision Making

    Connect predictions to betting, optimization, evaluation, and the future direction of soccer analytics.

    Chapters 9-10
  • Chapters

    What the book covers

    • Chapter 01

      The Soccer Analytics Landscape

      An introduction to the field, the data ecosystem, and why soccer is such a compelling domain for analytics.

    • Chapter 02

      Python Fundamentals

      Core Python skills for analysis, data wrangling, and reproducible workflows.

    • Chapter 03

      Exploratory Data Analysis in Soccer

      Practical tools for understanding event data, team behavior, and player tendencies.

    • Chapter 04

      Predicting Soccer Outcomes

      Classification methods for forecasting match outcomes and related soccer events.

    • Chapter 05

      Advanced Classification Methods

      More powerful models and evaluation strategies for applied soccer prediction tasks.

    • Chapter 06

      Regression Techniques

      Methods for modeling goals, performance metrics, and continuous outcomes in soccer analytics.

    • Chapter 07

      Deep Learning for Soccer Analytics

      Neural network concepts, implementation ideas, and practical challenges in real applications.

    • Chapter 08

      Feature Engineering for Soccer Analytics

      How to create better signals from raw soccer data using context and domain expertise.

    • Chapter 09

      From Predictions to Potential Profit

      A bridge from predictive models to evaluation, optimization, and responsible betting strategy.

    Authors

    Meet the team behind the book

    Explore the backgrounds of the authors shaping the book's mix of machine learning, statistics, and soccer analytics.

    • Haipeng Gao, PhD

      Machine Learning Engineer

      Haipeng Gao is a data science and machine learning expert with extensive industry experience developing and optimizing machine learning solutions for large-scale, business-critical systems at PayPal, TikTok, and LinkedIn. He is a named inventor on multiple AI/ML patents granted by the U.S. Patent and Trademark Office.

      He has taught statistics and probability at the University of North Carolina at Chapel Hill and San Jose State University, and holds a PhD in Statistics and Operations Research from UNC Chapel Hill.

    • Ari Joury, PhD

      Entrepreneur, Data Scientist

      Ari Joury is a data scientist and entrepreneur working at the intersection of machine learning, causal inference, and applied analytics. He is Founder and CEO of Wangari Global, where he builds AI systems for decision support in finance, insurance, and sustainability.

      With a focus on interpretable models and real-world impact, Ari brings a background that spans theoretical particle physics, applied data science, and quantitative modeling. He holds a PhD in Physics and an MBA.

    • Weining Shen, PhD

      Associate Professor

      Weining Shen is an associate professor of statistics at the University of California, Irvine. His research spans machine learning, Bayesian statistics, sports analytics, and large language models.

      He has published more than 70 papers in leading statistics journals and premier machine learning conferences, and earned his PhD in Statistics from North Carolina State University.

    • Guanyu Hu, PhD

      Associate Professor

      Guanyu Hu is an associate professor at Michigan State University whose research focuses on spatial statistics, Bayesian nonparametric methods, and sports analytics. He has led multiple NSF-funded projects and published more than 50 papers in leading statistics journals and machine learning conferences.

      He serves as an associate editor for several prominent journals, chaired the American Statistical Association's Statistics in Sports Section in 2024, helps organize the American Soccer Insights Summit, and earned his PhD in Statistics from Florida State University.

    Get The Book

    Where to read and explore more

    • Retail

      Buy
      • Choose from major retailers in one place

      • Print availability and easy direct purchase

      • Good for sharing with readers, teams, and classes

    • Online

      Read
      • Read through O'Reilly Learning

      • Ideal for online access

      • Useful for teams and subscribers

      • Direct publisher platform

    • Resources

      Code
      • Official Jupyter notebooks

      • Supplementary materials

      • Chapter-by-chapter examples

      • Setup instructions included

    Why This Book

    What makes it worth reading

    • Practical

      Built around real soccer analytics problems instead of abstract machine learning exercises.

    • Comprehensive

      Covers Python, exploratory analysis, classification, regression, deep learning, and feature engineering.

    • Applied

      Connects predictive models to decisions, evaluation, optimization, and real-world reasoning.

    • Accessible

      Welcomes readers with basic programming familiarity, even without prior sports analytics experience.

    • Supported

      Includes companion notebooks and supplementary materials so readers can follow along in code.

    • Timely

      Arrives as the 2026 FIFA World Cup brings fresh attention to soccer data, forecasting, and performance analysis.
