Python. Pandas. Web Scraping. Databases. SQL. Machine Learning. APIs.
All applied to Baseball
Learning to code isn't hard, you just need to stick with it a bit. That's why the most important thing is starting with a project you're excited about.
This book will take you from playing around with stats in Excel to scraping websites, building databases and running your own machine learning models.
“This book was really, really well done.”
You'll learn, step by step and applied to baseball, how to program your own analysis. You'll also learn how to make your own plots and data visualizations.
“Amazingly awesome... the way the learning is framed here is 10x what you'll get someplace else.”
About this and the related football, basketball, soccer and hockey versions —
“I was amazed by how you broke down complicated concepts and made them easier to understand.”
“I've taken Automate the Boring Stuff, Python for Finance, etc. and while those courses are great... I seem to be understanding it better because it's about a subject I like.”
No! Many people have gone through it with zero coding experience and done just fine.
That said, it does move fast and build on itself, so if you're new you might have to take it slower and make sure you understand each section before continuing. The end-of-chapter problems and exercises make that easy to check.
We'll learn Python, which is a free, open-source programming language. Detailed installation instructions are included.
The book includes some optional spaced repetition flashcards to help you remember what you've learned. The official iPhone app to use these costs $25 (it's free on Android). It's worth it IMO, but I describe workarounds if you don't want to pay this.
Yes! Go here. You'll be able to enter the recipient's name, email, and when you want the book sent (e.g. the stroke of midnight on Christmas Eve).
Yes, I've heard of kids as young as 12 working through and liking the book. It doesn't require any prior knowledge, and I explain even relatively simple concepts like "data". That said, it does build on itself and moves pretty quickly. In general, if you have a smart kid who is into sports, it's not only doable, but fun.
Besides baseball, I also have football, basketball, soccer and hockey versions.
They all teach the same general-purpose data and analysis concepts. For most people, reading just one will be fine. That said, there are some differences, particularly in the chapters on where to get data and on APIs.
Additional sports are 50% off. You can find more info and purchase multiple books here.
Probably! Many people have.
Although you'll learn all these concepts (Python, SQL, data manipulation, visualization, and modeling) via baseball, you'll 100% be able to apply these concepts to other areas, including your day job.
This is exactly what I did. I taught myself to code by playing around with fantasy football stats on nights and weekends. Then I used that — without going to school or a bootcamp — to get multiple data science jobs in non-sports fields.
Company/multi license discounts are available too. Email me and I'm happy to help.
At the moment the book is only available in an electronic format. This is primarily for two reasons:
That said, I might make a physical version someday. And I have had some readers take it to a print shop to have it printed and bound.
See the prerequisites section of the book. But they're at:
Yes! The book includes lifetime updates. If I update the book — whether it's to fix a typo, make a section clearer or because something changed with one of the libraries — I upload the newest version to SendOwl, and reset everyone's number of downloads.
If it's a significant change (e.g. a library has changed or I fixed something that was broken) I'll send an email about it. If it's just a typo, I usually don't in order to avoid clogging email inboxes.
You can follow along with all the changes on GitHub. If you bought the book a while ago and are picking it back up, it's a good idea to grab the newest version.
The site I use to send everything out (SendOwl) password-protects it automatically. I've asked them about it but they said it's random and not even they know it.
Some readers and I have figured out a way around it; it just involves some manual work. If it's a problem, email me and I'm happy to help.
“I wouldn't be where I'm at with the Python language today without this book to kick start things.”
See the full table of contents
Python — This flexible language is the foundation of everything from data munging to web scraping to machine learning. You'll also learn about its key data library Pandas, the modeling and machine learning libraries statsmodels and scikit-learn, and how to do data visualizations with seaborn.
Web Scraping and APIs — Next time you run across a site with data you'd like to analyze you'll know how to grab data via its public API if it's available, or build a web scraper to get it yourself if it's not.
Machine Learning and Statistics — You'll learn the difference between a regression and a random forest, and will know when and how to build both.
Databases and SQL — Build your own database (whether it's for player statistics, to keep track of opponent tendencies, etc.) and use SQL to get data in and out of it.
All in the context of baseball and designed so you can learn how to apply them to your own questions and do your own analysis.
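To give a rough sense of what the database and SQL piece can look like, here's a minimal sketch. It isn't taken from the book: the table name, column names, helper functions and all of the numbers are made up for illustration, and it assumes nothing beyond Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical season totals, invented purely for illustration
# (not real players or real stats): (player, at_bats, hits)
ROWS = [
    ("Player A", 500, 150),
    ("Player B", 450, 120),
    ("Player C", 600, 171),
]


def build_db(rows):
    """Create an in-memory SQLite database and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE batting (player TEXT, at_bats INTEGER, hits INTEGER)"
    )
    conn.executemany("INSERT INTO batting VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def batting_averages(conn):
    """Return (player, batting average) pairs, highest average first."""
    query = """
        SELECT player, ROUND(1.0 * hits / at_bats, 3) AS batting_avg
        FROM batting
        ORDER BY batting_avg DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_db(ROWS)
    for player, avg in batting_averages(conn):
        print(f"{player}: {avg:.3f}")
```

The `1.0 *` in the query forces decimal division, since dividing two integer columns in SQLite would otherwise truncate to 0. Swapping `":memory:"` for a file name would keep the database on disk between runs.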
Hi! My name is Nate and I'm a self-taught programmer and data scientist based in Milwaukee, WI.
A few years ago, I didn't know anything about Python, SQL, machine learning, web scraping or any of the other topics covered here.
So, I taught myself. It took a few years and I ran into a lot of dead ends along the way, but ultimately I figured it out. In this book, I distill everything I've learned to provide a step-by-step guide to doing baseball analytics and get you up and running as quickly as possible.