Learning the Basics of Pandas in Python Using 2024 MLB Team Batting Data
Introduction
This SCORE module introduces the basics of the Pandas library in Python. It covers the core tools needed to start working with data in Python and Pandas.
Some of the topics we will discuss:
- Reading and Importing Data
- Creating a Dataframe
- Data Selection
- Dataframe Analysis
- Data Manipulation
Learning Goals
- Use Pandas to import data from CSV and Excel files
- Create your own Dataframe from Python dictionaries
- Select rows and columns with loc and iloc
- Filter data with conditionals
- Calculate key statistics
- Handle null data
- Change data types
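As a rough preview of the first few goals, the sketch below builds a small Dataframe from a Python dictionary, selects rows and columns with loc and iloc, filters with a conditional, and computes a few statistics. The team names and numbers are made-up placeholders, not values from the 2024 data set, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Made-up placeholder values for illustration; these are NOT real 2024 statistics.
data = {
    "Tm": ["Team A", "Team B", "Team C", "Team D"],
    "HR": [150, 210, 180, 165],
    "BA": [0.240, 0.255, 0.248, 0.262],
}
df = pd.DataFrame(data)

# loc selects by label; the row slice 0:2 includes the row labelled 2.
first_three = df.loc[0:2, ["Tm", "HR"]]

# iloc selects by integer position; the end of each slice is excluded.
top_left = df.iloc[0:2, 0:2]

# Conditional filtering: keep only rows where HR is greater than 170.
power_teams = df[df["HR"] > 170]

# Key statistics for a single column and for several numeric columns.
mean_hr = df["HR"].mean()
summary = df[["HR", "BA"]].describe()

print(first_three)
print(power_teams)
print(mean_hr)
```

Note the difference between the two slices: `df.loc[0:2]` returns three rows because label slices are inclusive, while `df.iloc[0:2]` returns two.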
Data
This dataset is originally from Baseball Reference and has been converted to CSV and Excel files for this learning module.
The dataset contains 32 rows and 29 columns. Each row represents an MLB team.
- 2024 MLB team batting data CSV
- 2024 MLB team batting data Excel file
- 2024 MLB team pitching data CSV
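Once a file is downloaded, loading it might look roughly like the sketch below. To keep the example self-contained, it reads a tiny in-memory stand-in with made-up rows instead of the real file. The file paths shown in the comments are hypothetical and should be replaced with wherever the downloaded files are saved.

```python
import io
import pandas as pd

# In-memory stand-in for the downloaded CSV. The rows are made-up placeholders,
# not real 2024 statistics; only the column names follow the module's data set.
csv_text = """Tm,#Bat,R/G,HR
Team A,50,4.50,150
Team B,55,5.10,210
"""

batting = pd.read_csv(io.StringIO(csv_text))

# With the real files, the calls would take a path instead (hypothetical paths):
#   batting = pd.read_csv("path/to/batting.csv")
#   batting = pd.read_excel("path/to/batting.xlsx")  # needs openpyxl installed

# shape is (rows, columns); the full data set is described as 32 rows by 29 columns.
n_rows, n_cols = batting.shape
print(n_rows, n_cols)
print(batting.head())
```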
Variable Descriptions
| Variable | Description |
|---|---|
| Tm | Team |
| #Bat | Number of Players used in Games |
| BatAge | Batters’ average age, weighted by AB + Games Played |
| R/G | Runs Scored Per Game |
| G | Games Played or Pitched |
| PA | Plate Appearances |
| AB | At Bats |
| R | Runs Scored/Allowed |
| H | Hits/Hits Allowed |
| 2B | Doubles (two-base hits) |
| 3B | Triples (three-base hits) |
| HR | Home Runs Hit/Allowed |
| RBI | Runs Batted In |
| SB | Stolen Bases |
| CS | Caught Stealing |
| BB | Bases on Balls/Walks |
| SO | Strikeouts |
| BA | Batting Average (Hits / At Bats) |
| OBP | On-Base Percentage |
| SLG | Slugging Percentage |
| OPS | On-Base Plus Slugging |
| OPS+ | Adjusted OPS |
| TB | Total Bases |
| GDP | Double Plays Grounded Into |
| HBP | Times Hit by a Pitch |
| SH | Sacrifice Hits |
| SF | Sacrifice Flies |
| IBB | Intentional Bases on Balls |
| LOB | Runners Left On Base |
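Several of these variables are related by simple formulas, which makes them handy for practice. The sketch below, using made-up placeholder numbers, recomputes BA as H divided by AB and OPS as OBP plus SLG. The `BA_check` and `OPS_check` column names are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Made-up placeholder values; these are NOT real 2024 statistics.
teams = pd.DataFrame({
    "Tm": ["Team A", "Team B"],
    "AB": [5000, 5500],
    "H": [1250, 1430],
    "2B": [250, 280],
    "OBP": [0.310, 0.330],
    "SLG": [0.400, 0.430],
})

# Column names such as "2B", "#Bat", "R/G", and "OPS+" are not valid Python
# identifiers, so they need bracket access rather than attribute access.
doubles = teams["2B"]

# BA is hits divided by at bats; OPS is on-base percentage plus slugging.
teams["BA_check"] = (teams["H"] / teams["AB"]).round(3)
teams["OPS_check"] = (teams["OBP"] + teams["SLG"]).round(3)

print(teams[["Tm", "BA_check", "OPS_check"]])
```

Comparing a recomputed column against the published one is a quick way to confirm that a file loaded with the expected data types.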
Module Files
The full module materials are linked below.
Module
This shell page provides access to the module materials. Download the student-facing Jupyter notebook and the supporting data files above to complete the activity.
To run the notebook, users will need a Python environment with packages such as pandas, numpy, and openpyxl.
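One possible way to confirm the environment is ready before opening the notebook is a short check like the one below. It assumes the three packages named above are the ones needed; the notebook itself may rely on others.

```python
import importlib.util

# Packages named on this page; the notebook may need more than these.
required = ["pandas", "numpy", "openpyxl"]

# find_spec returns None when a top-level package cannot be located.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

Any package reported as missing can usually be installed with `pip install` followed by the package name.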
Summary
In this module, students use 2024 MLB team batting data to learn basic Pandas workflows in Python, including importing data, creating dataframes, selecting rows and columns, filtering data, computing summary statistics, handling missing values, changing data types, and converting dataframe columns to NumPy arrays.
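The last three workflows in that list can be sketched briefly as follows. The values are made-up placeholders, and filling a gap with the column mean is just one possible choice for handling missing data, not necessarily the approach taken in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up placeholder values with one missing entry and one column stored as text.
df = pd.DataFrame({
    "Tm": ["Team A", "Team B", "Team C"],
    "SB": [100.0, np.nan, 80.0],
    "G": ["162", "162", "161"],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Handle the missing value by filling it with the column mean (one option of several).
df["SB"] = df["SB"].fillna(df["SB"].mean())

# Change a data type: convert the text column G to integers.
df["G"] = df["G"].astype(int)

# Convert a Dataframe column to a NumPy array.
sb_array = df["SB"].to_numpy()

print(missing_per_column)
print(df.dtypes)
print(sb_array)
```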