Learning the Basics of Pandas in Python Using 2024 MLB Team Batting Data

Dataframes
Summary Statistics
Importing and Reading Data
Data Filtering
Pandas
Using 2024 MLB team batting data to introduce the Pandas library in Python
Author: Austin Hayes

Affiliation: University of North Carolina at Charlotte

Published: June 2, 2025

Facilitation notes

This module was originally developed as a Jupyter notebook. The SCORE preprint server does not run the notebook directly, so users should download the materials below and run the notebook locally or in a Jupyter-compatible environment.

Students should be provided with the following files:

Additional instructor materials are available below:

Introduction

This SCORE module introduces the basics of the Pandas library in Python. It walks through the fundamental Python and Pandas operations used throughout the module.

Some of the topics we will discuss:

  • Reading and Importing Data
  • Creating a Dataframe
  • Data Selection
  • Dataframe Analysis
  • Data Manipulation
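As a rough preview of the first few topics, the sketch below reads a small CSV, builds a DataFrame from a dictionary, and selects data by label and by position. The team names and numbers are made up for illustration and are not taken from the module's data files; the CSV is read from an in-memory string so the example runs without any file on disk. `pd.read_excel` is the Excel counterpart of `pd.read_csv` and relies on openpyxl for .xlsx files.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a data file (values are NOT real 2024 stats).
csv_text = """Tm,G,AB,H,HR
Team A,162,5500,1400,200
Team B,162,5450,1350,180
Team C,162,5600,1450,220
"""

# Reading and importing: pd.read_csv accepts a file path or a file-like object.
batting = pd.read_csv(io.StringIO(csv_text))

# Creating a DataFrame from a dictionary of column name -> list of values.
steals = pd.DataFrame({"Tm": ["Team A", "Team B", "Team C"], "SB": [100, 120, 90]})

# Data selection: loc is label-based, iloc is position-based.
first_team = batting.loc[0, ["Tm", "HR"]]   # row labeled 0, two named columns
top_left = batting.iloc[:2, :3]             # first two rows, first three columns

# Dataframe analysis: shape is (rows, columns); describe() summarizes numeric columns.
print(batting.shape)
print(batting.describe())
```

With the real module data, the in-memory string would be replaced by the path to the downloaded CSV file.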

Learning Goals

  • Use Pandas to import data from CSV and Excel files
  • Create your own DataFrame from Python dictionaries
  • Select rows and columns with loc and iloc
  • Filter data with conditionals
  • Calculate key statistics
  • Handle null data
  • Change data types
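The last four goals can be sketched on a tiny DataFrame. This is a minimal illustration rather than the notebook's own code: the values are invented, the HR column is deliberately stored as strings, and filling missing values with the column mean is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; NOT real 2024 statistics.
df = pd.DataFrame({
    "Tm": ["Team A", "Team B", "Team C", "Team D"],
    "HR": ["200", "180", "220", "150"],    # stored as strings on purpose
    "SB": [100.0, np.nan, 90.0, 130.0],    # one missing value
})

# Changing data types: convert the HR strings to integers.
df["HR"] = df["HR"].astype(int)

# Handling null data: count missing values, then fill them with the column mean.
missing_sb = df["SB"].isna().sum()
df["SB"] = df["SB"].fillna(df["SB"].mean())

# Filtering with a conditional: keep teams with more than 170 home runs.
power_teams = df[df["HR"] > 170]

# Key statistics on a single column.
hr_mean = df["HR"].mean()
hr_max = df["HR"].max()

print(power_teams)
print(hr_mean, hr_max)
```

Dropping incomplete rows with `dropna()` is an alternative to filling; which is appropriate depends on the analysis.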

Data

This dataset is originally from Baseball Reference and has been converted to CSV and Excel files for this learning module.

The dataset contains 32 rows and 29 columns. Each row represents an MLB team.

Variable Descriptions
Variable  Description
Tm        Team
#Bat      Number of Players Used in Games
BatAge    Batters' Average Age, Weighted by AB + Games Played
R/G       Runs Scored Per Game
G         Games Played or Pitched
PA        Plate Appearances
AB        At Bats
R         Runs Scored/Allowed
H         Hits/Hits Allowed
2B        Doubles (Two-Base Hits)
3B        Triples (Three-Base Hits)
HR        Home Runs Hit/Allowed
RBI       Runs Batted In
SB        Stolen Bases
CS        Caught Stealing
BB        Bases on Balls/Walks
SO        Strikeouts
BA        Batting Average (Hits/At Bats)
OBP       On-Base Percentage
SLG       Slugging Percentage
OPS       On-Base Plus Slugging
OPS+      Adjusted OPS
TB        Total Bases
GDP       Double Plays Grounded Into
HBP       Times Hit by a Pitch
SH        Sacrifice Hits
SF        Sacrifice Flies
IBB       Intentional Bases on Balls
LOB       Runners Left On Base
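Several of these variables are derived from others, which makes them convenient for practicing column arithmetic. The sketch below recomputes BA, TB, and SLG from raw counts using the standard baseball definitions (BA = H/AB, TB = H + 2B + 2×3B + 3×HR, SLG = TB/AB). The two teams and their counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Made-up counts for two hypothetical teams; NOT real 2024 statistics.
teams = pd.DataFrame({
    "Tm": ["Team A", "Team B"],
    "AB": [5000, 5400],
    "H": [1250, 1350],
    "2B": [250, 270],
    "3B": [20, 30],
    "HR": [180, 200],
})

# BA is hits divided by at bats.
teams["BA"] = teams["H"] / teams["AB"]

# Total bases: every hit is worth one base, plus one extra base per double,
# two extra per triple, and three extra per home run.
teams["TB"] = teams["H"] + teams["2B"] + 2 * teams["3B"] + 3 * teams["HR"]

# SLG is total bases divided by at bats.
teams["SLG"] = teams["TB"] / teams["AB"]

print(teams[["Tm", "BA", "TB", "SLG"]].round(3))
```

Comparing recomputed columns against the published ones is a quick consistency check once the real data is loaded.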

Module Files

The full module materials are linked below.

Module

This shell page provides access to the module materials. Download the student-facing Jupyter notebook and the supporting data files above to complete the activity.

To run the notebook, users will need a Python environment with packages such as pandas, numpy, and openpyxl.
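Before opening the notebook, it can help to confirm that these packages are importable. The helper below is a hypothetical convenience, not part of the module materials; it uses only the standard library, so it runs even when a package is missing.

```python
import importlib.util

# Packages named in the module requirements.
REQUIRED = ["pandas", "numpy", "openpyxl"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, available in status.items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can typically be installed with pip or conda, depending on how the Python environment is managed.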

Summary

In this module, students use 2024 MLB team batting data to learn basic Pandas workflows in Python, including importing data, creating dataframes, selecting rows and columns, filtering data, computing summary statistics, handling missing values, changing data types, and converting dataframe columns to NumPy arrays.
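As one small illustration of the last step, converting DataFrame columns to NumPy arrays, the sketch below uses `to_numpy()` on a single column and on a pair of columns. The data are invented for the example and are not the module's data.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; NOT real 2024 statistics.
df = pd.DataFrame({
    "Tm": ["Team A", "Team B", "Team C"],
    "HR": [200, 180, 220],
    "SB": [100, 120, 90],
})

# One column becomes a one-dimensional array.
hr_array = df["HR"].to_numpy()

# Several numeric columns become a two-dimensional array (rows x columns).
hr_sb_matrix = df[["HR", "SB"]].to_numpy()

print(type(hr_array), hr_array.shape)
print(hr_sb_matrix.shape)
print(np.mean(hr_array))
```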