Introduction to Regression Trees with Climbing Data

Categories: Prediction, Modeling, Regression Trees, Exploratory Data Analysis

Using climbing data to introduce regression trees and numeric prediction in Python

Author: Izaan Khudadad

Affiliation: University of North Carolina at Charlotte

Published: May 11, 2026

Facilitation notes

This module was originally developed as a Jupyter notebook. The SCORE preprint server does not run the notebook directly, so users should download the materials below and run the notebook locally or in a Jupyter-compatible environment.

Students should be provided with the following files:

Additional instructor materials are available below:

Solutions Jupyter notebook

Introduction

In this activity, you will explore a dataset compiled from climbers logging their performance on indoor climbing problems and routes. The dataset includes information about each climber (such as sex, age, height, and weight), their experience (years climbing and the dates of their first and most recent recorded climbs), and their performance (number of graded climbs and their first, most recent, maximum, and average grades).

Each observation in the dataset represents a single climber, along with summary statistics of their climbing history. Instead of tracking individual attempts, the data records features such as the climber’s maximum grade achieved, average grade, total number of climbs, and years active.
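Because each row is a per-climber summary, it may help to see how such a table could be produced from a climb-by-climb log. The sketch below is a minimal illustration under stated assumptions: the log, its `date` and `grade` columns, the numeric grade encoding, and the helper `summarize_climbers` are all hypothetical, and this is not necessarily how the module's data were built. It uses pandas named aggregation in `groupby().agg`.

```python
import pandas as pd

# Hypothetical climb-level log: one row per completed climb (toy values, not the module's data).
log = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2015-03-01", "2016-07-12", "2017-01-05",
                            "2018-05-20", "2018-09-30"]),
    "grade": [40, 46, 44, 50, 53],  # assumption: grades are already encoded as numbers
})

def summarize_climbers(log: pd.DataFrame) -> pd.DataFrame:
    """Collapse a climb-level log into one summary row per climber."""
    # Sort by date within each climber so "first" and "last" follow time order.
    log = log.sort_values(["user_id", "date"])
    summary = log.groupby("user_id").agg(
        date_first=("date", "min"),
        date_last=("date", "max"),
        grades_count=("grade", "count"),
        grades_first=("grade", "first"),
        grades_last=("grade", "last"),
        grades_max=("grade", "max"),
        grades_mean=("grade", "mean"),
    )
    # Calendar years of the first and most recent recorded climbs.
    summary["year_first"] = summary["date_first"].dt.year
    summary["year_last"] = summary["date_last"].dt.year
    return summary.reset_index()

print(summarize_climbers(log))
```

The resulting columns mirror several variables in the module's variable table; `years_cl` is deliberately not derived here, since the source does not say how it was computed.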

This structure allows you to examine patterns in climber performance, experience, and progression over time, and to build predictive models based on climber characteristics and aggregated performance metrics.

Learning Goals

In this module, students will explore how to use regression trees to model climbing performance data. By building and interpreting decision tree models, students will develop core data science skills such as:

Understanding how decision trees split data based on predictor variables
Visualizing tree structures to interpret model decisions
Preparing and selecting features for modeling
Applying regression trees to make numeric predictions, such as a climber’s maximum grade (grades_max)
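To make the first goal concrete, here is a minimal sketch of how a regression tree can choose a single split on one numeric predictor: it tries each candidate threshold and keeps the one that minimizes the total squared error when each side predicts its own mean. The helper `best_split` and the toy `years_cl`/`grades_max` values are hypothetical. scikit-learn's `DecisionTreeRegressor` uses the same squared-error idea (its default criterion) across all predictors and then repeats the search on each side, with a much more efficient implementation.

```python
import numpy as np

def best_split(x: np.ndarray, y: np.ndarray):
    """Return (threshold, sse) for the split of x that minimizes the
    summed squared error of y in the two resulting groups."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x)):
        # Only split between distinct values of the predictor.
        if x[i] == x[i - 1]:
            continue
        left, right = y[:i], y[i:]
        # Each side predicts its mean, so the cost is the sum of squared deviations.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Use the midpoint between neighboring values as the threshold.
            best_threshold, best_sse = (x[i - 1] + x[i]) / 2, sse
    return best_threshold, best_sse

# Toy values (hypothetical): years climbing vs. maximum grade.
years_cl = np.array([1, 2, 3, 8, 9, 10])
grades_max = np.array([40, 42, 41, 55, 57, 56])
threshold, sse = best_split(years_cl, grades_max)
print(f"split at years_cl <= {threshold}, total squared error {sse}")
```

With these toy values the best split falls between 3 and 8 years (threshold 5.5), because that cut separates the two clusters of grades almost perfectly.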

Data

The dataset contains more than 10,000 rows, each summarizing one climber’s logged climbs on indoor problems and routes. Each row combines personal attributes, such as age, years of climbing, and weight, with performance outcomes, such as maximum grade achieved and number of graded climbs. This structure makes it possible to investigate how climber characteristics relate to climbing outcomes and to identify patterns in performance, effort, and progression.

Climber data

Variable Descriptions

Variable	Description
user_id	Unique identifier for each climber
country	Country of residence or origin of the climber
sex	Sex of the climber (M or F)
weight	Weight of the climber in kilograms
height	Height of the climber
age	Age of the climber in years
years_cl	Number of years the climber has been climbing
date_first	Date of the climber’s first recorded climb
date_last	Date of the climber’s most recent recorded climb
grades_count	Total number of climbs completed with a recorded grade
grades_first	Difficulty grade of the first climb the climber completed
grades_last	Difficulty grade of the most recent climb completed
grades_max	Highest grade completed by the climber
grades_mean	Average grade of all climbs completed
year_first	Year of the climber’s first recorded climb
year_last	Year of the climber’s most recent recorded climb
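Before fitting a tree, the variables above have to be divided into predictors and a target. A minimal sketch, assuming `grades_max` is the target and the predictors are the personal and experience variables highlighted in the module's summary; that choice, the helper `select_features`, and the two toy rows are illustrative assumptions, not the notebook's code.

```python
import pandas as pd

# One possible predictor set (an assumption, loosely following the module summary).
PREDICTORS = ["sex", "country", "weight", "height", "age", "years_cl"]
TARGET = "grades_max"

def select_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return predictors X and target y, failing loudly if a column is missing."""
    missing = [c for c in PREDICTORS + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # user_id is only an identifier. grades_mean, grades_last, etc. summarize the
    # same climbs as grades_max, so using them as inputs would leak the answer.
    return df[PREDICTORS].copy(), df[TARGET].copy()

# Two made-up rows, only to exercise the function.
toy = pd.DataFrame({
    "user_id": [1, 2], "sex": ["M", "F"], "country": ["USA", "ESP"],
    "weight": [70, 55], "height": [178, 162], "age": [25, 31],
    "years_cl": [4, 10], "grades_mean": [45.0, 52.0], "grades_max": [50, 58],
})
X, y = select_features(toy)
print(X.columns.tolist(), y.tolist())
```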

Module Files

The full module materials are linked below.

Module

This shell page provides access to the module materials. Download the student-facing Jupyter notebook and the supporting data file above to complete the activity.

To run the notebook, users will need a Python environment with packages such as pandas, matplotlib, and scikit-learn.
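A quick way to confirm the environment before opening the notebook is to try importing each package. This sketch uses only the standard library's `importlib`; note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`. The package list comes from the sentence above and may not cover everything the notebook imports.

```python
import importlib

# Install name -> import name for the packages mentioned in the module.
REQUIRED = {"pandas": "pandas", "matplotlib": "matplotlib", "scikit-learn": "sklearn"}

def check_environment() -> list[str]:
    """Print the version of each installed package and return the missing ones."""
    missing = []
    for pip_name, module_name in REQUIRED.items():
        try:
            module = importlib.import_module(module_name)
            print(f"{pip_name:<13} {getattr(module, '__version__', 'unknown')}")
        except ImportError:
            missing.append(pip_name)
    return missing

missing = check_environment()
if missing:
    print("Install with: pip install " + " ".join(missing))
```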

Summary

In this module, students explored how to use regression trees to model and predict peak climbing performance using real-world climbing data. By building and visualizing decision tree models, students practiced key data science skills such as:

Selecting relevant predictor variables like age, experience, and physical characteristics
Preprocessing categorical data using dummy variables
Training and interpreting regression tree models using scikit-learn
Visualizing model splits to understand how the algorithm makes predictions
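The steps listed above can be sketched end to end with pandas and scikit-learn. This is a hedged illustration, not the notebook's solution: the data frame is synthetic (random values with an invented relationship between `years_cl` and `grades_max`), and the country codes, `max_depth=3`, and the 75/25 train/test split are arbitrary choices. `pd.get_dummies` creates the dummy variables, `DecisionTreeRegressor` fits the tree, and `export_text` prints its splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the climber table (NOT the real data); column names follow the module.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], size=n),
    "country": rng.choice(["USA", "ESP", "DEU"], size=n),
    "age": rng.integers(16, 60, size=n),
    "years_cl": rng.integers(0, 25, size=n),
    "height": rng.normal(175, 8, size=n),
    "weight": rng.normal(68, 9, size=n),
})
# Invented target: grade rises with experience, plus noise.
df["grades_max"] = 40 + 0.8 * df["years_cl"] + rng.normal(0, 3, size=n)

# Dummy variables: one 0/1 column per level of each categorical predictor.
X = pd.get_dummies(df.drop(columns="grades_max"), columns=["sex", "country"])
y = df["grades_max"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A shallow tree stays readable; the depth limit is an illustrative choice.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Test R^2:", round(tree.score(X_test, y_test), 3))
print(export_text(tree, feature_names=list(X.columns)))
```

For a graphical version of the same splits, `sklearn.tree.plot_tree(tree, feature_names=list(X.columns), filled=True)` draws the tree with matplotlib. Because this synthetic target was generated from `years_cl`, the printed tree should split mostly on that column; on the real data the splits will differ.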