---
title: "Examining Analysis of Variance Using the NBA Draft"
---

In this activity, we are going to be using data from the NBA draft to explore Analysis of Variance (ANOVA).

## Part 1 - Introduction

#### The NBA Draft

Each year, the National Basketball Association (NBA) holds a draft, where prospective basketball players are able to be chosen to join one of the 30 professional teams across the United States and Canada.

In order to be eligible for the draft, a player must be at least 19 years old and out of high school for at least one year. Prior to 2006, this rule was not in effect, and players could be drafted during/right out of high school.

Each year, the draft is comprised of 60 players and takes place over two rounds of 30 selections. Teams pick players in an order based on performance from the previous season, with teams that performed poorly getting earlier picks in order to create a seemingly more level playing field. It's important to note that there weren't always 30 players selected in each round, the number made its way up to 30 as more teams were added into the NBA. 

#### Playing Basketball

The goal in basketball is to score as many points as possible by throwing the ball through the other team's hoop.

In the middle of the court there is a half-court line that divides the two sides. On both sides, surrounding the hoops, there is an arch called the three-point line. Within the three-point line stands the free-throw line, where a player would shoot from should there be a foul called.

**Scoring**

Players can shoot towards the hoop from any point on the court. A different number of points is awarded based on where the player is standing when they release the ball. The three shots are explained below.

-   Field Goal: Worth 2 points, scored by shooting within the three point line
-   Three Pointer: Worth 3 points, scored by shooting outside the three point line.
-   Free Throw: Worth one point, taken from the free throw line after a foul.

A rebound is when a player from the team that is on defense obtains the ball after the other team takes a shot attempt at the hoop. At the end of the game, which is played in four twelve-minute quarters, the team with the most points wins.

For this activity, you will be using data from the data set `nba_draft.csv`. The data set includes data from players that were selected in the NBA drafts between the years 1990-2021. Each row represents a different player and statistics from their respective careers in the NBA, some of which are still ongoing. Some key variables include `mins_per_game` the average number of minutes played per game in a player's career, `round_picked`, which shows what round the player was selected in, and `pick_in_round` which shows what number pick a player was in the round in which they were selected. 

## Part 2 - Loading and Cleaning the Data 

**1.  Load in the data file `nba_draft.csv`.** 

**2.  Right now, we only want to look at players that were selected in the first round of the draft. This information is found in the `round_picked` variable. Write the code that selects only those players who were chosen in the first round. Store it in an object called `nba_round1s`.**

**3. In order to being our ANOVA exploration, we need to look at different groups within the round one selections. Since we have the order in which the players were selected (stored in the `pick_in_round` variable), we can make an additional categorical variable to keep track of where in the draft the player was selected. Make a new variable, called `pick_category`, which follows the following conditions: **

- `pick_category` = "1-10" if the player was selected in picks 1-10, 
- `pick_category` = "11-20" if the player was selected in picks 11-20,
- `pick_category` = "21-30" if the player was selected in picks 21-30


**4. Make the `pick_category` variable a factor with three levels.**

## Part 3 - ANOVA

**1. Create a box plot that compares the average minutes played per game (`mins_per_game`) based on a player's selection in the first round of the NBA draft (`pick_category`). Try to include a title, color coded legend, and axis labels.**

**2. Below is a density plot and its corresponding code comparing the average minutes played per game based on a player's selection in the first round of the NBA draft. Based on this density plot and the box plots you created above, do you feel as though ANOVA is appropriate? What conclusions can you draw upon first glance of the two data visualizations? Are there any concerns? **

![Density Plot](DensityPlot.jpg)

```{r}
means = nba_round1s %>%
  group_by(pick_category) %>%
  summarize(
    mean_value = mean(mins_per_game))


ggplot(nba_round1s,
       mapping = aes(x = mins_per_game,
                     fill = pick_category)) +
  geom_density(alpha = 0.4) +
  geom_vline(data = means,
             mapping = aes(xintercept = mean_value, color = pick_category),
             linetype = "dashed",
             size = 1,
             show.legend = FALSE) +
  labs(title = "Distribution of MPG based on Round 1 NBA draft picks",
       x = "Minutes Per Game",
       y = "Density",
       fill = "Pick in draft in NBA") +
  theme_minimal()
```

**3. Assuming an ANOVA test is most likely appropriate for our data, write and interpret in context the null and alternative hypotheses we'll be using to determine if the average number of minutes played per game (`mins_per_game`) is different based on the position in which the player is picked in the first round (`pick_in_round`).**
  
**4. Run an ANOVA test in R and print the summary table** 

**5. Is there sufficient evidence that there is a difference in the mean amount of minutes played per game based on the group that player was selected in the first round of the draft?**

**6. Between which groups are we *most likely* to see a significant difference. Between which groups are we *least likely* to see a significant difference in minutes played. Provide evidence.**

**7. Is there evidence that there is a difference in average minutes played per game between players drafted in picks 1-10 compared to players drafted in picks 11-20? If so, what is the difference? Provide evidence as well as a specific number.**

**8. Were your predictions in question 6 correct? Are there any comparisons that don't have a significant difference?** 

 