Using Machine Learning to Predict the 2023 Kentucky Derby … – DataDrivenInvestor

Can the forecasted weather be used to predict the winning race time?

My hypothesis is that the weather has a major impact on the Kentucky Derby's winning race time. In this analysis, I will use the Kentucky Derby's forecasted weather to predict the winning race time using Machine Learning (ML). In previous articles, I discussed the importance of using explainable ML in a business setting to provide business insights and help with buy-in and change management. In this analysis, because I'm striving purely for accuracy, I will disregard this advice and go directly to the more complex, but more accurate, black box Gradient Boosted Machine (GBM), because we want to win some money!

The data I will use comes from the National Weather Service:

# Read in Data #
data <- read.csv("...KD Data.csv")

# Declare Year Variables #
year <- data[,1]

# Declare numeric x variables #
numeric <- data[,c(2,3,4)]

# Scale numeric x variables
scaled_x <- scale(numeric)
# check that we get mean of 0 and sd of 1
colMeans(scaled_x)
apply(scaled_x, 2, sd)
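To make the scaling step more concrete, here is a minimal sketch of what `scale()` does. The data frame, its column names, and its values are made up for illustration and are not the Derby data.

```r
# Toy data: hypothetical column names and values, for illustration only
toy <- data.frame(
  temp_f   = c(61, 75, 68, 82, 57),
  precip   = c(0.00, 0.31, 0.05, 0.00, 1.46),
  wind_mph = c(8, 12, 5, 15, 10)
)

# scale() centers each column on its mean and divides by its sample sd
scaled_toy <- scale(toy)

# Manual equivalent for a single column
manual_temp <- (toy$temp_f - mean(toy$temp_f)) / sd(toy$temp_f)

# The two approaches agree up to floating-point error
all.equal(as.numeric(scaled_toy[, "temp_f"]), manual_temp)

# The centering and scaling constants are stored as attributes, so
# standardized values can be mapped back to the original units later
attr(scaled_toy, "scaled:center")
attr(scaled_toy, "scaled:scale")
```

Because floating-point arithmetic is inexact, the column means usually come out as tiny values near zero rather than exactly zero.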

# One-Hot Encoding #
data$Weather <- as.factor(data$Weather)
xfactors <- model.matrix(data$Year ~ data$Weather)[, -1]
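The encoding step can be seen on a small example. The sketch below uses invented years and weather labels (the real categories in the Derby data are not shown in this post) to show what `model.matrix()` returns once the intercept column is dropped.

```r
# Toy data: hypothetical years and weather labels, for illustration only
toy <- data.frame(
  Year    = c(2019, 2020, 2021, 2022),
  Weather = factor(c("Rain", "Clear", "Cloudy", "Clear"))
)

# model.matrix() expands the factor into 0/1 indicator columns.
# Dropping column 1 removes the intercept. The first factor level
# ("Clear", alphabetically) is the baseline and gets no column of its own.
dummies <- model.matrix(Year ~ Weather, data = toy)[, -1]

colnames(dummies)
dummies
```

Strictly speaking, this produces one fewer indicator column than there are categories, with the baseline level encoded as all zeros. Passing the data frame through the `data` argument, as in this sketch, also gives shorter column names than writing `data$Weather` inside the formula.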

# Bring prepped data all back together #
# Note: y (presumably the winning race time) is not defined in this excerpt
scaled_df <- as.data.frame(cbind(year,y,scaled_x,xfactors))

# Isolate pre-2023 data #
old_data <- scaled_df[-1,]
new_data <- scaled_df[1,]

# Gradient Boosted Machine #
# Find Max Interaction Depth #
floor(sqrt(NCOL(old_data)))

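library(gbm)
# tree_mod is the fitted gbm model; the fitting call is not shown in this excerpt
# find the number of trees suggested by the out-of-bag (OOB) error estimate
best.iter <- gbm.perf(tree_mod, method="OOB", plot.it=TRUE, oobag.curve=TRUE, overlay=TRUE)
print(best.iter)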
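The call that fits `tree_mod` is not shown above, so here is a rough sketch of how the pieces could fit together. It assumes the `gbm` package is installed and uses simulated stand-in data, not the Derby data. The variable names, the simulated relationship, and the hyperparameters (`n.trees`, `shrinkage`, `bag.fraction`) are all illustrative guesses rather than the settings used for the prediction in this article.

```r
library(gbm)

set.seed(42)  # make the simulated data and the subsampling reproducible

# Simulated stand-in data: NOT the Derby data set
n <- 200
sim <- data.frame(
  temp   = rnorm(n),
  precip = rnorm(n),
  wind   = rnorm(n)
)
sim$y <- 122 + 0.8 * sim$precip - 0.5 * sim$temp + rnorm(n, sd = 0.5)

# Same square-root rule of thumb for tree depth as in the step above
depth <- floor(sqrt(NCOL(sim)))

# Fit the boosted model; hyperparameters here are illustrative guesses
tree_mod <- gbm(
  y ~ temp + precip + wind,
  data              = sim,
  distribution      = "gaussian",  # squared-error loss for a numeric target
  n.trees           = 1000,
  interaction.depth = depth,
  shrinkage         = 0.01,
  bag.fraction      = 0.5          # subsampling is needed for OOB estimates
)

# Number of trees suggested by the out-of-bag improvement curve
best.iter <- gbm.perf(tree_mod, method = "OOB", plot.it = FALSE)

# Predict for a new observation using only the first best.iter trees
new_obs <- data.frame(temp = 0.2, precip = 1.0, wind = -0.3)
pred <- predict(tree_mod, newdata = new_obs, n.trees = best.iter)
pred
```

When run, `gbm.perf()` itself notes that the OOB method tends to underestimate the optimal number of trees. Setting `cv.folds` in `gbm()` and using `method = "cv"` is an alternative worth considering when the data set is large enough to split.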

In this article, I chose a more accurate, but more complex, black box model to predict the Kentucky Derby's winning race time. I don't care about generating insights, winning buy-in, or managing change here; I want the most accurate model so I can make a data-driven gamble. In most business cases you will give up some accuracy for explainability; however, there are some instances (like this one) in which accuracy is the primary requirement of a model.

This prediction is based on the forecasted weather for Saturday, May 6th, taken on Thursday, May 4th, so it should be taken with a grain of salt. Even with huge amounts of technology, predicting the weather is very difficult, and using forecasted weather to predict the winning race time adds even more uncertainty. That being said, I will take whichever of the over or the under matches my predicted winning time of 122.12 seconds.
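As a small worked example of that decision rule, the sketch below converts the predicted time into the minutes:seconds format used for race times and compares it against a betting line. The line value is hypothetical and invented for illustration; it is not a real sportsbook number.

```r
predicted_seconds <- 122.12  # predicted winning time from this article

# Convert to minutes:seconds
minutes   <- predicted_seconds %/% 60
seconds   <- predicted_seconds %% 60
formatted <- sprintf("%d:%05.2f", as.integer(minutes), seconds)
formatted  # "2:02.12"

# Hypothetical over/under line in seconds, for illustration only
line_seconds <- 122.50

# Take the over if the prediction is slower than the line, else the under
pick <- if (predicted_seconds > line_seconds) "over" else "under"
pick  # "under"
```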

