Forecasting Success in the National Hockey League Using In-Game Statistics and Textual Data

Description
Title: Forecasting Success in the National Hockey League Using In-Game Statistics and Textual Data
Authors: Weissbock, Joshua
Date: 2014
Abstract: In this thesis, we look at a number of methods to forecast success (winners and losers), both of single games and playoff series (best-of-seven games) in the sport of ice hockey, more specifically within the National Hockey League (NHL). Our findings indicate that there exists a theoretical upper bound, which seems to hold true for all sports, that makes prediction difficult. In the first part of this thesis, we look at predicting success of individual games to learn which of the two teams will win or lose. We use a number of traditional statistics (published on the league’s website and used by the media) and performance metrics (used by Internet hockey analysts; they are shown to have a much higher correlation with success over the long term). Despite the demonstrated long-term success of performance metrics, it was the traditional statistics that had the most value to automatic game prediction, allowing our model to achieve 59.8% accuracy. We found it interesting that regardless of which features we used in our model, we were not able to increase the accuracy much higher than 60%. We compared the observed win% of teams in the NHL to many simulated leagues and found that there appears to be a theoretical upper bound of approximately 62% for single game prediction in the NHL. As a single game is difficult to predict, with a maximum accuracy of 62%, predicting a longer series of games must be easier. We looked at predicting the winner of the best-of-seven series between two teams using over 30 features, both traditional and advanced statistics, and found that we were able to increase our prediction accuracy to almost 75%. We then re-explored predicting single games with the use of pre-game textual reports written by hockey experts from http://www.NHL.com using bag-of-words features and sentiment analysis. We combined these features with the numerical data in a multi-layer meta-classifier and were able to increase the accuracy close to the upper bound.
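The abstract's step from single games to a best-of-seven series can be illustrated with a short calculation. The sketch below is not the thesis's method: it assumes, purely for illustration, that the stronger team wins each game independently with a fixed probability `p`, and the function name `series_win_prob` is hypothetical. Under that assumption, the chance of winning four games before the opponent does grows faster than the per-game edge.

```python
from math import comb


def series_win_prob(p: float, wins_needed: int = 4) -> float:
    """Probability of winning a first-to-`wins_needed` series.

    Assumes (for illustration only) that games are independent and that
    the team wins each game with the same probability p.
    """
    q = 1.0 - p
    total = 0.0
    # The series ends on the team's final win; before that win the
    # opponent has taken k games, with k ranging from 0 to wins_needed - 1.
    for k in range(wins_needed):
        # Ways to arrange the first (wins_needed - 1 + k) games so that
        # the opponent wins exactly k of them.
        arrangements = comb(wins_needed - 1 + k, k)
        total += arrangements * (p ** wins_needed) * (q ** k)
    return total


if __name__ == "__main__":
    # Illustrative per-game probabilities; 0.62 is used only because it
    # matches the single-game upper bound quoted in the abstract.
    for p in (0.50, 0.55, 0.62):
        print(f"per-game p = {p:.2f} -> series win probability = {series_win_prob(p):.3f}")
```

Note that the 62% figure in the abstract is a bound on prediction accuracy, not a measured per-game win probability, so plugging it in here is an analogy, not a reproduction of the reported series result.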
URL: http://hdl.handle.net/10393/31553
http://dx.doi.org/10.20381/ruor-6351
Collection: Thèses, 2011 - // Theses, 2011 -