Forecasting Success in the National Hockey League Using In-Game Statistics and Textual Data

Field | Value
dc.contributor.author | Weissbock, Joshua
dc.date.accessioned | 2014-09-17T11:13:41Z
dc.date.available | 2014-09-17T11:13:41Z
dc.date.created | 2014
dc.date.issued | 2014
dc.identifier.uri | http://hdl.handle.net/10393/31553
dc.identifier.uri | http://dx.doi.org/10.20381/ruor-6351
dc.description.abstract | In this thesis, we look at a number of methods to forecast success (winners and losers) for both single games and best-of-seven playoff series in the sport of ice hockey, more specifically within the National Hockey League (NHL). Our findings indicate that there exists a theoretical upper bound, which seems to hold true for all sports, that makes prediction difficult. In the first part of this thesis, we look at predicting the outcome of individual games to learn which of the two teams will win or lose. We use a number of traditional statistics (published on the league's website and used by the media) and performance metrics (used by Internet hockey analysts and shown to have a much higher correlation with success over the long term). Despite the demonstrated long-term success of performance metrics, it was the traditional statistics that had the most value for automatic game prediction, allowing our model to achieve 59.8% accuracy. Regardless of which features we used in our model, we were not able to increase the accuracy much beyond 60%. We compared the observed win percentages of NHL teams to those of many simulated leagues and found that there appears to be a theoretical upper bound of approximately 62% for single-game prediction in the NHL. Because a single game is difficult to predict, with a maximum accuracy of about 62%, predicting a longer series of games should be easier. We looked at predicting the winner of a best-of-seven series between two teams using over 30 features, both traditional and advanced statistics, and found that we were able to increase our prediction accuracy to almost 75%. We then re-explored predicting single games using pre-game textual reports written by hockey experts from http://www.NHL.com, with bag-of-words features and sentiment analysis. We combined these features with the numerical data in a multi-layer meta-classifier and were able to increase the accuracy to close to the upper bound.
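The claim that a longer series is easier to predict than a single game can be illustrated with a short calculation. The sketch below is not the method used in the thesis; it assumes, for illustration only, that each game is independent and that the stronger team wins every game with the same fixed probability p. The function name series_win_probability is hypothetical.

```python
from math import comb


def series_win_probability(p: float, wins_needed: int = 4) -> float:
    """Probability that a team wins a first-to-`wins_needed` series.

    Assumption (not from the thesis): games are independent and the team
    wins each game with the same probability `p`.
    """
    q = 1.0 - p
    total = 0.0
    # The team clinches on its final win after losing k games along the way.
    # Those k losses can be arranged among the first (wins_needed - 1 + k) games.
    for k in range(wins_needed):
        total += comb(wins_needed - 1 + k, k) * (p ** wins_needed) * (q ** k)
    return total


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60, 0.62):
        print(f"single game p={p:.2f} -> best-of-seven {series_win_probability(p):.3f}")
```

Under these simplifying assumptions, a per-game probability of 0.62 corresponds to a series win probability of roughly 0.75, which shows how a small single-game edge compounds over a best-of-seven series.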
dc.language.iso | en
dc.publisher | Université d'Ottawa / University of Ottawa
dc.subject | Machine learning
dc.subject | Hockey
dc.title | Forecasting Success in the National Hockey League Using In-Game Statistics and Textual Data
dc.type | Thesis
dc.faculty.department | Informatique / Computer Science
dc.contributor.supervisor | Inkpen, Diana
dc.degree.name | MCS
dc.degree.level | masters
dc.degree.discipline | Génie / Engineering
thesis.degree.name | MCS
thesis.degree.level | Masters
thesis.degree.discipline | Génie / Engineering
uottawa.department | Informatique / Computer Science
Collection | Thèses, 2011 - // Theses, 2011 -
