Predicting High-cost Patients in General Population Using Data Mining Techniques

dc.contributor.author: Izad Shenas, Seyed Abdolmotalleb
dc.contributor.supervisor: Raahemi, Bijan
dc.contributor.supervisor: Kuziemsky, Craig
dc.date.accessioned: 2012-10-26T14:15:49Z
dc.date.available: 2012-10-26T14:15:49Z
dc.date.created: 2012
dc.date.issued: 2012
dc.degree.discipline: Gestion / Management
dc.degree.level: masters
dc.degree.name: MSc
dc.description.abstract: In this research, we apply data mining techniques to nationally representative expenditure data from the US to predict very high-cost patients, those in the top 5 cost percentiles, among the general population. Samples are derived from the Medical Expenditure Panel Survey’s (MEPS) Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning, and balancing the data, the final MEPS dataset of 31,704 records is modeled with Decision Trees (including C5.0 and CHAID) and Neural Networks. Multiple predictive models are built, and their performance is analyzed using several measures, including accuracy, G-mean, and Area under the ROC Curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC measures; for the top-performing predictive models these range from 76% to 85% and from 0.812 to 0.942, respectively. Among a primary set of 66 attributes, the best predictors of the top 5% high-cost population include the individual’s overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. It is worth noting that we do not consider the number of visits to care providers as a predictor, since it is highly correlated with expenditure and offers no new insight into the data (i.e., it is a trivial predictor). We predict high-cost patients without knowing how many times the patient saw a doctor or was hospitalized. Consequently, the results of this study can be used by policy makers, health planners, and insurers to plan and improve the delivery of health services.
dc.embargo.terms: immediate
dc.faculty.department: Systèmes de santé / Health Systems
dc.identifier.uri: http://hdl.handle.net/10393/23461
dc.identifier.uri: http://dx.doi.org/10.20381/ruor-6153
dc.language.iso: en
dc.publisher: Université d'Ottawa / University of Ottawa
dc.subject: Cost prediction
dc.subject: Data mining
dc.subject: Decision Trees
dc.subject: Neural networks
dc.subject: Medical Expenditure Panel Survey
dc.subject: Predictive modelling
dc.title: Predicting High-cost Patients in General Population Using Data Mining Techniques
dc.type: Thesis
thesis.degree.discipline: Gestion / Management
thesis.degree.level: Masters
thesis.degree.name: MSc
uottawa.department: Systèmes de santé / Health Systems

Files

Original bundle

Name:
Izad_Shenas_Seyed_Abdolmotalleb_2012_thesis.pdf
Size:
3.51 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
4.21 KB
Format:
Item-specific license agreed upon to submission