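The survey involved data munging/cleaning, exploratory data analysis, feature selection, and application of machine learning. Five supervised learning algorithms were applied, optimized, and compared on two classification problems (Adult Income and Credit Card Default): decision trees, neural networks, gradient boosting, support vector machines (SVM), and k-nearest neighbors (kNN).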
Training used cross-validation with at least 3 folds, with 20% of the data withheld for testing. Each algorithm was optimized with a grid search across 2 to 5 hyperparameters, depending on the algorithm. The experimental results were on par with accepted benchmarks, and the accuracies are shown below:
| Algorithm | Adult Income | Credit Card Default |
|---|---|---|
| Decision Trees | 0.852 | 0.837 |
| Neural Networks | 0.843 | 0.844 *best* |
| Gradient Boost | 0.868 *best* | 0.817 |
| SVM | 0.846 | 0.819 |
| kNN | 0.844 | 0.793 |
The study was conducted in Python using scikit-learn (sklearn).
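The workflow described above (a 20% hold-out test set, 3-fold cross-validation, and a grid search per algorithm) might look roughly like the sketch below. This is an illustration under stated assumptions, not the study's actual code: the data is a synthetic stand-in generated with `make_classification`, only two of the five algorithms are shown, and the hyperparameter grids are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the Adult Income and
# Credit Card Default datasets are not loaded here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Withhold 20% of the data for final testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical grids with two hyperparameters per algorithm.
candidates = {
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 10]},
    ),
    "kNN": (
        KNeighborsClassifier(),
        {"n_neighbors": [3, 5, 11], "weights": ["uniform", "distance"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 3-fold cross-validated grid search on the training split only.
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    # GridSearchCV refits the best configuration on the full training split,
    # so predict() uses that refit model on the held-out test set.
    results[name] = accuracy_score(y_test, search.predict(X_test))
    print(f"{name}: best params {search.best_params_}, "
          f"test accuracy {results[name]:.3f}")
```

Keeping the test split out of the grid search means the reported accuracy is measured on data that played no part in choosing hyperparameters. The same loop extends to the other algorithms by adding entries to `candidates`.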