Machine Learning Analysis Pipeline for
Genome-Wide Association Study SNP data

You may download a stand-alone version as a runnable JAR file from our downloads section.

Short description:
Using machine learning methods, GWAS data can be analysed for more complex relations between single nucleotide polymorphisms (SNPs) and diseases than is possible with simple statistical methods that look at each SNP separately. However, this often requires several tools, extensive data conversion, and considerable user interaction. We developed an automated pipeline that uses state-of-the-art machine learning algorithms to create a disease risk model from a given GWAS SNP dataset and to assess its predictive performance on unseen data.

The pipeline can either use one dataset for training the model and a second for validation, or perform a nested k-fold cross-validation on a single dataset. For each training set, a basic case/control association analysis is performed to estimate the association between each individual SNP and the phenotype. Using this information, the dataset is filtered to create multiple subsets, each containing only the SNPs below a certain p-value threshold. For each subset, a model is trained using a support vector machine (LIBSVM, with linear and RBF kernels) and tested on the corresponding validation subset. The prediction performance is measured as the area under the ROC curve (AUC) and visualized in a plot showing the average AUC and its standard deviation for each p-value threshold.

The only required input for this pipeline is the SNP data; all other parameters have default values that the user can override if desired. Throughout the process, the pipeline takes care of the necessary conversions between data formats and stores all intermediate data and final results, so that individual steps can be analysed afterwards. Additionally, if a second analysis is performed on the same dataset with different parameters, e.g., with an additional p-value threshold, the pipeline does not repeat steps whose results are still valid.
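
To illustrate the filtering and evaluation steps described above, here is a minimal sketch in Python. It is not the pipeline's own code (which is distributed as a JAR file); the function names filter_snps, auc and summarize, as well as all SNP IDs and numbers, are hypothetical and chosen only for illustration. The SVM training itself is left out, so the scores passed to auc stand in for the decision values a trained model would produce.

```python
from statistics import mean, stdev


def filter_snps(p_values, threshold):
    """Return the IDs of SNPs whose association p-value is below the threshold."""
    return [snp for snp, p in p_values.items() if p < threshold]


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of case/control
    pairs in which the case receives the higher score.

    labels: 1 for case, 0 for control; scores: higher means more case-like.
    A tie between a case and a control counts as half a correct ordering.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def summarize(auc_by_threshold):
    """Map each p-value threshold to (mean AUC, standard deviation) over folds."""
    return {
        t: (mean(a), stdev(a) if len(a) > 1 else 0.0)
        for t, a in auc_by_threshold.items()
    }


if __name__ == "__main__":
    # Made-up association p-values for three hypothetical SNPs.
    p_values = {"rs1": 1e-6, "rs2": 0.03, "rs3": 0.4}
    for t in (1e-4, 0.05):
        print(t, filter_snps(p_values, t))

    # Made-up per-fold AUC values for two thresholds.
    print(summarize({1e-4: [0.61, 0.65, 0.58], 0.05: [0.70, 0.74, 0.69]}))
```

The pairwise AUC computation is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be the usual choice for large validation sets.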

Release of version 1.0.2: Download the stand-alone version as a zipped, runnable JAR file. (mittag - 2013-06-25 16:23)

First release: Download the stand-alone version as a runnable JAR file. (mittag - 2011-11-21 15:58)
