R code and data for "An evaluation of machine-learning for predictive Genome Wide Association Studies" (to appear)


Bibliographical metadata


Content type: Research data
Research data type: Other
Created date: 2015-07-27
Last modified date: 2016-08-19

Abstract: Pre-processed and derived data used in the study, together with the R code used to generate and analyse them. These files should enable anyone to repeat the analysis described in the manuscript and confirm the results presented therein.

Table of Contents:
1. yeast_pheno.RData: modified phenotypic data, fold indices and train/test indices for the main analysis
2. geno.rds, geno_fuse.rds, geno_genes.rds: the unaltered genoset used in the study and the two alternative genosets
3. alternative_genosets_creation.r: script for creating geno_genes.rds and geno_fuse.rds
4. data_preparation.r: code to produce yeast_pheno.RData and geno.rds
5. functions.r: collection of various functions used in the analysis
6. xpack.r: collection of CV procedures for several R packages (glmnet, randomForest, GBM) used in the study
7. added_noise_experiment.r: code for the experiment in which phenotypic noise is introduced into the data
8. reduced_dataset_experiment.r: code for the experiment investigating the importance of the number of sample points
10. reduced_markerset_experiment.r: code for the experiment investigating the importance of the number of attributes
11. correction_code.r: code for correcting the cross-validation procedure of Bloom et al.
12. analysis_main.r: code for the main analysis in the paper (results in Table 1)
13. analysis_fused_genoset.r, analysis_genes_genoset.r: code for the analysis on the two alternative genosets
14. cross.RData, pheno_raw.RData: original phenotypic and genotypic data of Bloom et al.
15. README.txt
Date made available: 27 Jul 2015
Publisher: University of Manchester