Co-data learning in high dimensional prediction problems

Mirrelijn van Nee, Magnus Münch and Mark van de Wiel (Amsterdam University Medical Centers, Department of Epidemiology & Data Science)

Course description

In many high dimensional prediction settings, extra information on the features, termed co-data, is available. Including this information in the analysis may improve prediction. Co-data comes in different forms: (i) group structures, (ii) hierarchical group structures, and (iii) continuous co-data. In genomics, for example, we may have type (i) co-data in the form of a classification of the genes into functional domains, type (ii) in the form of overlapping and hierarchically organised pathways, and type (iii) as p-values from a previous, related study.
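As a rough illustration of the three co-data forms, the sketch below represents each of them as a plain R object for a toy set of six features. The gene names, domain labels, pathway memberships and p-values are all made up for illustration; the objects are not in the input format of any particular package.

```r
# Toy example with p = 6 features (hypothetical gene names).
genes <- paste0("gene", 1:6)

# (i) Group structure: each gene belongs to exactly one functional domain.
domain <- factor(c("A", "A", "B", "B", "C", "C"))
names(domain) <- genes

# (ii) Hierarchical, overlapping groups: pathways as sets of gene indices.
# pathway1a is nested in pathway1, and gene 3 belongs to both
# top-level pathways.
pathways <- list(
  pathway1  = c(1, 2, 3),
  pathway1a = c(1, 2),
  pathway2  = c(3, 4, 5, 6)
)

# (iii) Continuous co-data: one external p-value per gene (made-up numbers).
pvalues <- c(0.001, 0.20, 0.04, 0.75, 0.03, 0.50)
names(pvalues) <- genes

# Number of genes per domain and per pathway.
table(domain)
lengths(pathways)
```

In each case the co-data describes the features (genes), not the samples: there is one entry, or one group membership, per feature.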
In this course, we introduce several prediction methods that can include these co-data types to improve predictive performance. The penalty parameters are efficiently estimated using empirical Bayes techniques. The course covers technical aspects of co-data learning in ridge regression, elastic net regression, and the random forest. Each method is also explored in a hands-on practical using the freely available R packages ecpc, gren, and CoRF.
The learning outcomes of this course are three-fold: (i) statistical theory, (ii) statistical application, and (iii) R computing skills. The balance between these three outcomes may depend on the participants' prior knowledge and skills. Some knowledge of statistics is assumed, including penalized regression, maximum likelihood estimation, and tree-based learning. A basic understanding of genetics, including the concepts of genome, DNA and phenotype, is also useful but not strictly necessary. Lastly, for the practical part, basic knowledge of R is required. Participants should be able to perform simple operations in R, such as installing packages, doing arithmetic, assigning and using variables, and applying functions.

Message to attendees

Dear attendees,
We're looking forward to welcoming you to our short course on "Co-data learning in high dimensional prediction problems".
For a swift start, please install the required R packages by running the following commands in R:
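A minimal sketch of such an installation is given here. It assumes that ecpc is available on CRAN and that gren and CoRF are installed from GitHub with the remotes package. The repository names are assumptions and should be checked against the course material before running.

```r
# ecpc: assumed to be available on CRAN.
install.packages("ecpc")

# gren and CoRF: assumed to be installed from GitHub via the remotes package.
# The repository names below are assumptions; verify them before running.
install.packages("remotes")
remotes::install_github("magnusmunch/gren")
remotes::install_github("DennisBeest/CoRF")

# Check that all three packages load without errors.
library(ecpc)
library(gren)
library(CoRF)
```

If one of the packages fails to install, the remaining commands can still be run separately, since the three installations do not depend on each other in this sketch.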

 

You may want to download the course material beforehand; it is available at https://github.com/Mirrelijn/Short-course-CNC2021.
Thanks and see you soon,
Mirrelijn, Magnus and Mark