Short courses

 

Practical deep learning with R

Sigrid Keydana (RStudio, Munich, Germany)

Course description

This short course will introduce you to torch for R, an R-native port of PyTorch that requires no Python installation.

We start with a thorough introduction to the basics that make everything possible: tensors, automatic differentiation, and neural network modules.

Equipped with that knowledge, participants will explore two applications: time series forecasting and numerical optimization. While the former showcases renowned deep learning architectures, the latter demonstrates the usefulness of torch as a high-performance tensor-computation library, going beyond the deep learning context.

All modules will include ample opportunity for practice. This course does not presuppose familiarity with deep learning concepts or frameworks; however, participants should have a basic knowledge of the R programming language and of basic machine learning terminology.

 

Co-data learning in high-dimensional prediction problems

Mirrelijn van Nee, Magnus Münch and Mark van de Wiel (Amsterdam University Medical Centers, Department of Epidemiology & Data Science)

Course description

In many high-dimensional prediction settings, extra information on the features, termed co-data, is available. Including this information in the analysis may improve prediction. Co-data comes in different forms: (i) group structures, (ii) hierarchical group structures, and (iii) continuous co-data. In genomics, for example, we may have type (i) co-data in the form of a classification of the genes into functional domains, type (ii) in the form of overlapping and hierarchically organised pathways, and type (iii) as p-values from a previous, related study.

In this course, we introduce several prediction methods that can include these co-data types to improve predictive performance. The penalty parameters are efficiently estimated using empirical Bayes techniques. The course covers technical aspects of co-data learning in ridge regression, elastic net regression, and the random forest. In addition, each method is explored in a hands-on practical using the freely available R packages ecpc, gren, and CoRF.

The learning outcomes of this course are threefold: (i) statistical theory, (ii) statistical application, and (iii) R computing skills. The balance between these three outcomes may depend on the participants' prior knowledge and skills. Some knowledge of statistics is assumed, including penalized regression, maximum likelihood estimation, and tree-based learning. A basic understanding of genetics, including the concepts of genome, DNA and phenotype, is also useful but not strictly necessary. Lastly, for the practical part, basic knowledge of R is required. Participants should be able to perform simple operations in R, such as installing packages, doing arithmetic, assigning and using variables, and applying functions.
 

 

Non-parametric Bayesian methods for classification

Boris Hejblum and Anaïs Rouanet (University of Bordeaux)

Course description

When performing clustering, a recurring issue is how to choose the appropriate number of clusters. The Bayesian non-parametric framework makes it possible to estimate the number of clusters directly within model-based clustering.

 

The first part of the course will be devoted to the Gaussian Dirichlet Process Mixture model and its Chinese Restaurant Process representation. We will cover theoretical concepts as well as hands-on R practicals illustrating them. The second part of the course will cover the case of supervised clustering, where the clustering structure is guided by an outcome. We will illustrate this using the freely available R package PReMiuM on a real epidemiological data application.
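As a rough illustration of the Chinese Restaurant Process mentioned above, the sketch below simulates table (cluster) assignments so that the number of clusters emerges from the process rather than being fixed in advance. It is not taken from the course material: it is written in Python rather than R, and the function name, the concentration parameter value `alpha = 1.0`, and the seed are assumptions chosen for illustration.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table assignments under a CRP with concentration alpha > 0.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers seated at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table (new cluster)
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(100, alpha=1.0)
print("number of clusters:", len(table_sizes))
print("cluster sizes:", table_sizes)
```

Larger values of `alpha` tend to produce more tables, which is how the prior expresses uncertainty about the number of clusters.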

 

Requirements: participants of this course should have a working knowledge of R. Previous exposure to the Bayesian framework and MCMC algorithms would be helpful for understanding the concepts covered in this course.

 

 

 
