Education

Teaching is an opportunity to learn. At all levels, from undergraduate courses to PhD programmes, lectures require in-depth preparation of basic and advanced concepts, which can be a constant source of ideas and insights for a researcher, especially in theoretical and computational domains. We firmly believe in the mutual feedback between research and teaching.

In spite of the tremendous technological advances of the past decades, some of the deeper scientific questions in the life sciences remain elusive, which makes it challenging for teachers to design appropriate educational paths in such dynamic fields. At the same time, this offers an opportunity to inspire and be inspired by the new generations.

Additionally, teaching programmes need to be adapted to fast-paced technological developments, which calls for early training in modern data analysis methods, machine learning and AI. Current PhD students are often overwhelmed by the large amount of data that they can generate with modern microscopy and high-throughput genomic techniques.

Part of our mission is to help the next generation of researchers and professionals cope with the increasing availability of data, and to provide them with appropriate training in statistics, which is the basis of the newest methodologies in data science.

While machine learning and AI can provide great exploratory tools, they are often not appropriate for testing scientific hypotheses, which requires formalizing the hypothesis as a model and analyzing the data with model-based statistics. In this context, probabilistic programming tools (rjags, PyMC3, Pyro, Stan) are great educational resources for teaching students how to rapidly design models and apply statistical inference to analyze data. Overall, students need to learn early on the differences between the various statistical approaches and the scenarios in which each can be applied.
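
As a flavour of what model-based inference looks like in a brief coding session, here is a minimal sketch in plain Python. It does not use any of the probabilistic programming tools named above; it assumes a toy binomial model with a flat prior and approximates the posterior on a grid. The function names and the data (7 successes out of 20 trials) are hypothetical and chosen only for illustration.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Grid approximation of the posterior of a binomial success
    probability theta, assuming a flat prior on [0, 1]."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior = likelihood x flat prior.
    # The binomial coefficient is constant in theta and cancels out.
    weights = [t ** successes * (1 - t) ** (trials - successes) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_mean(thetas, probs):
    """Posterior expectation of theta on the grid."""
    return sum(t * p for t, p in zip(thetas, probs))


# Hypothetical data: 7 successes in 20 trials.
thetas, probs = grid_posterior(successes=7, trials=20)
mean = posterior_mean(thetas, probs)
print(f"posterior mean of theta: {mean:.3f}")
```

With a flat prior the exact posterior is Beta(successes + 1, trials - successes + 1), so the grid estimate can be checked against the analytical mean (successes + 1) / (trials + 2). A probabilistic programming tool would replace the grid with a sampler or a variational approximation, which is what makes richer models tractable.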

Applying data science methods requires proficiency in at least one programming language (R, Python, C++). Many students find coding intimidating, which can be a limiting factor for a career in quantitative science. To address this, we offer basic introductions to programming languages through interactive sessions and hands-on courses that make coding more approachable.

The teaching modules we propose feature "R/Python corners": brief coding sessions that illustrate how machine-learning and statistical methodologies are implemented in R or Python. The learn-by-doing approach can be very stimulating and insightful, especially for young students.

Data visualization is the first step in formulating hypotheses, so students need to acquire data visualization skills. Tools such as ggplot and plotly are becoming standards in data science. There is also growing interest in dynamic visualizations, web interfaces and Jupyter notebooks, which facilitate data sharing and help generate reproducible results. Part of our training is devoted to data visualization techniques and to the tools for building modern graphical user interfaces. The following are examples of training packages that we offer at SAMPLED analytics:

SINF-1

Data visualization and statistical inference with R

Introduction to R programming language
Basic data structures and operations
Efficient data structures and data manipulation
Data visualization using ggplot and plotly
Introduction to Bayesian statistics and probabilistic programming (rjags)
Bayesian estimation of model parameters
R Markdown and Shiny applications
Data science web applications with JavaScript: plotly.js and D3.js

pyML

Machine learning with Python

Python basic data structures
NumPy library
Image processing using the OpenCV computer vision library
Building graphical user interfaces with the PyQt library
MySQL database management through the Python API
Introduction to artificial neural networks
Image classification using PyTorch
Recurrent neural networks and natural language processing

IIT

Introduction to information theory

Shannon information measures
Shannon source and channel coding theorems
Decoding analysis and applications to neuronal data
Bayesian and machine-learning decoders
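
A Python corner for this module could start from the basic Shannon measures. The sketch below uses only the standard library and computes entropy and mutual information in bits from a joint probability table. The function names and the toy stimulus/response table are hypothetical and serve only as an illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) in bits.

    joint[i][j] is the joint probability of X = i and Y = j.
    """
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    h_joint = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - h_joint


# Hypothetical joint table: rows are two stimuli, columns are two responses.
noisy_code = [[0.4, 0.1],
              [0.1, 0.4]]
print(f"I(stimulus; response) = {mutual_information(noisy_code):.3f} bits")
```

A response that identifies the stimulus perfectly carries the full stimulus entropy, while a response that is independent of the stimulus carries no information. Both limits are easy to check with this function.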

SINF-2

Advanced statistical methods

Monte Carlo methods for statistical inference
Hierarchical models for biological noise and diversity
Bayesian inference on phylogenetic trees
Generalized linear mixed models with applications to transcriptomics and neuroscience
Bayesian nonparametrics: Gaussian and Dirichlet processes
Dirichlet mixture model and model-based clustering
State-space models and sequential Monte Carlo methods
Variational approaches to statistical inference and message-passing approximations
Variational autoencoders

STDBIO

Stochastic thermodynamics of biological systems

Stochastic processes: master equation and Langevin dynamics
Entropy production in non-equilibrium processes
Analysis of master equation systems: Schnakenberg decomposition of entropy production at non-equilibrium steady states
Thermodynamics of information processing: Landauer's principle and Maxwell's demon revisited
Free energy transduction in biochemical cycle kinetics
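
As an illustration of the kind of exercise this module lends itself to, the sketch below evaluates the steady-state entropy production rate of a master equation system, written as a sum over pairs of states of the net probability flux times the logarithm of the ratio of forward to backward flux. The three-state ring, its rates and the function name are hypothetical; the uniform distribution is stationary here by symmetry, and the result is in units of the Boltzmann constant per unit time.

```python
import math


def entropy_production_rate(rates, p):
    """Entropy production rate for a master equation system.

    rates[i][j] is the transition rate from state i to state j and
    p is the state distribution (assumed to be the steady state).
    """
    n = len(p)
    sigma = 0.0
    for i in range(n):
        for j in range(i + 1, n):          # each pair of states once
            forward = p[i] * rates[i][j]   # probability flux i -> j
            backward = p[j] * rates[j][i]  # probability flux j -> i
            if forward > 0 and backward > 0:
                sigma += (forward - backward) * math.log(forward / backward)
    return sigma


# Hypothetical driven ring 0 -> 1 -> 2 -> 0 with forward rate k_plus
# and backward rate k_minus on every edge.
k_plus, k_minus = 2.0, 1.0
rates = [[0.0, k_plus, k_minus],
         [k_minus, 0.0, k_plus],
         [k_plus, k_minus, 0.0]]
p = [1 / 3, 1 / 3, 1 / 3]  # stationary by symmetry of the ring

sigma = entropy_production_rate(rates, p)
print(f"entropy production rate: {sigma:.4f}")
```

For this ring the result can be checked by hand: the cycle flux is (k_plus - k_minus) / 3 and the cycle affinity is 3 ln(k_plus / k_minus), so their product is (k_plus - k_minus) ln(k_plus / k_minus). Setting k_plus equal to k_minus restores detailed balance and the entropy production vanishes.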