Education

Teaching is an opportunity to learn. At all levels, from undergraduate courses to the PhD, lectures require in-depth preparation of both basic and advanced concepts, which can be a constant source of ideas and insights for a researcher, especially in theoretical and computational fields. We firmly believe in the mutual feedback between research and teaching.

Despite the tremendous technological advances of the past decades, some of the deepest scientific questions in the life sciences remain elusive, which makes it challenging for teachers to design appropriate educational paths in such dynamic fields. At the same time, this offers an opportunity to inspire, and be inspired by, the new generations.

Additionally, teaching programmes need to keep pace with fast technological developments, which calls for early training in modern data analysis methods, machine learning and AI. Current PhD students are often overwhelmed by the large amounts of data they can generate with modern high-throughput microscopy and genomic techniques.

Part of our mission is to help the next generation of researchers and professionals cope with the growing availability of data, and to provide them with appropriate training in statistics, which underpins the newest methodologies in data science.

While machine learning and AI provide powerful exploratory tools, they are often not suited to testing scientific hypotheses, which requires conceptualization through model building and model-based statistical analysis. In this context, probabilistic programming tools (RJAGS, PyMC3, Pyro, Stan) are excellent educational resources for teaching students how to design models quickly and apply statistical inference to data. Overall, students need to learn early on the differences between the various statistical approaches and the scenarios in which each can be applied.

Applying data science methods requires proficiency in at least one programming language (R, Python, C++). Many students find coding intimidating, which is a limiting factor for any career in quantitative science. To address this, we offer basic introductions to programming languages through interactive sessions and hands-on courses that make coding more approachable.

The teaching modules we propose feature "R/Python corners": brief coding sessions that illustrate how machine-learning and statistical methods are implemented in R or Python. This learn-by-doing approach can be especially stimulating and insightful for young students.

Data visualization is the first step in formulating hypotheses, so students need solid data visualization skills. Tools such as ggplot and plotly are becoming standards in data science, and there is growing attention to dynamic visualizations, web interfaces and Jupyter notebooks, which facilitate data sharing and reproducible results. Part of our training is devoted to data visualization techniques and to the tools for building modern graphical user interfaces. The following are examples of the training packages we offer at SAMPLED analytics:

SINF-1

Data visualization and statistical inference with R
  1. Introduction to R programming language
  2. Basic data structures and operations
  3. Efficient data structures and data manipulation
  4. Data visualization using ggplot and plotly
  5. Introduction to Bayesian statistics and probabilistic programming (RJAGS)
  6. Bayesian estimation of model parameters
  7. R Markdown and Shiny applications
  8. Data science web applications with JavaScript: plotly.js and D3.js
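Items 5 and 6 (Bayesian estimation of model parameters) can be previewed with a tiny example. RJAGS would sample the posterior by MCMC; the sketch below instead uses a plain-Python grid approximation for a hypothetical Bernoulli model with a flat Beta(1, 1) prior, so that the result can be compared with the exact conjugate Beta posterior. The data (7 successes in 20 trials) and the function names are illustrative assumptions.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Posterior of a Bernoulli success probability theta on a uniform grid.

    Assumes a flat Beta(1, 1) prior (hypothetical teaching model), so the
    unnormalised posterior is just the binomial likelihood."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    unnorm = [t ** successes * (1 - t) ** (trials - successes) for t in thetas]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]  # normalise to sum to 1

def posterior_mean(thetas, weights):
    """Posterior expectation of theta under the grid weights."""
    return sum(t * w for t, w in zip(thetas, weights))

# Hypothetical data: 7 successes out of 20 trials
thetas, post = grid_posterior(7, 20)
grid_mean = posterior_mean(thetas, post)

# Conjugate check: the exact posterior is Beta(1 + 7, 1 + 13), with mean 8 / 22
exact_mean = (1 + 7) / (2 + 20)
print(f"grid mean = {grid_mean:.4f}, exact mean = {exact_mean:.4f}")
```

The grid approach only works for one or two parameters; for richer models, such as the hierarchical models covered later, sampling-based tools like RJAGS or Stan become necessary.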

pyML

Machine learning with Python
  1. Python basic data structures
  2. NumPy library
  3. Image processing with the OpenCV computer vision library
  4. Building graphical user interfaces with the PyQt library
  5. MySQL database management through the Python API
  6. Introduction to artificial neural networks
  7. Image classification using PyTorch
  8. Recurrent neural networks and natural language processing
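Before moving to PyTorch (item 7), the mechanics of item 6 can be shown with a single artificial neuron written by hand. The sketch below, an illustrative assumption rather than course material, trains one sigmoid neuron (equivalent to logistic regression) by batch gradient descent on the cross-entropy loss, using the logical AND function as a hypothetical, linearly separable toy dataset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, lr=0.5, epochs=2000):
    """Train one sigmoid neuron with batch gradient descent on cross-entropy."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in data:
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # d(cross-entropy)/d(pre-activation) for a sigmoid output
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        # Average the gradients over the batch and take one descent step
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Classify as 1 when the neuron's output probability is at least 0.5."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0

# Hypothetical toy dataset: logical AND
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_neuron(AND)
print([predict(w, b, x) for x, _ in AND])
```

A framework such as PyTorch automates exactly the gradient computation written out here, which is what makes deeper networks practical.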

IIT

Introduction to information theory
  1. Shannon information measures
  2. Shannon source and channel coding theorems
  3. Decoding analysis and applications to neuronal data
  4. Bayesian and machine-learning decoders
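The Shannon measures in item 1 reduce to a few lines of code. The sketch below computes entropy and the mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for three hypothetical stimulus/response tables with a binary stimulus X and a binary response Y. The tables are illustrative assumptions, not neuronal data.

```python
import math

def entropy(p):
    """Shannon entropy H = -sum p log2 p, in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]             # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]       # marginal of Y (columns)
    pxy = [v for row in joint for v in row]      # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical stimulus/response tables
noiseless = [[0.5, 0.0], [0.0, 0.5]]       # the response copies the stimulus
independent = [[0.25, 0.25], [0.25, 0.25]] # the response ignores the stimulus
noisy = [[0.4, 0.1], [0.1, 0.4]]           # the response is flipped 20% of the time

for name, table in [("noiseless", noiseless), ("independent", independent), ("noisy", noisy)]:
    print(name, round(mutual_information(table), 4))
```

The three cases give 1 bit, 0 bits and 1 - H_b(0.2), about 0.278 bits, where H_b is the binary entropy; the last value is the information a binary symmetric channel with 20% error rate transmits for a uniform input.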

SINF-2

Advanced statistical methods
  1. Monte Carlo methods for statistical inference
  2. Hierarchical models for biological noise and diversity
  3. Bayesian inference on phylogenetic trees
  4. Generalized linear mixed models with applications to transcriptomics and neuroscience
  5. Bayesian nonparametrics: Gaussian and Dirichlet processes
  6. Dirichlet mixture model and model-based clustering
  7. State-space models and sequential Monte Carlo methods
  8. Variational approaches to statistical inference; message-passing approximations
  9. Variational autoencoders
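As a taste of item 1 (Monte Carlo methods for statistical inference), the sketch below implements a random-walk Metropolis sampler, the simplest Metropolis-Hastings variant, in plain Python. The target, a Normal(3, 2^2) density known only up to a constant, the step size, the burn-in length and the seed are illustrative assumptions; in practice the target would be an unnormalised posterior.

```python
import math
import random
import statistics

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler with Gaussian proposals.

    log_target only needs to be known up to an additive constant, which is
    why the method works with unnormalised posterior densities."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Symmetric proposal, so accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_samples

# Hypothetical target: unnormalised log-density of Normal(mean=3, sd=2)
def log_target(x):
    return -((x - 3.0) ** 2) / (2 * 2.0 ** 2)

samples, acceptance = metropolis_hastings(log_target, x0=0.0, n_samples=50_000, step=4.0)
kept = samples[1_000:]  # discard burn-in
mean_est = statistics.fmean(kept)
sd_est = statistics.stdev(kept)
print(f"acceptance={acceptance:.2f} mean={mean_est:.2f} sd={sd_est:.2f}")
```

The step size trades acceptance rate against how far each move travels; probabilistic programming tools such as Stan replace this hand-tuned proposal with gradient-based samplers.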

STDBIO

Stochastic thermodynamics of biological systems
  1. Stochastic processes: master equation and Langevin dynamics
  2. Entropy production in non-equilibrium processes
  3. Analysis of master-equation systems; Schnakenberg decomposition of entropy production at non-equilibrium steady states
  4. Thermodynamics of information processing: Landauer's principle and Maxwell's demon revisited
  5. Free energy transduction in biochemical cycle kinetics
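Items 1 to 3 can be tied together on the smallest non-trivial example: a three-state ring 0 → 1 → 2 → 0. The sketch below, with hypothetical transition rates and k_B = 1, relaxes the master equation to its non-equilibrium steady state using explicit Euler steps (chosen for clarity over a linear solve), then checks that the entropy production summed edge by edge equals Schnakenberg's cycle form, J·ln(∏k₊/∏k₋), where J is the steady-state cycle current.

```python
import math

# Hypothetical rates for a unicyclic 3-state network 0 -> 1 -> 2 -> 0.
# K_PLUS[i] is the rate of i -> i+1 (mod 3); K_MINUS[i] is the rate of i+1 -> i.
K_PLUS = [2.0, 3.0, 1.5]
K_MINUS = [1.0, 0.5, 1.0]
N = 3

def rate(i, j):
    """Transition rate w(i -> j) on the ring (0 if i and j are not neighbours)."""
    if j == (i + 1) % N:
        return K_PLUS[i]
    if i == (j + 1) % N:
        return K_MINUS[j]
    return 0.0

def steady_state(dt=0.01, steps=20000):
    """Relax dp_i/dt = sum_j [w(j->i) p_j - w(i->j) p_i] with explicit Euler steps."""
    p = [1.0 / N] * N
    for _ in range(steps):
        dp = [sum(rate(j, i) * p[j] - rate(i, j) * p[i] for j in range(N) if j != i)
              for i in range(N)]
        p = [p[i] + dt * dp[i] for i in range(N)]
    return p

p = steady_state()

# Net probability current on each edge i -> i+1 (equal on all edges at steady state)
currents = [rate(i, (i + 1) % N) * p[i] - rate((i + 1) % N, i) * p[(i + 1) % N]
            for i in range(N)]

# Entropy production rate, edge by edge: sum_e J_e * ln(w_ij p_i / (w_ji p_j))
sigma_edges = sum(
    J * math.log(rate(i, (i + 1) % N) * p[i] / (rate((i + 1) % N, i) * p[(i + 1) % N]))
    for i, J in enumerate(currents))

# Schnakenberg cycle form: cycle current times cycle affinity ln(prod k+ / prod k-)
affinity = math.log(math.prod(K_PLUS) / math.prod(K_MINUS))
sigma_cycle = currents[0] * affinity

print("p =", p)
print("sigma (edges) =", sigma_edges, " sigma (cycle) =", sigma_cycle)
```

The two numbers agree because the ln(p_i/p_j) terms telescope around the cycle, leaving only the ratio of rate products; with these rates the affinity is ln 18 > 0, so the current and the entropy production are positive.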