Teaching is an opportunity to learn. At all levels, from undergraduate courses to PhD programmes, lectures require in-depth preparation of basic and advanced concepts, which can be a constant source of ideas and insights for a researcher, especially in theoretical and computational domains. We firmly believe in the mutual feedback between research and teaching.
Despite the tremendous technological advances of the past decades, some of the deeper scientific questions in the life sciences remain elusive, which challenges teachers to design appropriate educational paths in such dynamic fields. At the same time, this offers an opportunity to inspire and be inspired by the new generations.
Additionally, teaching programmes need to be adapted to fast-paced technological developments, which requires early training in modern data analysis methods, machine learning and AI. Current PhD students are often overwhelmed by the large amount of data that they can generate with modern microscopy and high-throughput genomic techniques.
Part of our mission is to help the next generation of researchers and professionals make use of the increasing availability of data, and to provide them with appropriate training in Statistics, which is the basis of all the newest methodologies in data science.
While machine learning and AI can provide great exploratory tools, they are often not appropriate for testing scientific hypotheses, which requires conceptualization via model building and model-based statistical analysis. In this context, probabilistic programming tools (Rjags, PyMC3, Pyro, Stan) are valuable educational resources for teaching students how to rapidly design models and apply statistical inference to analyze data. Overall, students need to learn early on the differences between the various statistical approaches and the scenarios in which each can be applied.
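As an illustration of what model-based inference means in practice, the following is a minimal sketch in plain Python, not taken from any of the tools named above. It assumes a hypothetical experiment with 14 successes in 20 trials, a binomial model and a flat prior, and approximates the posterior of the success rate on a grid. The function names and the data are illustrative assumptions only.

```python
def grid_posterior(successes, trials, grid_size=1001):
    """Approximate the posterior of a binomial success rate on a grid.

    Assumes a flat (uniform) prior over the rate, so the posterior is
    proportional to the binomial likelihood evaluated at each grid point.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalised posterior: likelihood times a constant prior.
    weights = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]


def prob_greater_than(grid, posterior, threshold):
    """Posterior probability that the rate exceeds the given threshold."""
    return sum(w for p, w in zip(grid, posterior) if p > threshold)


# Hypothetical data, chosen only for illustration: 14 successes in 20 trials.
grid, posterior = grid_posterior(successes=14, trials=20)

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
p_above_half = prob_greater_than(grid, posterior, 0.5)

print(f"posterior mean of the rate: {posterior_mean:.3f}")
print(f"P(rate > 0.5 | data):       {p_above_half:.3f}")
```

The hypothesis "the rate is above 0.5" is assessed directly from the posterior of an explicit model, rather than from a black-box predictor. Probabilistic programming tools automate this kind of computation for models far too complex for a grid.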
Applying Data Science methods requires proficiency in at least one programming language (R, Python, C++). Many students find coding intimidating, which is a limiting factor for any career in quantitative science. To address this, we offer basic introductions to programming languages through interactive sessions and hands-on courses that make coding more approachable.
The teaching modules we propose feature R/Python corners: brief coding sessions that illustrate how machine-learning and statistical methodologies are implemented in R or Python. The learn-by-doing approach can be very stimulating and insightful, especially for young students.
Data visualization is the first step in formulating hypotheses, so students need to acquire data visualization skills. Tools such as ggplot and plotly are becoming standards in Data Science. Moreover, there is growing attention to dynamic visualizations, web interfaces and Jupyter notebooks as ways to facilitate data sharing and generate reproducible results. Part of our training is devoted to data visualization techniques and to the tools for building modern graphical user interfaces. These are examples of the training packages that we offer at SAMPLED analytics: