In today's data-driven world, the ability to extract valuable insights from large and complex datasets is highly sought after across various fields and disciplines. The “Data Analysis” course aims to provide students with a solid background in the main concepts and techniques needed to effectively collect, clean, analyze, and interpret data in order to make informed decisions.

Throughout the course, students will learn the principles and methodologies of data analysis, as well as the practical tools and software used by data analysts. Specifically, they will gain hands-on experience working with real-world datasets, applying statistical methods, and using the R scientific computing software and programming language.

The course combines a theoretical treatment of the concepts underlying data analysis with a very practical "learning by doing" approach, supported by a strong emphasis on interactive work and lab sessions. Students are expected to have a basic understanding of the mathematical concepts of variable, distribution, and statistic (mean, median, standard deviation, etc.). Moreover, students are expected to have good computer skills.
The course will cover the following contents: introduction to data analysis (introduction to data; reading and cleaning data; describing data; visualising data); applied data analysis (introduction to scientific computing; importing data; transforming data; visualising data; data modelling); background in statistics and inference (regression analysis); introduction to machine learning; machine learning application. Moreover, the course will also include three interactive lab sessions using the R scientific computing software. Session 1: data analysis in R; Session 2: regression analysis in R; Session 3: machine learning in R.

The final examination will consist of a multiple-choice written exam and a 2-hour lab exam, to be carried out on students' personal laptop computers. In addition, home assignments following the three lab sessions will be assigned to the students, accounting for up to one third of the final mark.

Office hours will be available by appointment throughout the course and in the two weeks after the final examination.

___

Course bibliography

• Garrett Grolemund & Hadley Wickham, R for Data Science, 2nd edition

• Måns Thulin, Modern Statistics with R: From wrangling and exploring data to inference and predictive modelling

• Bradley Boehmke & Brandon Greenwell, Hands-On Machine Learning with R