Big Data Analysis

Paola Cerchiello, Alessandro Bitetto


Knowledge of basic statistical concepts such as inference, confidence intervals, and the design of cluster samples. For the coding part, some familiarity with Python, or with similar software such as R, is required.



The aim of this course is to study and apply the most relevant statistical models for the analysis of large data sets.
The perspective is mainly applied: choosing and applying suitable models to exploit the full informative content of (large) data sets, with particular attention to the correct and contextualized interpretation of the final results. The course will also focus on some frameworks for managing large data sets, such as MapReduce for data clustering.
The course makes interactive use of open source software such as Python, so that students learn the complete analysis workflow in practice.
Particular emphasis will be given to social network data, textual data, and business and financial case studies.


  • Inferential statistics;
  • Testing hypotheses;
  • Statistical estimation;
  • Simple and Multivariate regression;
  • Ridge and Lasso regression;
  • Naive Bayes Classifier;
  • Latent Dirichlet Allocation;
  • Clustering algorithms;
  • Support Vector Machines;
  • Decision Trees;
  • Bagging and Boosting.


The class combines theoretical lectures with Python-based practical sessions, in which students learn how to implement and analyze the most appropriate models for the available data.
A tutor will help students weekly to acquire all the necessary theoretical and practical knowledge.
All the material used during the lectures (slides, scripts, data) will be available on the KIRO platform.

For further information, please refer to the Online Courses Catalogue.

For lecture materials, please refer to KIRO.

*Materials will be available when lectures start.