Can deep learning work in the real world?: A data-centric perspective (IDEATE) - [2023 - 2026]
Key Words: Deep learning, Learning with Noisy Labelling, Data-Centric Deep Learning, Uncertainty Modelling, Self-supervised Learning
Name of the project: Can deep learning work in the real world?: A data-centric perspective (IDEATE)
Principal Investigator (PI, Co-PI,..): Petia Radeva & Ricardo Marques
Funding entity: Ministry of Science, Innovation and Universities, State Research Agency, Spain
Duration: 01/09/2023 - 30/08/2026
The effectiveness and efficiency of ML/DL systems depend on the nature of the data and the models’ capacity. Standard ML pipelines are built around specific training tasks characterised by a heuristic model specification, an available training dataset, and an evaluation procedure that assumes independent and identically distributed (i.i.d.) data. These properties make it difficult to apply the models in real conditions: spurious correlations, unwanted biases, and opaque predictors in trained models are now widespread.
Hypothesis: We hypothesise that improving dataset quality often yields larger performance gains than tuning model hyperparameters alone. Collecting data, cleaning it, and making it suitable for ML training can take up to 90% of a project's time. With the recent rapid development of DL, it has become clear that data-centric ML/DL is one of the next challenges. Since data will never be fully clean in real scenarios, DL models should be prepared to cope with imperfect data through robust training techniques.
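As an illustration of what a robust training technique can look like, the sketch below compares standard cross-entropy with the generalized cross-entropy loss, a known noise-tolerant alternative from the literature on learning with noisy labels. It is not taken from the IDEATE project itself; the function names, the example probabilities, and the choice q = 0.7 are assumptions made for illustration only.

```python
import math


def cross_entropy(probs, label):
    """Standard cross-entropy for one sample: -log p_label (unbounded)."""
    return -math.log(probs[label])


def generalized_cross_entropy(probs, label, q=0.7):
    """Generalized cross-entropy: (1 - p_label**q) / q, with 0 < q <= 1.

    The loss is bounded above by 1/q, so a single mislabelled sample
    cannot dominate the training signal. q = 1 gives 1 - p_label;
    as q approaches 0 the loss approaches standard cross-entropy.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - probs[label] ** q) / q


# Hypothetical case: the model is confident in class 0,
# but the (possibly wrong) annotation says class 1.
probs = [0.98, 0.01, 0.01]
noisy_label = 1

ce = cross_entropy(probs, noisy_label)
gce = generalized_cross_entropy(probs, noisy_label)
print(f"cross-entropy:             {ce:.3f}")
print(f"generalized cross-entropy: {gce:.3f}")
```

The bounded loss penalises the suspicious sample far less than cross-entropy does, which is the basic mechanism by which such losses reduce the influence of label noise during training.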
General objective of the IDEATE project based on our research perspective,