Data Labeling: Industrial Challenges and Mitigation Strategies
Labels are a prerequisite for supervised machine learning. In industrial contexts, however, data is often incomplete because labels are partially or entirely missing. Although manual, semi-automatic, and automatic labeling techniques exist, such as crowdsourcing, active learning (AL), and semi-supervised learning (SSL), we have observed that AL and SSL are rarely implemented because practitioners are often unaware of them. Furthermore, labeling instances manually is not optimal: it is time-consuming, and the human factor makes it challenging to maintain label quality. Crowdsourcing has its merits, but it involves sharing potentially sensitive data with third-party organizations, which is not an option for many companies. In this talk, we will discuss data labeling challenges and mitigation strategies based on the research conducted so far, and how we wish to push this work forward in the future.