Article summary:

1. The k-nearest neighbors algorithm is a simple yet effective data mining technique, but it becomes imprecise and inefficient when applied to massive amounts of noisy, imperfect data.

2. Data preprocessing techniques such as instance reduction and missing-value imputation can transform Big Data into Smart Data by removing noisy and redundant samples or by filling in missing values. In this role, the k-nearest neighbors rule becomes a core algorithm for identifying and correcting imperfect data.

3. The article investigates the role of the k-nearest neighbors algorithm in a supervised learning context, presents emerging big data-ready versions of the k-nearest-neighbors-based preprocessing algorithms, and provides guidelines on how to use them to obtain Smart (quality) Data for high-quality data mining. Several Spark Packages have been developed that include all of the Smart Data algorithms analyzed.
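
To make the two preprocessing ideas in point 2 concrete, here is a minimal single-machine sketch in plain Python. It is not taken from the article or its Spark Packages. It assumes Euclidean distance, uses Edited Nearest Neighbor (ENN) as one well-known example of a k-nearest-neighbors noise filter, and imputes a missing value with the mean over the nearest complete rows. The function names (nearest, enn_filter, knn_impute) are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def nearest(query, points, k, skip=None):
    """Return indices of the k points closest to `query` (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )
    return order[:k]


def enn_filter(X, y, k=3):
    """Noise filtering (ENN): keep a sample only if its label agrees with
    the majority label of its k nearest neighbors."""
    keep = []
    for i, x in enumerate(X):
        votes = Counter(y[j] for j in nearest(x, X, k, skip=i))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return keep


def knn_impute(X, k=2):
    """Imputation: replace None with the mean of that feature over the
    k nearest complete rows, comparing only the observed features."""
    complete = [row for row in X if None not in row]
    filled = []
    for row in X:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [c for c, v in enumerate(row) if v is not None]
        query = [row[c] for c in observed]
        projected = [[r[c] for c in observed] for r in complete]
        nbrs = nearest(query, projected, k)
        filled.append([
            v if v is not None else sum(complete[j][c] for j in nbrs) / len(nbrs)
            for c, v in enumerate(row)
        ])
    return filled


# Toy data (made up): two clusters, with the last sample mislabelled.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (0.15, 0.15)]
y = ["a", "a", "a", "b", "b", "b", "b"]
kept = enn_filter(X, y, k=3)  # indices of samples that survive the filter

# Toy data (made up): the last row is missing its second feature.
rows = [[1.0, 2.0], [1.2, 2.2], [9.0, 9.5], [1.1, None]]
imputed = knn_impute(rows, k=2)
```

Both functions scan every sample for each query, so the cost grows quadratically with the number of samples. That cost is what makes the naive rule impractical on massive data and what the big data-ready versions mentioned in point 3 are meant to address.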

