Big Data Quality: From Content to Context

Document Type: Research Paper

Author

PhD, Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran.

Abstract

Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. DIQ research comprises many views and streams, which generally aim to improve the effectiveness of decision making in organizations. Although many studies have sought to clarify the role of Big Data quality for organizations, no comprehensive literature review shows the main differences between traditional data quality research and Big Data quality research. This paper analyzes the papers published on Big Data quality and finds that almost no new research mainstream has emerged around Big Data quality. It shows that the main concepts of data quality do not change in the Big Data context and that only some new issues have been added to the area.

Keywords

