Tools for Consumer Preference Analysis Based on Machine Learning

Document Type: Research Paper

Authors

1 Professor, Head of Computer Systems Department, Kharkiv National Automobile and Highway University, Kharkiv, 61002, Ukraine; Daugavpils University, Daugavpils, LV-5401, Latvia.

2 Assistant Professor, Computer Systems Department, Kharkiv National Automobile and Highway University, Kharkiv, 61002, Ukraine.

3 Associate Professor, Head of the Department of Law, Management & Economics, Daugavpils University, Daugavpils, LV-5401, Latvia.

4 Associate Professor, Department of Law, Management & Economics, Daugavpils University, Daugavpils, LV-5401, Latvia.

5 Professor, Dr. Sci. (habil.) in Economics, Department of Economic Cybernetics and Economic Security Management, Kharkiv National University of Radio Electronics, Kharkiv, 61166, Ukraine.

6 Professor, Dr. Sci. (habil.) in Economics, Department of Business Economics and International Economic Relations, National Technical University "Kharkiv Polytechnic Institute", Kharkiv, 61002, Ukraine.

7 Associate Professor, Department of Business Economics and International Economic Relations, National Technical University "Kharkiv Polytechnic Institute", Kharkiv, 61002, Ukraine.

10.22059/jitm.2024.99048

Abstract

Consumers increasingly use the Internet when choosing a product or service, and in doing so they generate data about their purchases and the services they use. They often leave feedback about a purchase, and they discuss their attitudes toward goods and services on social networks, in messengers, on thematic sites, and elsewhere. The result is a large volume of data containing useful information about manufacturers of goods and services, which can be valuable both to ordinary users and to large companies. However, it is practically impossible to use this information directly, because it is scattered across different sources and is raw and unstructured. Moreover, a given target group of users usually needs not the entire data set but a specific target sample. Solving this problem requires a tool that structures the information arrays and then analyzes them according to the stated goal. Such a tool can be built with frameworks that provide machine learning and data processing methods. This work addresses the problem of creating tools that evaluate consumer preferences by analyzing large volumes of data for further use by the target audience. The goal of developing big data analysis systems is to obtain new, previously unknown information. The methodology applies algorithms for working with large data sets together with machine learning methods, namely the pandas library for operations on a data set and logistic regression for classifying information. As a result, a system was built that analyzes lexical information, translates it into numerical format, and creates the necessary statistical samples on this basis. The originality of the work lies in the use of specialized data processing and machine learning libraries to create data analysis systems. The practical value of the work lies in the possibility of creating data analysis systems built with such libraries.
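The pipeline outlined in the abstract (lexical information translated into numerical format, classified with logistic regression, then summarized into statistical samples) can be illustrated with a minimal sketch. It is not the authors' system. To stay dependency-free it uses only the Python standard library, so a hand-written bag-of-words encoder and a logistic regression trained by batch gradient descent stand in for pandas and a library classifier. The `REVIEWS` data, all function names, and the learning rate and epoch count are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy reviews: (product, text, label); label 1 = positive, 0 = negative.
REVIEWS = [
    ("phone", "great battery and great screen", 1),
    ("phone", "terrible battery poor screen", 0),
    ("laptop", "great keyboard fast delivery", 1),
    ("laptop", "poor keyboard slow delivery", 0),
    ("phone", "fast and great camera", 1),
    ("laptop", "terrible slow support", 0),
]


def build_vocabulary(texts):
    """Map each distinct word to a column index (sorted for determinism)."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Turn a text into a bag-of-words count vector; unknown words are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts.get(word, 0)) for word in vocabulary]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # derivative of log loss w.r.t. score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(text, vocabulary, weights, bias):
    """Estimated probability that a review text is positive."""
    x = vectorize(text, vocabulary)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def positive_share_by_product(reviews, vocabulary, weights, bias):
    """Statistical sample: share of reviews predicted positive, per product."""
    totals, positives = Counter(), Counter()
    for product, text, _ in reviews:
        totals[product] += 1
        if predict_proba(text, vocabulary, weights, bias) >= 0.5:
            positives[product] += 1
    return {product: positives[product] / totals[product] for product in totals}


texts = [text for _, text, _ in REVIEWS]
labels = [label for _, _, label in REVIEWS]
vocabulary = build_vocabulary(texts)
features = [vectorize(text, vocabulary) for text in texts]
weights, bias = train_logistic_regression(features, labels)
shares = positive_share_by_product(REVIEWS, vocabulary, weights, bias)
print(shares)
```

With real data, the per-product grouping would more naturally be a pandas group-by over a table of reviews, and the classifier would typically come from a machine learning library instead of being written by hand. The toy set here is linearly separable, so the fitted model reproduces the training labels; that would not generally hold for real reviews, where a held-out test sample is needed to judge quality.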

Keywords

