Improving Text Mining Methods in Market Prediction via Prototype Selection Algorithms

Document Type: Research Paper


1 MSc. Student, Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

2 Assistant Professor, Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran


Nowadays, researchers are faced with large volumes of data. Since a considerable share of these data is unstructured, it cannot be processed directly. Hence, two main challenges in this field are the high dimensionality of the feature space and the sheer volume of available data. In this research, a feature selection method based on target features is proposed to handle the curse of dimensionality. Moreover, several prototype selection approaches are used to cope with the large volume of data. The proposed method consists of three main steps, each of which improves on the previous one. Although the proposed method achieved significant results in each phase separately, its best performance in terms of classification accuracy was obtained in the last phase. To evaluate its performance, the proposed method was compared with the three-layer algorithm. The results revealed that, on average, the proposed method performed significantly better than the three-layer algorithm.
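The abstract does not say which prototype selection algorithms were used, so the following is only an illustrative sketch of one classic technique from that family, Hart's Condensed Nearest Neighbor (CNN), and not the authors' implementation. It assumes the news items have already been turned into numeric feature vectors (for example, after feature selection) and uses Euclidean distance; the function names and the "up"/"down" labels are hypothetical. CNN keeps only the training instances that the current prototype store misclassifies under the 1-NN rule, repeating until a full pass adds nothing, so the retained subset still classifies every training instance correctly while usually being much smaller.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_label(x: Vector, store: List[Tuple[Vector, str]]) -> str:
    """Label of the nearest prototype in the store (1-NN rule)."""
    return min(store, key=lambda item: euclidean(x, item[0]))[1]


def condensed_nn(X: List[Vector], y: List[str]) -> List[int]:
    """Hart's Condensed Nearest Neighbor (hypothetical helper name).

    Starts the store with the first instance, then repeatedly scans the
    training set and adds every instance the current store misclassifies,
    stopping once a full pass adds nothing. Returns the selected indices.
    """
    selected = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in selected:
                continue
            store = [(X[j], y[j]) for j in selected]
            if nn_label(X[i], store) != y[i]:
                selected.append(i)
                changed = True
    return sorted(selected)


if __name__ == "__main__":
    # Toy 2-D "document vectors" with hypothetical market-direction labels.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]
    y = ["down", "down", "down", "up", "up", "up"]
    print(condensed_nn(X, y))  # → [0, 3]
```

On this toy data, two prototypes suffice to reproduce the labels of all six training points. CNN is order-dependent and sensitive to noisy instances, which is why editing methods such as Wilson's Edited Nearest Neighbor are often run before or instead of it.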


