Abnormal data detection and learning of their behavior using abnormality theory and satisficing theory

Document Type: Research Paper

Abstract

Learning abnormalities is a considerable challenge in data mining and knowledge discovery. Exceptional phenomena must be detected in databases that contain a large number of normal records and very few abnormal ones. It is therefore important to increase the confidence that can be placed in this limited number of records so that abnormality can be learned effectively. In this study, a new approach based on abnormality theory and satisficing theory is presented to improve confidence in abnormal data detection and learning. First, the borders between abnormal and normal behavior are clarified using a combined approach based on abnormality theory; then a satisficing solution is extracted by means of satisficing theory. A modified RISE method, a bottom-up learning approach, is implemented to extract normal and abnormal knowledge. The efficiency of the proposed model is evaluated by applying it to abnormal stock selection in the Iranian stock market. The proposed method considerably outperforms the decision tree and the support vector machine. The accuracy of the proposed method is measured by the g-means index. The results show the capability of the proposed approach in abnormality detection and learning.
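The abstract does not define the g-means index. As a minimal sketch, the Python function below assumes the definition commonly used in the imbalanced-learning literature: the geometric mean of sensitivity (recall on the abnormal class) and specificity (recall on the normal class). The function name `g_mean`, the label encoding (1 = abnormal, 0 = normal), and the sample data are illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TP / (TP + FN) * TN / (TN + FP)).
    `positive` marks the abnormal (minority) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical data: 2 abnormal records among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_normal = [0] * 10                       # never flags an abnormal record
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]         # finds one abnormal, one false alarm

print(g_mean(y_true, always_normal))
print(g_mean(y_true, mixed))
```

Under this assumed definition, a classifier that labels every record as normal scores zero even though its plain accuracy on the sample data is 80%, which is why such an index suits data with very few abnormal records.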
