Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Document Type: Research Paper

Authors

1 Associate Prof., Department of Computer Science, Faculty of Information Technology, The Islamic University of Gaza, Palestine.

2 MSc, Department of Computer Science, Faculty of Information Technology, The Islamic University of Gaza, Gaza Strip, Palestine.

Abstract

The volume of data published on the Web makes search increasingly difficult, and users often fail to get the results they expect, especially when they are uncertain about their information needs. Several approaches have been proposed to support exploratory search on the Web through query expansion, faceted search, or supplementary information extracted from external knowledge resources. However, these solutions have not been well explored for general Web search in an open-domain setting, and they mostly target content written in English and other Latin-based languages. In this research, we propose a fully automated approach to support exploratory search over Arabic Web content. It exploits the Arabic version of Wikipedia to extract complementary information that supports visual representation and deeper exploration of the search engine's results. First, key Wikipedia entities are extracted from the text snippets that the search engine returns in response to the user's query. The entities are then filtered and ranked using a novel ranking algorithm that extends the conventional PageRank algorithm. Finally, a graph is built and presented to the user to visually represent the highly ranked topics and their relationships. The approach was realized in ArabXplore, a system that integrates with the web browser and executes the approach at query time to support the search process. It was evaluated on a dataset of 100 Arabic search queries covering different domains, with the results rated by human subjects. The underlying ranking algorithm was also compared with conventional PageRank.
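
As an illustration of the baseline that the ranking step builds on, the following is a minimal sketch of conventional PageRank computed by power iteration over a small entity graph. It is not the extended ranking algorithm used in ArabXplore, which the abstract does not specify. The function name `pagerank`, the damping value of 0.85, the iteration count, and the example entities and links are all assumptions made for illustration only.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Conventional PageRank by power iteration over a directed graph."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        # Each node passes an equal share of its rank along its outgoing links.
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical entity graph: an edge A -> B means that the Wikipedia
# article for entity A links to the article for entity B.
entity_links = {
    "Gaza": ["Palestine"],
    "Jerusalem": ["Palestine"],
    "Palestine": ["Jerusalem"],
    "Olive": [],
}

scores = pagerank(entity_links)
top_entities = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the entity with the most incoming links ends up first in `top_entities`. A system along the lines described above would presumably display only the highest-ranked entities and their links as the exploration graph, but that selection step is likewise an assumption here.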
