Document Type : Research Paper
Authors
1 Mas., Department of Computer Engineering and Information technology, Razi University, Kermanshah, Iran,
2 Assistant Prof., Department of Computer Engineering and Information technology, Razi University, Kermanshah, Iran,
3 Assistant Prof., Department of Statistics and Information Technology, Institute of Judiciary, Tehran, Iran,
Abstract
Keywords
Main Subjects
Introduction
The last few decades have seen many positive socio-economic changes including remarkable developments in information dissemination. But still, all kinds of illegal activities occur daily, which make people pursue or defend their legal rights through judicial channels. Given the need for transparency in the judicial process, this is a field where almost every detail is meticulously recorded in text documents. This provides an opportunity for using intelligent automated data management and analysis tools, which could be especially useful in areas with demanding workloads (Yuriy, 2014). With the increasing use of digital documents in the legal domain, the sheer volume of textual content makes it increasingly difficult to manage legal information. However, it is not easy to meet the information needs of this field through ordinary document and record analyses, as this will require significant legal knowledge and a good understanding of legal terms. Therefore, to achieve cost and time-saving in this field, it is necessary to design a comprehensive framework or method for extracting and retrieving relevant legal data that meet specific conditions (Liu & Chen, 2017).
To issue a fair verdict, a judge must arbitrate impartially between the two parties. Because of limited time and resources, it may become difficult for some judges to maintain an objective and impartial position when processing many cases. This poses two problems. First, judgments may become more subjective, as it is difficult to accurately process the evidence of multiple cases (Wyner et al., 2010). Second, heavy workloads may put judges under severe pressure and stress, thereby reducing their ability to make quality judgments (Chou & Hsing, 2010). These two problems can be alleviated by predicting legal judgments and possible verdicts based on the details and outcomes of similar cases. Legal judgment prediction can also help litigants decide what actions to take before going to court and assist lawyers in advising their clients. A legal judgment prediction tool can also be turned into a publicly available online counseling service that informs people about the probable outcome of litigation before they file a lawsuit, which may significantly reduce the number of cases and help low-income people avoid the costs of hiring lawyers. While text mining is extensively used in a wide variety of fields, only a few studies have been performed on the text mining of legal documents (Șulea et al., 2017; Undavia et al., 2018).
Since judges decide cases based on not only their knowledge, experience, and personality, but also their sentiments (Conrad & Schilder, 2007), sentiment analysis can improve the accuracy of legal judgment predictions.
The subject of this paper is the prediction of judgments in Persian legal cases using text-mining techniques, including Persian-specific text pre-processing, text vectorization, and sentiment and emotion analysis, with the help of machine learning and deep learning methods. This is the first study to provide such a method for Persian cases, as well as the first to use text mining techniques, emotion and sentiment analysis, and deep learning simultaneously in this field. The remainder of this article is organized as follows. The second and third sections present the theoretical foundations and research background. The fourth section describes the proposed methods. The fifth section presents the results of implementing the methods and analyzing the data, and the sixth section discusses the results.
Text mining is an automated or semi-automated text processing procedure for finding patterns and structures in texts and extracting relevant information from textual documents (Chen et al., 2013; Liu et al., 2015). Text mining is a rapidly developing field with several domains of application, including information retrieval and computational linguistics. In general, such natural language processing tools are created to extract knowledge from ontological data that are collected in various ways from different sources, including those written by humans. The importance of text mining techniques for the analysis of textual data stems from the fact that many textual data are poorly structured and cannot be analyzed without purpose-specific methods (Thamil Selvi & Laksmi, 2019). The text mining process consists of several steps, which tend to include the following:
Sentiment analysis is the process of assessing people’s sentiments and opinions in their comments on social networking sites and other textual documents. Most studies in this field are focused on the English language, and much fewer studies have been conducted about sentiment analysis in other languages including Persian. Persian is the official language of Iran, Tajikistan, and Afghanistan and is the first language of over 100 million people around the world. Persian is a challenging language for sentiment analysis and only a few tools have been developed for general Persian text processing. The processing of Persian documents and texts involves challenges such as misspellings, irregular word spacing, difficult stemming, and the presence of informal words (Basiri et al., 2014). In a legal case or record, the motive and thought process of the offender is described with several words or terms, some of which are sentimental or emotional. Examples of these sentimental terms include “intentional”, “cruel”, and “hateful” (Liu & Chen, 2017). In general, sentiments can affect all participants in the litigation process, including judges and juries (Eliot, 2020).
In the only study conducted so far on sentiment analysis in Chinese legal documents, the results confirmed the positive effect of this analysis on the accuracy of legal judgment predictions (Liu & Chen, 2017). It should be noted that while that study used only one machine learning method, the present study uses several machine learning and deep learning methods for modeling.
Emotion analysis is a challenging task (Scherer, 2021). This is because as Scherer puts it, the definition of emotion could be very complex (Mayer et al., 2008). Different definitions of emotion can be reached through different methodological and conceptual approaches. However, most theorists agree that emotion is a set of expressive, behavioral, physiological, and phenomenological characteristics (Lewis et al., 2016).
Emotion is an unconscious phenomenon that can become conscious during evocation. It is a social representation of sentiment and feeling that is influenced by culture and has an intensity and value as a form of emotional state (Plutchik, 1991).
The concepts of emotion and sentiment are often used interchangeably. This is mostly because they both refer to experiences originating from the combined effects of biological, cognitive, and social causes. However, sentiments are distinguished from emotions by the length of time they are experienced (Kim & Klinger, 2018). Also, while sentiments tend to emerge towards a subject or situation, this is often not the case for emotions. For example, a person may wake up and be upset or happy for no reason.
Literature Review
This section provides a review of previously published articles on the application of text mining to legal and judicial documents.
Table 1. List of published articles about legal judgment prediction
| Author(s) | Method | Dataset language | Dataset description | Model accuracy |
| --- | --- | --- | --- | --- |
| Aletras et al. (2016) | SVM | English | 600 cases of the European Court of Human Rights (ECHR) | 79% |
| Liu and Chen (2017) | SVM with sentiment analysis | Chinese | 1,208 Chinese criminal records collected manually over about 2 months | 80.62% |
| Șulea et al. (2017) | SVM | French | 131,830 legal documents of the French Supreme Court | Case jurisdiction prediction: 96%; ruling prediction: 98%; ruling date prediction: 87% |
| Undavia et al. (2018) | W2V + CNN | English | 8,419 United States Supreme Court documents | 72.4% |
| Alekseev et al. (2019) | Artificial neural network | Russian | 167 cases of the Russian Economic Crimes Court | 100% |
| Ait Yahia and Chakir (2019) | SVM | Arabic | 1,452 Arabic legal documents of the Moroccan courts in the field of real estate and traffic offenses | 98% |
| Hammami et al. (2019) | CNN with max pooling | French | 452 documents of the French Court of Appeals, extracted from data.gouv.fr | 84.46% |
| Kondo et al. (2022) | One-hot coding + XGB | Japanese | Approximately 110,000 court judgments from Japan spanning 1998–2018, from the district to the supreme court level | 88.41% |
| Almuslim and Inkpen (2022) | Custom W2V + RCNN | English | 4,252 documents from the Canadian Legal Information Institute (CanLII) website | 93.46% |
Despite the extensive use of text mining in various fields, only a few studies have addressed text mining in the legal domain, especially for court ruling prediction. It should be noted that this is the first study on the use of text mining, and of artificial intelligence in general, for legal judgment prediction in Persian cases.
Methodology
The goal of this study was to achieve the highest possible accuracy in legal judgment prediction. Therefore, in addition to the texts of court cases, two other characteristics, namely sentiments and emotions expressed in these texts were also incorporated into the prediction process. In this approach, to predict the outcome of a new case, the sentiment and emotion scores of previous cases are used in combination with the case texts in the legal judgment prediction model. The flowchart of the process of legal judgment prediction in this study is displayed in Figure 1.
Figure 1. Flowchart of the legal judgment prediction process
Statistical Population and Dataset
The data of this study comprised 6,000 criminal cases involving the purchase, possession, concealment, or transportation of 500 grams to 5 kilograms of illicit substances, which, according to the Iranian criminal code, are punishable by fines of 15-60 million rials, 40-74 lashes, and 2-5 years of imprisonment. Legal court documents usually consist of multiple sections dedicated to the case number, identity information of litigants (plaintiffs, defendants, and their lawyers), case subject, main ruling, facts, and evidence, statements of litigants, cited legal laws and articles, ruling date, the name of the judge, and documents and evidence concerning the criminal motivation, process, and purpose.
Figure 2. Histogram of the length of the cases in the dataset (in words).
As shown in Figure 2, the average length of cases considered in this study was about 2,000 words.
Pre-processing of Persian Legal Documents
The purpose of text pre-processing is to remove redundant and unimportant words from the text to prepare it for modeling. Because only a limited number of tools support the Persian language, pre-processing Persian texts is more difficult and complex than pre-processing English documents. In this study, the pre-processing of legal documents was performed in ten steps. In the first step, the text was tokenized, that is, divided into words or tokens. In the second step, all punctuation marks (e.g., question marks, commas, parentheses, exclamation marks) were removed. In the third step, all English letters were removed, and in the fourth step, all numbers in the documents were removed. In the fifth step, all Arabic characters were transformed into Persian characters; for example, ك and ي were converted to ک and ی, respectively. In the sixth step, stop words (e.g., conjunctions, pronouns, and prepositions) were removed. After misspelled words were corrected in the seventh step, stemming and lemmatization were used in the eighth and ninth steps to transform all words into their root form. Because lemmatization takes into account the morphological structure of words and uses a dictionary or knowledge base to map each word to its root, it is naturally more accurate. However, new words are continually created in Persian, and when a word is not in the knowledge base, stemming can still reduce regular forms to their roots. Therefore, both techniques were used together in this research. Finally, in the tenth step, the text was normalized. After these ten steps, the Persian judicial texts were ready for analysis and modeling.
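Several of these steps can be sketched with the Python standard library alone. The following is a minimal illustration, not the pipeline used in this study: the stop-word list is a tiny hypothetical sample, the function name `preprocess` is illustrative, the steps are reordered for convenience (tokenization is done last), diacritic stripping stands in for part of the normalization step, and spelling correction, stemming, and lemmatization are omitted because they require Persian-specific resources.

```python
import re

# Arabic kaf (U+0643) and yeh (U+064A) mapped to their Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({"\u0643": "\u06a9", "\u064a": "\u06cc"})

# Hypothetical, very small stop-word sample; a real list would be much longer.
STOP_WORDS = {"و", "در", "به", "از", "که"}


def preprocess(text, stop_words=STOP_WORDS):
    """Return cleaned tokens for one document (illustrative subset of the steps)."""
    text = text.translate(ARABIC_TO_PERSIAN)        # Arabic -> Persian characters
    text = re.sub(r"[\u064b-\u0652]", "", text)     # strip Arabic diacritics (assumption)
    text = re.sub(r"[A-Za-z]+", " ", text)          # remove English letters
    text = re.sub(r"\d+", " ", text)                # remove digits, Persian digits included
    text = re.sub(r"[^\w\s\u200c]", " ", text)      # remove punctuation, keep ZWNJ
    tokens = text.split()                           # whitespace tokenization
    return [t for t in tokens if t not in stop_words]  # stop-word removal
```

Calling `preprocess` on a raw statement returns a token list that can be passed on to a stemmer, a lemmatizer, or a vectorizer.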
Assignment of Sentence Labels to Court Cases
As mentioned, the cases considered in this study were 6,000 criminal cases involving the purchase, possession, concealment, or transportation of 500 grams to 5 kilograms of illicit substances, for which the legal punishments are fines of 15-60 million rials, 40-74 lashes, and 2-5 years of imprisonment. We defined three sentence labels, “Fines”, “Flogging”, and “Imprisonment”, each with two classes, “Light” and “Heavy”, which were assigned according to whether the sentence fell in the lighter or heavier half of the lawful range of punishment for the crime committed.
For the “Fines” label, the sentence was considered “Light” if it was between 15 and 37.5 million rials and “Heavy” if it was between 37.5 and 60 million rials. Out of 6000 cases reviewed in this study, 3412 cases were labeled Light, and 2588 were labeled Heavy. Figure 3A shows the percentage of light and heavy fine sentences in the reviewed cases.
For the “Flogging” label, the sentence was considered “Light” if it was between 40 and 57 lashes and “Heavy” if it was between 57 and 74 lashes. Out of 6000 cases of this study, 4035 cases were labeled Light, and 1965 were labeled Heavy. The percentages of light and heavy flogging sentences in the reviewed cases are shown in Figure 3B.
For the “Imprisonment” label, the sentence was considered “Light” if it was between 2 and 3.5 years and “Heavy” if it was between 3.5 and 5 years. Out of 6000 cases considered in this study, 3720 were labeled Light, and 2280 were labeled Heavy. The percentages of light and heavy imprisonment sentences in the cases are shown in Figure 3C.
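The labeling rule for the three sentence types can be expressed as a midpoint split of each legal range. The sketch below is illustrative only: the names `RANGES` and `intensity` are hypothetical, and because the text does not say which class a sentence exactly at the midpoint (37.5 million rials, 57 lashes, 3.5 years) belongs to, assigning it to “Light” is an assumption.

```python
# Legal ranges (minimum, maximum) from the text.
# Units: million rials, lashes, and years, respectively.
RANGES = {"Fines": (15, 60), "Flogging": (40, 74), "Imprisonment": (2, 5)}


def intensity(label, value):
    """Classify a sentence as 'Light' or 'Heavy' by the midpoint of its legal range."""
    low, high = RANGES[label]
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the legal range for {label}")
    midpoint = (low + high) / 2
    # Assumption: a value exactly at the midpoint is treated as "Light".
    return "Light" if value <= midpoint else "Heavy"
```

For example, `intensity("Fines", 50)` yields `"Heavy"`, since 50 million rials lies above the 37.5 million midpoint.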
Figure 3. Percentage of light and heavy sentences in the reviewed cases: A) fines, B) flogging, and C) imprisonment.
Table 2 shows the number of cases by the type of sentences given and their intensity (whether they have been heavy or light).
Table 2. Number of cases by the assigned label (sentence type and intensity).
| Sentence | Light | Heavy |
| --- | --- | --- |
| Fines | 3412 | 2588 |
| Flogging | 4035 | 1965 |
| Imprisonment | 3720 | 2280 |
Sentiment Scoring of Court Cases
In this study, sentiment scoring was performed using the NRC emotion lexicon. This lexicon consists of over 14,000 English words; this study used its Persian translation. The lexicon covers eight types of emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The labels in the NRC lexicon were annotated manually through crowdsourcing (Mohammad & Turney, 2010, 2013).
In the first step, the texts of the statements of the litigants in the court cases were tokenized. Then, all the words in the statements of each case were matched with those words in the NRC lexicon that convey positive or negative sentiments, and the total positive and negative sentiment scores of each case were calculated. The negative sentiment score was then subtracted from the positive score. The sentiment variable was set to 1 if the answer was greater than zero (indicating positive sentiment) and was set to 0 otherwise (indicating negative sentiment). As shown in Figure 4, ultimately, out of 6,000 drug-related cases, 2,135 had positive sentiments and 3,865 had negative sentiments.
Figure 4. The number of drug-related criminal cases with positive and negative sentiments
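The sentiment variable described above reduces to counting lexicon matches and comparing the two totals. A minimal sketch follows; the function name and the word sets passed to it are hypothetical placeholders for the Persian translation of the NRC lexicon.

```python
def sentiment_label(tokens, positive_words, negative_words):
    """Return 1 if positive matches outnumber negative ones, otherwise 0."""
    positive_score = sum(1 for t in tokens if t in positive_words)
    negative_score = sum(1 for t in tokens if t in negative_words)
    # A difference of zero or less is treated as negative, as in the text.
    return 1 if positive_score - negative_score > 0 else 0
```

Note that a case with equal positive and negative counts, including a case with no lexicon matches at all, receives the value 0 under this rule.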
Emotion Scoring of Court Cases
Emotion scores of the cases were also determined using the NRC emotion lexicon, which, as mentioned in the previous section, covers eight types of emotion (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust). To analyze emotions, we first tokenized the statements of the litigants and matched their words with the words in the NRC lexicon that convey these eight emotions. We then calculated the total score of each emotion for each case and divided it by the total word count of that case, because cases vary in length and longer cases would otherwise tend to obtain higher emotion scores. The emotion variable of a case was to be given a binary value based on the average score across all cases: 1 if the score of the case is higher than the average and 0 if it is lower. However, some cases had outlier (extremely high) scores for some emotions, which skew this average. We therefore used the Shewhart statistical method to identify outliers. This method involves finding an upper control limit (UCL) and a lower control limit (LCL) and excluding all values that are above the UCL or below the LCL. The average of each emotion score was then computed over the remaining values, and cases with a length-normalized score higher than this average were categorized as containing the respective emotion.
The Shewhart charts for the scores of each emotion are shown separately in Figure 5.
Figure 5. Shewhart chart for the scores of each emotion
It should be noted that we did not completely discard the scores that were higher than the UCL or lower than the LCL. Rather, we only excluded them from averaging. For example, if the surprise score of a case was higher than the UCL or lower than the LCL, this score was not included in the average of the surprise scores of all cases, but the surprise variable of that case was nevertheless set to 1 if its score was higher than the resulting average and to 0 if it was lower.
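The emotion scoring procedure can be sketched as follows. The text does not state how the control limits were computed, so this sketch assumes the common Shewhart convention of the mean plus or minus three standard deviations; the function names and the parameter `k` are illustrative.

```python
from statistics import mean, pstdev


def emotion_score(tokens, emotion_words):
    """Number of lexicon matches for one emotion, divided by the case length."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in emotion_words) / len(tokens)


def binarize_emotion(scores, k=3.0):
    """Set each case to 1 if its score exceeds the outlier-free average, else 0."""
    center = mean(scores)
    spread = pstdev(scores)
    ucl = center + k * spread   # upper control limit (assumed mean + 3 sigma)
    lcl = center - k * spread   # lower control limit (assumed mean - 3 sigma)
    # Outliers are left out of the average only; they still receive a label below.
    in_control = [s for s in scores if lcl <= s <= ucl]
    threshold = mean(in_control)
    return [1 if s > threshold else 0 for s in scores]
```

With scores of ten cases at 0.1, ten at 0.3, and one extreme case at 9.0, the extreme value is excluded from the average, so the threshold falls near 0.2 and the cases at 0.3 are labeled 1. Had the extreme value been included, the threshold would rise above 0.6 and those cases would be labeled 0.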
Figure 6 shows the number of cases containing each emotion. Each case file can contain multiple emotions or no emotions at all.
Figure 6. The number of cases with each type of emotion.
Text Vectorization and Feature Extraction
In tabular data, each column of the table is a feature of the data. In textual data, however, features must be identified and extracted by text vectorization, which transforms the text into numbers. This can be done by various methods. In this study, TF-IDF, Word2Vec (Continuous Bag of Words and Skip-gram), FastText, and BERT were used for this purpose.
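As an illustration of the simplest of these vectorizers, the sketch below computes TF-IDF vectors from tokenized documents. It assumes one common variant of the weighting (term frequency normalized by document length, and an unsmoothed logarithmic inverse document frequency); the paper does not specify which variant or library was used, so this is not necessarily the configuration behind the reported results.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors
```

Under this variant, a term that occurs in every document receives a weight of zero, which is why very common words contribute little to the classification.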
Modeling and Classification of Legal Judgments
As mentioned in the Statistical Population and Dataset section, this research examined judicial cases involving the purchase, possession, concealment, or transportation of 500 grams to 5 kilograms of drugs, which, according to the Iranian criminal code, are punishable by fines of 15-60 million rials, 40-74 lashes, and 2-5 years of imprisonment. The punishment in each case therefore falls within the range specified by law for all three types of punishment, and models were built to predict the severity of each punishment using machine learning algorithms. For this purpose, three binary classification tasks were defined, one for each type of punishment, and each was addressed with different machine learning algorithms.
Thus, after text vectorization, classification and modeling were performed with eleven machine learning methods, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree, Naive Bayes (NB), Logistic Regression, K-Nearest Neighbor (KNN), Random Forest, Adaboost, Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Bidirectional Encoder Representations from Transformers (BERT).
Results
In this section, we evaluate the accuracy of different algorithms in predicting legal judgments to determine which offers the highest accuracy for each sentence label (i.e. fines, flogging, and imprisonment). To examine the effect of sentiment and emotion scores on legal judgment prediction, in the next section, sentiment and emotion scores of the considered cases are also included in the prediction process.
Table 3 shows the accuracy of different machine learning methods in predicting each type of sentence (fines, flogging, and imprisonment).
Table 3. Accuracy of different methods in predicting sentence labels (fines, flogging, and imprisonment).
| Method of Persian text vectorization | Algorithm | Accuracy in predicting flogging | Accuracy in predicting fines | Accuracy in predicting imprisonment |
| --- | --- | --- | --- | --- |
| TF-IDF | SVM | 87.80 | 85.68 | 87.13 |
| TF-IDF | ANN | 85.52 | 85.60 | 86.11 |
| TF-IDF | Decision tree | 85.5 | 83.33 | 89.01 |
| TF-IDF | Naive Bayes | 81 | 85.61 | 83.87 |
| TF-IDF | Logistic regression | 84.02 | 85.69 | 86.86 |
| TF-IDF | KNN | 77.20 | 78.78 | 82.36 |
| TF-IDF | Random forest | 86.53 | 86.36 | 86.89 |
| TF-IDF | Adaboost | 88.90 | 89.39 | 90.91 |
| Word2Vec (CBOW) | LSTM | 83.07 | 86.02 | 86.91 |
| Word2Vec (CBOW) | LSTM + Dropout | 82.05 | 86.39 | 89.5 |
| Word2Vec (CBOW) | LSTM + CNN | 85.75 | 87.11 | 89.47 |
| Word2Vec (Skip-gram) | LSTM | 84.17 | 86.32 | 88.17 |
| Word2Vec (Skip-gram) | LSTM + Dropout | 82.50 | 86.70 | 90.67 |
| Word2Vec (Skip-gram) | LSTM + CNN | 86.65 | 88.33 | 91.12 |
| FastText | LSTM | 82.23 | 84.17 | 87.33 |
| FastText | LSTM + Dropout | 85.57 | 85.80 | 86.5 |
| FastText | LSTM + CNN | 84.73 | 85 | 85.60 |
| BERT | BERT | 85.41 | 90.06 | 90.64 |

Classification method and data type (all rows): three binary classifications for the prediction of legal judgments with the three labels of fines, flogging, and imprisonment, each in two classes of heavy and light, for 6000 court cases related to the purchase, possession, concealment, or transportation of illicit substances.
Results for the Prediction of Flogging Sentences
Figure 7 shows the accuracy of different machine learning methods in predicting flogging sentences in the considered cases (the methods are arranged in order of accuracy).
Figure 7. Accuracy of different methods in predicting flogging sentences in the studied legal cases
As shown in Figure 7, the three most accurate methods in predicting flogging were TFIDF+Adaboost with 88.90% accuracy, TFIDF+SVM with 87.80% accuracy, and Skipgram+LSTM+CNN with 86.65% accuracy, and the three least accurate methods for this sentence were TFIDF+KNN with 77.20% accuracy, TFIDF+Naive Bayes with 81% accuracy, and CBOW+LSTM+Dropout with 82.05% accuracy. Because a different algorithm achieved the highest accuracy for each of the three types of punishment, predicting the full sentence of a case requires all three algorithms, each applied to the punishment for which it is most accurate.
Results for the Prediction of Fine Sentences
The accuracy of the tested machine learning methods in predicting fine sentences in the considered cases is illustrated in Figure 8 (the methods are arranged in order of accuracy).
Figure 8. Accuracy of different methods in predicting fine sentences in the studied legal cases.
As Figure 8 demonstrates, the three most accurate methods in predicting fine sentences were BERT with 90.06% accuracy, TFIDF+Adaboost with 89.39% accuracy, and Skipgram+LSTM+CNN with 88.33% accuracy, and the three least accurate methods for this sentence were TFIDF+KNN with 78.78% accuracy, TFIDF+DecisionTree with 83.33% accuracy, and FastText+LSTM with 84.17% accuracy.
Results for the Prediction of Imprisonment Sentences
Figure 9 shows the accuracy of different machine learning methods in predicting imprisonment sentences in the considered cases (the methods are arranged in order of accuracy).
Figure 9. Accuracy of different methods in predicting imprisonment sentences in the studied legal cases
As shown in Figure 9, the top three most accurate methods in predicting imprisonment were Skipgram+LSTM+CNN with 91.12% accuracy, TFIDF+Adaboost with 90.91% accuracy, and Skipgram+LSTM+Dropout with 90.67% accuracy, and the three least accurate methods for this sentence were TFIDF+KNN with 82.36% accuracy, TFIDF+Naive Bayes with 83.87% accuracy, and FastText+LSTM+CNN with 85.60% accuracy.
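The rankings reported for the three sentence types can be reproduced directly from the accuracies in Table 3. The sketch below copies those values; the dictionary name `TABLE3` and the helper `ranked` are illustrative.

```python
# (flogging, fines, imprisonment) accuracies in percent, copied from Table 3.
TABLE3 = {
    "TFIDF+SVM": (87.80, 85.68, 87.13),
    "TFIDF+ANN": (85.52, 85.60, 86.11),
    "TFIDF+DecisionTree": (85.5, 83.33, 89.01),
    "TFIDF+NaiveBayes": (81.0, 85.61, 83.87),
    "TFIDF+LogisticRegression": (84.02, 85.69, 86.86),
    "TFIDF+KNN": (77.20, 78.78, 82.36),
    "TFIDF+RandomForest": (86.53, 86.36, 86.89),
    "TFIDF+Adaboost": (88.90, 89.39, 90.91),
    "CBOW+LSTM": (83.07, 86.02, 86.91),
    "CBOW+LSTM+Dropout": (82.05, 86.39, 89.5),
    "CBOW+LSTM+CNN": (85.75, 87.11, 89.47),
    "Skipgram+LSTM": (84.17, 86.32, 88.17),
    "Skipgram+LSTM+Dropout": (82.50, 86.70, 90.67),
    "Skipgram+LSTM+CNN": (86.65, 88.33, 91.12),
    "FastText+LSTM": (82.23, 84.17, 87.33),
    "FastText+LSTM+Dropout": (85.57, 85.80, 86.5),
    "FastText+LSTM+CNN": (84.73, 85.0, 85.60),
    "BERT": (85.41, 90.06, 90.64),
}
LABELS = ("flogging", "fines", "imprisonment")


def ranked(label):
    """Return method names sorted from most to least accurate for one label."""
    i = LABELS.index(label)
    return sorted(TABLE3, key=lambda name: TABLE3[name][i], reverse=True)
```

Here `ranked("fines")[0]` is `"BERT"`, while `ranked("flogging")[0]` is `"TFIDF+Adaboost"` and `ranked("imprisonment")[0]` is `"Skipgram+LSTM+CNN"`, so no single method is best for all three sentence types.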
Investigation of the Effect of Sentiment and Emotion Scores on the Accuracy of Legal Judgment Prediction
After determining the accuracy of the methods (Table 3), we identified the algorithm that offers the highest accuracy for each sentence label (fines, flogging, and imprisonment) and determined its accuracy in legal judgment prediction under the following conditions.
1) When the input data are the judicial texts plus the sentiment scores of cases.
2) When the input data are the judicial texts plus the emotion scores of cases.
3) When the input data are the judicial texts plus both sentiment and emotion scores of cases.
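One simple way to realize these three input conditions is to append the scores to the text vector of each case. The paper does not describe how the scores were combined with the text representation, so the concatenation below, along with the function name `build_features`, is purely an assumption for illustration.

```python
def build_features(text_vector, sentiment=None, emotions=None):
    """Concatenate a text vector with optional sentiment and emotion variables."""
    features = list(text_vector)
    if sentiment is not None:
        features.append(float(sentiment))             # one binary sentiment variable
    if emotions is not None:
        features.extend(float(e) for e in emotions)   # eight binary emotion variables
    return features
```

With a text vector of length d, the three conditions then produce inputs of length d + 1, d + 8, and d + 9, respectively.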
Table 4 shows the accuracy of the algorithms that were most accurate in predicting each sentence label when sentiment and emotion scores of the considered cases were also incorporated into the prediction process.
Table 4. Effect of sentiment scores and emotion scores on the accuracy of legal judgment prediction
| Algorithm with the highest accuracy based on the texts of the case file | Case text | Sentiment score | Emotion score | Accuracy (%) |
| --- | --- | --- | --- | --- |
| TFIDF+Adaboost (highest accuracy in predicting flogging sentences) | ✓ | | | 88.90 |
| TFIDF+Adaboost (highest accuracy in predicting flogging sentences) | ✓ | ✓ | | 90.12 |
| TFIDF+Adaboost (highest accuracy in predicting flogging sentences) | ✓ | | ✓ | 89.36 |
| TFIDF+Adaboost (highest accuracy in predicting flogging sentences) | ✓ | ✓ | ✓ | 91.69 |
| BERT (highest accuracy in predicting fine sentences) | ✓ | | | 90.06 |
| BERT (highest accuracy in predicting fine sentences) | ✓ | ✓ | | 93.13 |
| BERT (highest accuracy in predicting fine sentences) | ✓ | | ✓ | 92.02 |
| BERT (highest accuracy in predicting fine sentences) | ✓ | ✓ | ✓ | 93.49 |
| Skipgram+LSTM+CNN (highest accuracy in predicting imprisonment sentences) | ✓ | | | 91.12 |
| Skipgram+LSTM+CNN (highest accuracy in predicting imprisonment sentences) | ✓ | ✓ | | 91.20 |
| Skipgram+LSTM+CNN (highest accuracy in predicting imprisonment sentences) | ✓ | | ✓ | 91.49 |
| Skipgram+LSTM+CNN (highest accuracy in predicting imprisonment sentences) | ✓ | ✓ | ✓ | 91.52 |
As shown in Table 4, the most accurate algorithm in predicting flogging sentences based exclusively on case texts was TFIDF+Adaboost with 88.90% accuracy, which increased to 90.12% (an improvement of 1.22 percentage points) when sentiment scores were added to the dataset and to 89.36% (0.46 percentage points) when emotion scores were incorporated into the prediction process. Adding both sentiment and emotion scores to the input data increased the accuracy of TFIDF+Adaboost in predicting flogging sentences to 91.69%, an improvement of 2.79 percentage points.
The most accurate algorithm in predicting fine sentences based on case texts was BERT with 90.06% accuracy. The accuracy of this method increased to 93.13% (an improvement of 3.07 percentage points) when sentiment scores were introduced to the dataset and to 92.02% (1.96 percentage points) when emotion scores were added. Once sentiment and emotion scores were both incorporated into the prediction process, the accuracy of this method in predicting fine sentences increased to 93.49%, an improvement of 3.43 percentage points.
Finally, the most accurate algorithm in predicting imprisonment sentences based only on case texts was Skipgram+LSTM+CNN with 91.12% accuracy, which increased to 91.20% (an improvement of 0.08 percentage points) when sentiment scores were also used in the prediction process and to 91.49% (0.37 percentage points) when emotion scores were added to the dataset. Introducing both sentiment and emotion scores raised the accuracy of Skipgram+LSTM+CNN in predicting imprisonment sentences to 91.52%, an improvement of 0.40 percentage points.
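The improvements quoted above are simple differences between the accuracies in Table 4, and they can be checked with a few lines of arithmetic. The names `TABLE4` and `gains` in this sketch are illustrative; the numbers are copied from the table.

```python
# Accuracy (%) of the best text-only method for each label, from Table 4.
TABLE4 = {
    "flogging": {"text": 88.90, "sentiment": 90.12, "emotion": 89.36, "both": 91.69},
    "fines": {"text": 90.06, "sentiment": 93.13, "emotion": 92.02, "both": 93.49},
    "imprisonment": {"text": 91.12, "sentiment": 91.20, "emotion": 91.49, "both": 91.52},
}


def gains(label):
    """Accuracy gain over the text-only baseline for each added input."""
    base = TABLE4[label]["text"]
    return {
        condition: round(accuracy - base, 2)
        for condition, accuracy in TABLE4[label].items()
        if condition != "text"
    }
```

Comparing the `"both"` entries across labels shows the largest gain for fines (3.43), followed by flogging (2.79) and imprisonment (0.40).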
Conclusion
The subject of this article was the prediction of legal judgments in criminal cases related to the purchase, possession, concealment, or transportation of illicit drugs using text mining, machine learning, and deep learning methods and the impact of sentiments and emotions in the texts of case files and documents on the prediction of the type (flogging, fines, and imprisonment) and intensity (heavy, light) of sentences. For this purpose, we first pre-processed the textual documents of 6000 Persian drug-related court cases and then used a translation of the NRC emotion and sentiment lexicon to give each case a positive or negative sentiment score and an emotion score based on scores given for eight types of emotions. Next, we classified the sentences into two classes of light and heavy, and modeled them with a variety of machine-learning methods.
The methods with the highest accuracy in predicting flogging, fine, and imprisonment sentences based exclusively on case texts were found to be TFIDF+Adaboost, BERT, and Skipgram+LSTM+CNN, respectively. We then investigated the accuracy of these algorithms (those that were most accurate for each type of sentence based only on case texts) when sentiment scores, emotion scores, or both were incorporated into the prediction process. The results showed that the use of sentiment and emotion scores improved the accuracy of legal judgment prediction for all three sentence types (flogging, fines, and imprisonment). Among the three sentence types, predictions for fines were most greatly affected by sentiment and emotion scores, and predictions for imprisonment were least affected. Overall, sentiment had a greater effect on the accuracy of legal judgment predictions than emotion, although the opposite was true for imprisonment sentences.
Conflict of interest
The authors declare no potential conflict of interest regarding the publication of this work. In addition, ethical issues including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancy have been fully observed by the authors.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.