Breast Cancer Detection based on 3-D Mammography Images using Deep Learning Strategies

Document Type : Research Paper


1 Department of ECE, Karunya Institute of Technology and Sciences, Coimbatore - 641114, India

2 College of Information Technology, University of Fujairah, UAE.

3 Department of Electronics Engineering, Madras Institute of Technology, Anna University, Chennai.

4 Faculty of Computer & Artificial Intelligence, Beni-Suef University, Beni-Suef City, 62511, Egypt; College of Computer Information Technology, American University in the Emirates, United Arab Emirates.


Breast cancer affects women across the world. Mammography is one of the most important methods for detecting breast cancer early, and automating its analysis can reduce the cost and workload of radiologists. Medical image processing helps identify the disease in advance and so reduces the risk factor. Disease prediction for diagnosis and detection has typically relied on 2-D mammography images combined with advanced soft computing paradigms. To obtain greater accuracy along all coordinate axes, 3-D mammography imaging captures depth information from different angles. After reconstruction, a better-quality 3-D mammogram is obtained, which helps experts identify the disease well in advance. To improve the accuracy of disease findings, deep convolutional neural networks (CNNs) can be applied for automatic feature learning and classifier building. This work also presents a comparison with other state-of-the-art methods used in recent decades.


Over the past few decades, medical imaging techniques such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), mammography, ultrasound, and X-ray have been used for the early detection, diagnosis, and treatment of diseases. In the clinic, medical image analysis has been performed mostly by human experts such as radiologists and physicians. However, given wide variations in pathology, researchers and doctors have begun to benefit from computer-assisted programs. Although progress in computational medical image analysis has not been as rapid as that in medical imaging technologies, the situation is improving with computer-assisted techniques (Hayit Greenspan et al., 2016).

Deep learning is a type of machine learning in which a model learns to perform classification tasks directly from images, text, or sound. It is usually implemented using a neural network architecture. The term “deep” refers to the number of layers in the network: the more layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while deep networks can have hundreds, as shown in Figure 1. Deep learning extends artificial neural networks with more layers, which helps to obtain more accurate results (Becker, Anton S et al., 2017). Deep learning is the leading machine-learning tool in the general imaging and computer vision domains. In particular, convolutional neural networks (CNNs) have proven to be powerful tools for a broad range of computer vision tasks. Deep CNNs automatically learn mid-level and high-level abstractions from raw data (e.g., images), as shown in Figure 2. CNNs are extremely effective at recognizing and localizing objects in images. Medical image analysis groups across the world are quickly entering the field and applying CNNs and other deep learning methodologies to a wide variety of applications (Geert Litjens et al., 2016).


Figure 1. Neural Networks Architecture



Figure 2. Framework of the breast cancer detection

Breast cancer is the most common cancer among women worldwide, and there is a continuing need for advances in medical imaging. Early detection of cancer followed by proper treatment can reduce the risk of death. Machine learning can help medical professionals diagnose the disease more accurately. Deep learning with neural networks is one of the techniques that can be used to classify breasts as normal or abnormal, and a CNN can be used for this detection. The Mammograms-MIAS dataset, a sample of which is shown in Figure 3, is used for this purpose; it contains 322 mammograms, of which 189 are of normal and 133 are of abnormal breasts (Saira Charan et al., 2018).

A convolutional neural network (CNN, or ConvNet) is one of the most popular algorithms for deep learning with images and video. Like other neural networks, a CNN is composed of an input layer, an output layer, and many hidden layers in between. The feature detection layers perform one of three types of operations on the data: convolution, pooling, or rectified linear unit (ReLU), as shown in Figure 4. Convolution puts the input images through a set of convolutional filters, each of which activates certain features from the images. Pooling simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn. The rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and maintaining positive values.
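The three operations described above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the surveyed works: the function names, the toy 5 × 5 image, and the vertical-edge filter are assumptions chosen only to make the operations concrete for a single-channel input.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Map negative values to zero and keep positive values."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (nonlinear downsampling)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, w - size + 1, size)
        ]
        for i in range(0, h - size + 1, size)
    ]


# Toy image with a vertical edge and a hypothetical vertical-edge filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 0, 1] for _ in range(3)]
features = max_pool(relu(conv2d(image, kernel)))
```

In a real CNN the filter values are learned rather than fixed by hand, and each layer applies many filters across multiple channels.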

Figure 3. Sample input mammogram image


Figure 4. Convolution neural networks

These three operations are repeated over tens or hundreds of layers, with each layer learning to detect different features. The segmentation stage aims to identify the breast regions with greater possibility of being masses.


Literature Review

  1. Breast cancer related with CNN

Previously, Sharf observed that in an online breast cancer group, topics concerning basic classifications or definitions of tumours and diagnosis were the most prevalent, indicating that Internet support was mainly a complementary source of information in the early years. According to more recent research, a range of themes such as relationship and family issues have become popular in online peer discussions; however, disease-specific topics like treatment, diagnosis, and interpretation of lab test results are still the most prevalent. Specific topics of discussion have been identified as well. For example, based on content analysis, Meier and colleagues observed that the most frequent topics in 10 cancer mailing lists were treatment information and how to communicate with healthcare providers. Owen and colleagues proposed a topic schema that consists of seven categories: outcome of cancer treatment, disease status and processes related to cancer, healthcare facilities and personnel, medical tests and procedures, cancer treatment, physical symptoms and side effects, and description of cancer in the body. Based on such a schema, the occurrence of different topics can be quantified to facilitate the content analysis of cancer support groups. More recently, relying on quantitative methods, topic modeling has been carried out for public OHCs, though in an unsupervised fashion.

  1. Types of classification

Since a given sentence in a post can relate to multiple topics (e.g., a sentence can be about treatment, nutrition, and daily matters at the same time), the task of automating the topic coding can be cast as multi-label classification: for each sentence, there can be up to N labels, where N is the number of topics in the schema. This kind of classification is more challenging than single-label classification, where one sentence can be described by only one label chosen from the N topics in the schema. Generally, there are two approaches to multi-label, multi-class classification: problem transformation methods and algorithm adaptation methods. In this paper, we rely on three different supervised classifiers: a labeled LDA classifier, an SVM, and a convolutional neural network. They represent three kinds of standard supervised learning frameworks: generative graphical models, discriminative max-margin linear classifiers, and neural networks. Among these three models, labeled LDA and neural networks can handle multi-label classification naturally, since they allow multiple outputs. For the SVM, we train N binary, single-label classifiers and aggregate the N outputs into one multi-label output. For the labeled LDA classifier, we rely on a self-implemented Gibbs sampler for labeled LDA, based on the open-source LDA implementation. The two hyperparameters of the model, alpha and beta, are set to 0.1 and 0.5 respectively, according to a grid search. For the SVM, we rely on the open-source tool LIBSVM. We also ran a naive grid search (specifically, a logarithmic grid search over c and gamma) to find the setting that yields the best performance. In the results section we report the performance of the SVM under the following parameters: a radial basis function kernel with c = 100 and gamma = 103, and a radial basis function kernel with c = 10 and gamma = 102, for SVM and SVM-e respectively. 
Other parameters were left at their defaults, since we observed that in this task the performance of the SVM was only slightly affected by them. The convolutional neural network we used has a hidden convolutional layer. First, the sequence of words is represented as a sequence of vectors of dimension D = 100 by using a lookup table. The 100-dimensional vectors are concatenated (our model is equivalent to the "single-channel" model described in Section 2 of this paper). The word embeddings used in this lookup table were pre-trained with the word2vec algorithm on the whole unannotated dataset from the same forum. We then convolve this sequence of word vectors with H filters, obtaining a score for each filter and each position in the sentence. To obtain a fixed-size representation of the sentence, we perform max-pooling over the whole sentence: for each filter, we keep only the maximum score over all positions in the sentence. Finally, we apply a fully connected layer to obtain a score for every topic. The binary logistic loss is used for each label independently (no softmax layer is used), since an instance can have multiple labels. One neural network is used for the multi-label classification, rather than independent networks for each label. The Adagrad algorithm, with a learning rate of 0.02, is used to learn the model parameters. No normalization or zero-padding is used for the convolutional neural network. Since the dataset is imbalanced, we propose to use asymmetric costs for positive and negative examples. The ratio between these costs is denoted by a scalar. In our experiments, H is set to 800 and this scalar is set to 0.25 according to a grid search. The stride of the convolutions is set to 3. We used our own Java implementation of convolutional neural networks. 
(The implementation was validated using the "MR" dataset, achieving performance comparable to that reported in the original paper.) However, reproducing our experiments with any popular deep learning framework (for example, TensorFlow or Torch) should be straightforward. Before training the classifiers, the following pre-processing and feature selection steps were carried out: all words in the corpus were stemmed; stop words were removed from the vocabulary; and dimensionality reduction was performed by applying Named Entity Recognition (using Stanford NER) to identify Person, Location, and Organization names as well as special tokens such as number, money, and time. In addition, to make the comparison across tools more meaningful, we also use the word embedding input of the CNN as features for the SVM, examining how it differs from the bag-of-words representation. In other words, we replace the bag-of-words input with the 100-dimensional word vectors as features, giving the system denoted SVM-e. For all models we used the same thresholds for the outputs of the classifiers, as well as for all subsequent analyses presented in this paper. Given the limited number of samples, we cannot split the dataset into training, validation, and test sets with sufficient instances in each set. Instead, we carried out the experiments as follows. First, we randomly split the annotated dataset into 5 subsets with roughly balanced numbers of sentences. Then, for each classifier, we tuned parameters with subsets 1–4 as training and subset 5 as validation, running a grid search to find optimal values of the hyperparameters. Next, we applied the identified hyperparameters to the other 4 training/validation splits, i.e., using subsets 1, 2, 3, and 4 in turn as the validation set and the remaining subsets as the training set. We then reported the average performance over the 4 subsets.
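The sentence classifier described in this subsection (convolution over word vectors, max-pooling over positions, a fully connected layer, and an independent logistic loss per label with asymmetric costs) can be sketched as follows. This is an illustrative reconstruction in plain Python, not the authors' Java implementation. The function names, the `width` parameter, and the convention that the cost ratio multiplies the loss on negative labels are all assumptions, since the source does not specify them.

```python
import math


def sentence_scores(word_vectors, filters, width, fc_weights, stride=1):
    """Score each topic for one sentence.

    word_vectors: list of D-dimensional word embeddings.
    filters: list of H filters, each holding width * D weights (hypothetical layout).
    fc_weights: one row of H weights per topic.
    """
    pooled = []
    for filt in filters:
        best = -math.inf
        # Slide the filter over windows of `width` consecutive words.
        for start in range(0, len(word_vectors) - width + 1, stride):
            window = [x for vec in word_vectors[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)  # max-pooling over all positions
    return [sum(w * p for w, p in zip(row, pooled)) for row in fc_weights]


def asymmetric_logistic_loss(scores, labels, negative_cost):
    """Sum of independent binary logistic losses, one per label.

    Negative labels are weighted by `negative_cost` (assumed convention).
    """
    total = 0.0
    for score, label in zip(scores, labels):
        prob = 1.0 / (1.0 + math.exp(-score))
        if label:
            total += -math.log(prob)
        else:
            total += -negative_cost * math.log(1.0 - prob)
    return total


# Toy example: three 2-dimensional "word vectors", one filter, two topics.
scores = sentence_scores([[1, 0], [0, 1], [1, 1]], [[1, 0, 0, 1]], 2, [[1], [-1]])
loss = asymmetric_logistic_loss(scores, [1, 0], negative_cost=0.25)
```

Because each label has its own logistic term, a sentence can be assigned several topics at once, which is the reason no softmax layer is needed.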

  1. c) Material and methods

The proposed method is divided into two phases: training and testing. These phases are explained in detail after the materials used in the method are described.

For the proposed methodology, a public image database was used. Known as the Digital Database for Screening Mammography (DDSM), it is widely used in the literature for the validation of methods. This database contains more than 2500 exams acquired from Massachusetts General Hospital, Wake Forest University, and the Washington University in St. Louis School of Medicine. Each exam contains up to four images, covering two sides (left and right) and two projections [craniocaudal (CC) and mediolateral oblique (MLO)], as well as extra information about the exam (breast density, study date, patient age, type of pathology, number of anomalies, etc.) and about each image (filename, image type, scan date, scanner type, sequence, pixels per line, bits per pixel, lesion location, etc.). All information contained in the DDSM was provided by experts. Breast density is an important factor in this work because different models are created for different types of density. In DDSM, each breast is assigned a density value from 1 to 4 according to BI-RADS; the interpretation of these values is given in Table 1. Non-dense breasts have been considered those with values 1 and 2, and dense breasts those with values 3 and 4. In our method, we use the same criterion. In this work, we used 1241 pairs of mammograms from DDSM, of which 502 were of non-dense breasts and 739 were of dense breasts. These pairs of mammograms must contain at least one mass lesion. Images with other types of lesion were not considered in this study.

  1. d) Training phase

In this phase, 80% of the mammogram pairs were used to create models capable of classifying breasts as dense or non-dense and classifying segmented regions as mass or non-mass. Figure 5 illustrates the stages of this phase. First, the density model, which classifies breasts as dense or non-dense, is presented. Then, the models that classify mass and non-mass regions in dense and non-dense breasts are presented.

  1. e) Creation of the density model

At this stage, we isolate the breast using a pre-processing step. Regions are then extracted from the breast, and a CNN is trained to classify new cases. The pre-processing step isolates the breast and discards features of the DDSM images that are not needed for the study, such as patient information, background noise, and slight deformations at the edges. First, the mammograms were resized. Images from the DDSM have an average height of 6,000 pixels. In this work, the images were reduced to a height of 1,024 pixels with a proportional width to decrease the computational time. Some works in the literature adopted this practice and show that the resizing does not negatively affect the results [4, 7, 9, 10, and 18]. The proposed methodology adopts bilateral comparison of symmetric regions in pairs of mammograms. In the literature, many works relate this type of technique to the existence of lesions, as presented in section 2. To facilitate the registration step, avoid unnecessary deformations, and also facilitate the comparison in the segmentation step, one of the breasts was mirrored so that the two breasts of the same patient were spatially aligned. To prepare the images for the other stages of the methodology, it is necessary to isolate the breast. Undesired structures in the digitized mammograms (noise, borders, and marks) may be disruptive and are not relevant to the purpose of this methodology. A methodology based on Subperiosteal was used. First, edge removal is applied, in which the 30 pixels nearest the side edges are removed. Then, background removal is performed. In this step, the pixel values are divided into two groups according to intensity. The first is formed by the pixels of higher intensity and the second by those of lower intensity. The group of lower intensity then has its values replaced by 0 (equivalent to black). Finally, to remove objects external to the breast, a region growing algorithm is used. The seed of the region growing is positioned in the center of the half of the image whose pixel values are higher (that is, the half where the breast is). The region growing stops when it reaches values of 0. The resulting image of the region growing is then used as a mask over the first image, which completes the pre-processing step and yields the isolated breast.
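The pre-processing steps above (mirroring, edge removal, background removal, and region growing used as a mask) can be sketched in plain Python. This is an illustration under stated assumptions, not the original implementation. The function names are hypothetical, the two intensity groups are separated by a caller-supplied `threshold` because the source does not say how the groups are formed, the region growing uses 4-connectivity, and the mask is applied to the edge-cropped image.

```python
from collections import deque


def mirror(image):
    """Flip an image horizontally so that both breasts face the same side."""
    return [row[::-1] for row in image]


def remove_side_edges(image, margin=30):
    """Drop `margin` pixel columns from the left and right borders."""
    return [row[margin:len(row) - margin] for row in image]


def remove_background(image, threshold):
    """Set the lower-intensity group of pixels to 0 (black)."""
    return [[v if v > threshold else 0 for v in row] for row in image]


def grow_region(image, seed):
    """4-connected region growing from `seed`; growth stops at 0-valued pixels."""
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < h and 0 <= c < w) or mask[r][c] or image[r][c] == 0:
            continue
        mask[r][c] = True
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return mask


def isolate_breast(image, threshold, margin=30):
    """Edge removal, background removal, then region growing used as a mask."""
    cropped = remove_side_edges(image, margin)
    cleaned = remove_background(cropped, threshold)
    h, w = len(cleaned), len(cleaned[0])
    half = w // 2
    left = sum(sum(row[:half]) for row in cleaned)
    right = sum(sum(row[half:]) for row in cleaned)
    # Seed at the center of the brighter half, where the breast is assumed to be.
    seed_col = half // 2 if left >= right else half + (w - half) // 2
    mask = grow_region(cleaned, (h // 2, seed_col))
    return [[v if keep else 0 for v, keep in zip(row, mask_row)]
            for row, mask_row in zip(cropped, mask)]
```

For example, `isolate_breast(mirror(right_image), threshold=20)` would bring a right-breast image into the same orientation as the left one before isolating it; the variable `right_image` and the threshold value are placeholders.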

Table 1. Density definitions according to BI-RADS





Density 1: Breast totally filled with fat. Absence of fibrous tissue.

Density 2: Presence of dispersed fibroglandular tissue. May hide a lesion in a mammogram.

Density 3: The breast is heterogeneously dense. May reduce the sensitivity of the exam.

Density 4: The breast tissue is extremely dense. Reduces the sensitivity of the exam.



Figure 5. Training phase of the methodology



Figure 6. (a) Original images, (b) one image mirrored, (c) both images with edges, background, and marks removed.


Breast cancer related to machine learning

Invasive ductal carcinoma cases (40 specimens) from between 2007 and 2008 were identified from the files of the Department of Pathology at Erciyes University Medical Faculty in Turkey. Histological assessments were performed on 4–5 micrometer thick HE-stained sections of formalin-fixed paraffin-embedded tumors. Diaminobenzidine tetrahydrochloride (Dako Liquid DAB Plus, K3468, Denmark) was used as a chromogen, and these sections were counterstained with Mayer's haematoxylin. As a result, nuclei with positive ER expression were stained in brownish colors, and nuclei with negative ER expression were stained in bluish colors. The staining of ER was evaluated in the nuclei of the malignant cells. The ER status was scored using the Allred scoring system. Each slide was analyzed under a light microscope by the same pathologist (F.A.). She selected one representative region from each specimen under the microscope and captured the region in each slide with a linear magnification of 40 as a 2048×1536×24-bit JPEG color image by means of a Leica DMD108 micro-imaging device. Considering inter-observer variation, two experienced pathologists, H.A. and M.K., manually assessed the images of these collected specimens according to the Allred scoring protocol. Images with substantial visual artifacts, cytoplasmic stain, or scoring disagreements between the experts (7 cases) were discarded because of usage limitations of the applied computer-based assessment system. Of the remaining 33 cases, 6 cases were reserved as a separate training set and were not used in test experiments. The experts were asked to mark some nuclei on the images of the training specimens by labeling each nucleus with a designated color using Microsoft Paint Brush.


Figure 7. Processes of computer-based prognosis system

Stain intensity was evaluated for each nucleus. Since it was very tedious and labor-intensive work to mark an adequate number of nuclei for a separate validation set, we preferred to employ a 10 times 10-fold cross-validation scheme to compare classification performances. A separate set of 27 image files was used as the test set, and the prognostic total scores of the two experts for these images were recorded for performance comparisons (Hinton G E et al., 2012).


Experimental classification methods

In this study, we included four kinds of classifiers in the experiments, as they are reported in the literature for similar medical applications. These methods are k-nearest neighbors (KNN), radial basis function networks (RBFN), support vector machines (SVM), and k-means clustering (KM). After conducting some initial experiments, we also included the naive Bayes classifier (NB) and functional trees (FT). Among these learning algorithms, the FT is a relatively new one, in which the data space is split with linear or nonlinear functions at the leaves and nodes of the decision trees. The FT is also able to produce new attributes from the training data to form a higher-dimensional data space and discover new relations (Jiao, Z et al., 2019).


Figure 8. Sub-processes of segmentation stage: (a) cropped part of an original image, (b) result of Otsu thresholding on original part, (c) result of morphological operations, and (d) detected nuclei with superimposed borders.

These features make FT different from the other decision tree models. In addition to the previously mentioned algorithms, a meta-learning scheme combining the naïve Bayes classifier and functional trees was used in the experiments. This multi-expert combination was chosen experimentally, based on the prognostic scores, since both algorithms showed good generalization behavior on the test images. Moreover, it is known that a combination of classifiers gains more information the more diverse the classifiers involved are. In this sense, the two methods are expected to complement each other, as their underlying decision concepts are different. In this way, the individual weaknesses of the classifiers can be compensated. There are several ways to combine the outputs of the classifiers. Here, we used a voting scheme in which the maximum probability criterion is applied over the outputs of the combined classifiers. In the rest of the paper, this meta-learning scheme is referred to as VOTE, since it combines classifiers by means of a voting function. In this paper, we used three methods to evaluate classification performance: correct classification percentage, sensitivity and specificity analysis, and receiver operating characteristic (ROC) curves. The classifier models were built using the nuclei of the training images, which were completely different from the test images. During training, 10 times 10-fold cross-validation was applied, and hence the average results of 100 experiments were recorded as classification performance metrics. Moreover, stratification was maintained to ensure that the instances of each class were evenly distributed across the subsets during training and validation (Rajiv Raju, 2012).
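The maximum-probability voting rule and the sensitivity and specificity measures mentioned above can be sketched as follows. This is a minimal plain-Python illustration with hypothetical function names. It assumes each classifier outputs one probability per class and treats one class as "positive" for the sensitivity and specificity computation.

```python
def vote_max_probability(*classifier_probs):
    """Combine classifiers by the maximum-probability rule.

    Each argument is one classifier's list of class probabilities. The
    predicted class is the one that receives the single highest probability
    from any of the classifiers.
    """
    combined = [max(column) for column in zip(*classifier_probs)]
    return combined.index(max(combined))


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) with respect to the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Two hypothetical classifiers disagree; the more confident one decides.
predicted_class = vote_max_probability([0.6, 0.4], [0.1, 0.9])
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

The sketch assumes the evaluation set contains at least one positive and one negative instance; otherwise the ratios are undefined.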


  1. Information and techniques regarding breast cancer

A Breast Cancer dataset from the University of Wisconsin at Madison was used. It comprises 699 real FNA cases (458 benign, 241 malignant, with 9 attributes each and the correct diagnosis). A tissue biopsy was used as the gold standard. The WEKA software package from the University of Waikato was used for all machine learning (ML) experiments. The performance of the ML algorithms was assessed using stratified ten-fold cross-validation. Three performance measures were used: correct classification rate, area under the receiver operating characteristic (ROC) curve (AUC), and the Kappa statistic. Sixty-four supervised ML algorithms were evaluated. Next, the 9 attributes of the dataset were ranked based on an information gain evaluation. The five best-performing algorithms were evaluated again after successively reducing the number of attributes from 9 to 1.

In the information gain ranking of attributes, Cell Size Uniformity and Cell Shape Uniformity were the most important attributes, followed in decreasing relevance by Bare Nuclei, Bland Chromatin, Single Epi Cell Size, Normal Nucleoli, Clump Thickness, and Marginal Adhesion. The least relevant attribute was Mitoses. Reducing the number of attributes only modestly decreased performance. In the case of BayesNet, with only 3 attributes, classification rates were still above 95%. Eliminating Mitoses as an attribute actually improved correct classification performance (Krizhevsky, A, 2012).
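Ranking attributes by information gain, as described above, can be illustrated with a short sketch. It is written in plain Python rather than WEKA, the function names are hypothetical, attribute values are treated as discrete, and the four toy records are invented purely for illustration and are not taken from the Wisconsin dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(values):
        subset = [lab for v, lab in zip(values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Attribute names sorted from most to least informative."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)


# Invented toy data: one attribute separates the classes, the other does not.
labels = ["benign", "benign", "malignant", "malignant"]
columns = {"mitoses": [1, 1, 1, 1], "cell_size": [1, 1, 9, 9]}
ranking = rank_attributes(columns, labels)
```

An attribute that takes the same value for every record, like the toy `mitoses` column here, has zero gain and lands at the bottom of the ranking.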


  1. b) Performance measures of breast cancer detection

A retrospective review was performed on all breast cancer patients who had sentinel lymph node biopsies examined intraoperatively between March 2008 and April 2010. The intraoperative assessment was performed by either the frozen section or the cytologic smear technique. The final diagnosis of the sentinel lymph nodes was based on hematoxylin and eosin-stained and immunohistochemically stained slides. A total of 522 patients, yielding 1385 sentinel lymph nodes, were included in the study. According to the AJCC, sentinel lymph nodes were considered positive for metastasis if the tumor focus was > 0.2 mm. Eight sentinel lymph nodes were excluded from the study because of deferral to permanent section during intraoperative consultation.



Deep Neural Network Algorithms

  1. AlexNet

AlexNet was the first convolutional neural network (CNN) that exhibited performance beyond the state of the art in the task of object detection and classification. As shown in Figure 9, the network contains eight layers; the first five are convolutional and the remaining three are fully connected (Rampun, A et al., 2018). The first layer of the network filters the input image (sized 224 × 224) with 96 kernels of size 11 × 11 with a stride of 4 pixels. The depth of these kernels equals the number of channels of the input image. The second layer takes as input the output of the first layer, after local response normalization (LRN) and max-pooling have been applied, and filters it with 256 kernels of size 5 × 5 × 96. The third, fourth, and fifth layers are connected to one another without any intervening pooling or normalization layers. The third layer has 384 kernels of size 3 × 3 × 256. The fourth layer has 384 kernels of size 3 × 3 × 384, and the fifth layer has 256 kernels of size 3 × 3 × 384. On top of the convolutional layers, two fully connected layers with 4096 neurons each are attached. The number of neurons in the third fully connected layer equals the number of classes. Local response normalization creates competition for large activities among neuron outputs computed using different kernels. While LRN was adopted and incorporated into various other network models, it was removed from AlexNet in a subsequent publication (Saira Charan et al., 2018).
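As a quick arithmetic check, the number of convolutional weights implied by the kernel sizes quoted above can be tallied. The sketch below is illustrative only: it counts weights without biases, takes the kernel depths exactly as stated in the text, and assumes a 3-channel (RGB) input for the first layer, which the text does not state.

```python
# (number of kernels, height, width, depth) per convolutional layer.
# The depth of 3 for the first layer is an assumption (RGB input).
conv_layers = [
    (96, 11, 11, 3),
    (256, 5, 5, 96),
    (384, 3, 3, 256),
    (384, 3, 3, 384),
    (256, 3, 3, 384),
]

# Each kernel contributes height * width * depth weights.
weights_per_layer = [n * h * w * d for n, h, w, d in conv_layers]
total_conv_weights = sum(weights_per_layer)
```

Under these assumptions the five layers hold 34,848, 614,400, 884,736, 1,327,104, and 884,736 weights respectively, about 3.7 million in total, which is small compared with the fully connected layers that follow.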

An especially important aspect of training was the use of dropout [15] (with a probability of 0.5) for the three fully connected layers (P. M. Ashok Kumar et al., 2021). This technique consists of setting the output of each hidden neuron to zero with some probability. The neurons that are dropped contribute to neither the forward pass nor the back-propagation. Consequently, in each training iteration, a different architecture is sampled. Dropout acts as a regularizer, forcing the network to learn robust features, but it increases the training time (Ting, F.F et al., 2019).
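A minimal sketch of dropout on a list of activations is given below. It uses the common "inverted" variant, in which surviving activations are scaled by 1/(1 - p) during training so that nothing needs to change at inference time. This is a substitution made for simplicity and is not necessarily the exact scheme used in AlexNet. The function name and signature are hypothetical.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Zero each activation with probability p during training.

    Survivors are scaled by 1 / (1 - p) (inverted dropout, an assumption of
    this sketch) so the expected activation is unchanged. At inference time
    the activations pass through untouched.
    """
    if not training or p == 0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


random.seed(0)
hidden = [1.0] * 1000
dropped = dropout(hidden, p=0.5)
```

Each call samples a different set of zeroed neurons, which corresponds to sampling a different architecture in each training iteration.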


Figure 9. AlexNet Structure

  1. VGG

(Szegedy C et al., 2015) have explored the effect of network depth while keeping the convolution filters small. They showed that significant improvement can be achieved by pushing the depth to 16–19 layers. The input to the first convolutional layer is a fixed-size 224 × 224 image. The image is passed through a stack of convolutional layers with ReLU activations, in which filters with very small receptive fields (3 × 3) are used. The convolution stride is fixed to 1. Spatial pooling is carried out by five max-pooling layers, which follow some of the convolutional layers. As in AlexNet, a stack of three fully connected layers is placed on top of the convolutional part of the network. The advantage of VGG is that, by stacking multiple convolutional layers with small kernels, the effective receptive field of the network is increased, while the number of parameters is reduced compared with using fewer convolutional layers with larger kernels for the same receptive field (Szegedy, C et al., 2015). The authors tested various configurations of varying depth (9, 11, 16, and 19 layers). In one of the configurations, 1 × 1 filters were used, which can be seen as a linear transformation of the input channels. This is also a way to increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers. One of the configurations also included an LRN layer. As reported in the paper, the best results were achieved for depths between 16 and 19. The architecture of VGG-16 is depicted in Figure 10.
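The trade-off between stacked small kernels and a single large kernel can be checked with simple arithmetic. The sketch below is illustrative: the function names are hypothetical, biases are ignored, stride 1 is assumed throughout, and the channel count of 64 is an arbitrary example with equal input and output channels.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weights (no biases) of stacked kernel x kernel convolutions
    with `channels` input and output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # arbitrary example value
three_small = conv_weights(3, channels, num_layers=3)  # three 3 x 3 layers
one_large = conv_weights(7, channels)                  # one 7 x 7 layer
```

Three stacked 3 × 3 layers cover the same 7 × 7 receptive field as one 7 × 7 layer but need 27C² weights instead of 49C² for C channels, and they insert two additional non-linearities.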



Figure 10. VGG-16 Structure [21]




  1. GoogLeNet/Inception

GoogLeNet (Ting F F et al., 2019) is the first implementation to use the Inception module. The main idea behind this module is based on the authors' findings about how an optimal local sparse structure can be approximated by dense components (Ioffe, S. et al., 2015). Their aim was to find the optimal local structure and repeat it to build multilayer networks. The Inception module consists of four branches that receive the same input (Figure 11a). The first branch filters the input with a 1 × 1 convolution, which acts as a linear transformation on the input channels (Simonyan, K et al., 2014). The second and third branches perform 1 × 1 convolutions for dimensionality reduction, followed by convolutional layers with kernels of size 3 × 3 and 5 × 5, respectively. The fourth branch performs max-pooling followed by a convolution with 1 × 1 kernels. Finally, the outputs of all branches are concatenated and fed as input to the next block. Based on the assumption that the middle layers of a CNN should produce discriminative features, the authors added auxiliary classifiers (two fully connected layers and a softmax layer) that operate on the features produced at an intermediate point of the network (Sagayam K. M. et al., 2021). The loss computed from the decisions of these classifiers is used during back-propagation to calculate additional gradients that contribute to the training of the corresponding convolutional layers. At inference time the auxiliary classifiers are discarded.


Figure 11. (a) Inception module of GoogLeNet; (b) Inception-v2 module


In subsequent publications, modified versions of the Inception module have been proposed, along with slightly changed network architectures. The authors proposed Batch Normalization (BN) and incorporated it into the Inception network. BN is a technique that makes normalization part of the model architecture, performing the normalization for each training mini-batch (Krizhevsky, A, 2014). The authors argue that BN allows for higher learning rates and simpler initialization strategies without adverse effects. With BN, all the images of the current mini-batch are rescaled so that they have a mean of 0 and a variance of 1. Then, a linear transformation is applied, the parameters of which are learned during training (Hinton, G.E et al., 2012). The network used in (He, K et al., 2016), namely Inception-v2, was a slight modification of GoogLeNet. Apart from the incorporation of BN, the most important change is that the 5 × 5 convolutional layers of the Inception module were replaced by two consecutive 3 × 3 layers (Figure 11b).
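The normalization step of BN can be sketched for a single feature over one mini-batch. This is a minimal plain-Python illustration of the training-time computation only. The names `gamma`, `beta`, and `eps` follow common usage but are assumptions of this sketch, and the running statistics used at inference time are omitted.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift.

    The values are first rescaled to mean 0 and variance 1; the linear
    transformation y = gamma * x_hat + beta has learnable parameters.
    `eps` guards against division by zero for constant batches.
    """
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    scale = gamma / math.sqrt(variance + eps)
    return [scale * (x - mean) + beta for x in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
shifted = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=5.0)
```

With `gamma = 1` and `beta = 0` the output has (approximately) zero mean and unit variance; other values let the network restore whatever scale and offset are useful for the next layer.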

  1. Residual Networks

Residual networks (ResNets) consist of reformulated convolutional layers that learn residual functions with reference to their inputs. The authors argue that this type of network is easier to optimize and can be of considerably increased depth. The implementation of a "residual block", as described in (Ting F F et al., 2019), is straightforward: for every few convolutional layers, a "shortcut connection" is added that runs parallel to these layers and performs the identity mapping. The output of the convolutional layers is then added to the output of the shortcut branch, and the result is propagated to the subsequent block (Figure 12). Along with the use of shortcut connections, the network architecture is mainly inspired by the philosophy of VGG networks. All convolutional layers have small kernels of size 3 × 3 and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; (ii) when the feature map size is halved (by convolutional layers with stride 2), the number of filters is doubled so as to preserve the time complexity per layer. The authors tested models of varying depth, ranging from 34 to 152 layers.
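
Design rule (ii) can be checked with a little arithmetic. The sketch below estimates the multiply-accumulate cost of a 3 × 3 convolution layer before and after the feature map is halved. The function names `conv_cost` and `next_stage` are hypothetical, and the 56 × 56 feature map with 64 filters is an assumed starting point chosen only for illustration.

```python
def conv_cost(height, width, c_in, c_out, k=3):
    """Multiply-accumulate operations of one k x k convolution layer,
    where height and width are the size of its output feature map."""
    return height * width * k * k * c_in * c_out


def next_stage(height, width, channels):
    """Apply rule (ii): halve the feature map, double the filters."""
    return height // 2, width // 2, channels * 2


h, w, c = 56, 56, 64                      # assumed starting stage
cost_before = conv_cost(h, w, c, c)
h2, w2, c2 = next_stage(h, w, c)          # 28 x 28 with 128 filters
cost_after = conv_cost(h2, w2, c2, c2)
print(cost_before == cost_after)
```

Halving both spatial dimensions divides the number of output positions by four, while doubling both the input and output channels multiplies the work per position by four, so the cost per layer is unchanged.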


Figure 12. Building block of ResNet
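
The building block in Figure 12 can be sketched as follows. To keep the example short and dependency-free, small fully connected layers stand in for the 3 × 3 convolutions, which is an assumption of this sketch rather than the published architecture. The names `relu`, `dense`, and `residual_block` are likewise hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights):
    """Matrix-vector product; stands in for a convolutional layer here."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two stacked layers.

    The shortcut connection carries x unchanged (identity mapping) and
    its output is added to the output of the stacked layers.
    """
    f = dense(relu(dense(x, w1)), w2)               # residual function F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # add shortcut, then ReLU


zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block([1.0, 2.0, 3.0], zeros, zeros))
```

With all weights set to zero, the block simply passes a non-negative input through unchanged, which illustrates why stacking such blocks is argued to be easier to optimize: a layer that is not needed can fall back to the identity mapping.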

Inference of survey

This literature survey provides in-depth background for detecting breast cancer using advanced soft computing paradigms. For the pre-trained models considered, namely AlexNet 56, VGG-16, VGG-19, ResNet-50, ResNet-101, ResNet-152, GoogLeNet, and Inception-BN (v2), performance metrics such as accuracy, ROC curve, and the number of weights have been measured. Through this work, researchers may explore the numerous open problems in this field.



Various case studies on breast cancer detection using mammography images have been presented. To improve the performance of computational intelligence methods, a convolutional neural network algorithm has been proposed. This work also presents a comparative analysis against existing soft computing approaches.

Conflict of interest

The authors declare no potential conflict of interest regarding the publication of this work. In addition, ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancy, have been fully observed by the authors.


The author(s) received no financial support for the research, authorship, and/or publication of this article.

    Becker, Anton S., Marcon, Magda, Ghafoor, Soleen, Wurnig, Moritz C., Frauenfelder, Thomas, Boss, Andreas (2017). Deep learning in mammography: Diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Investigative Radiology, 52(7), 434-440.

    Geert Litjens, Clara I. Sanchez, Nadya Timofeeva, Meyke Hermsen, Iris Nagtegaal, Iringo Kovacs, Christina Hulsbergen-van de Kaa, Peter Bult, Bram van Ginneken & Jeroen van der Laak (2016). Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Scientific Reports, 6, 1-11. DOI: 10.1038/srep26286.

    Hayit Greenspan, Bram van Ginneken, Ronald M. Summers (2016). Deep learning in Medical Imaging: Overview and Future Promise of an Exciting New Technique. IEEE Transactions on Medical Imaging, 35(5), 1153-1159.

    He, K.; Zhang, X.; Ren, S.; Sun, J (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 770–778.

    Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv, arXiv:1207.0580.

    Ioffe, S.; Szegedy, C (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 448–456.

    Jiao, Z.; Gao, X.; Wang, Y.; Li, J (2019). A deep feature based framework for breast masses classification. Neurocomputing, 197, 221–231.

    Krizhevsky, A (2014). One weird trick for parallelizing convolutional neural networks. arXiv, arXiv:1404.5997.

    Krizhevsky, A.; Sutskever, I.; Hinton, G.E (2012). Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 1097–1105.

    M. Ashok Kumar, Jeevan Babu Maddala & K. M. Sagayam (2021). Enhanced Facial Emotion Recognition by Optimal Descriptor Selection with Neural Network. IETE Journal of Research, 1-20. DOI: 10.1080/03772063.2021.1902868.

    Rajiv Raju (2012). Relative Importance of Fine Needle Aspiration Features for Breast Cancer Diagnosis: A Study Using Information Gain Evaluation and Machine Learning. Journal of American Society of Cytopathology, 1, s11-s13.

    Rampun, A.; Scotney, B.W.; Morrow, P.J.; Wang, H (2018). Breast Mass Classification in Mammograms using Ensemble Convolutional Neural Networks. In Proceedings of the 20th International Conference on e-Health Networking, Applications and Services (Healthcom), Ostrava, Czech Republic, 17–20, 1–6.

    Sagayam K. M., A. Diana Andrushia, Ahona Ghosh, Omer Deperlioglu, A. A. Elngar (2021). Recognition of Hand Gesture Image Using Deep Convolutional Neural Network. International Journal of Image and Graphics.

    Saira Charan, Muhammad Jaleed Khan, Khurram Khurshad (2018). Breast cancer detection in mammograms using convolution neural networks. IEEE International Conference on Computing, Mathematical and Engineering Technologies, DOI: 10.1109/ICOMET.2018.8346384

    Simonyan, K.; Zisserman, A (2014). Very deep convolutional networks for large-scale image recognition. arXiv, arXiv:1409.1556.

    Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 1–9.

    Ting, F.F.; Tan, Y.J.; Sim, K.S (2019). Convolutional neural network improvement for breast cancer classification. Expert Syst. Appl. 120, 103–115.