Abstract: |
My PhD research aims to develop a new automatic method for finding image-based markers in mammograms that support the early diagnosis of breast cancer. Breast cancer is the most frequently diagnosed cancer among women worldwide and one of the leading causes of cancer death among women. It is estimated that one woman in eight will develop breast cancer during her lifetime. It is also widely accepted that early diagnosis is one of the most powerful instruments we have in fighting this type of cancer. For these reasons, mammographic screening programs examine asymptomatic women at risk, aged between 45 and 74, every two years. Full Field Digital Mammography (FFDM) is a non-invasive, highly sensitive method for early-stage breast cancer detection and diagnosis, and it is the reference imaging technique for exploring the whole breast. Since mammography is a 2D X-ray projection imaging technique, it suffers from some intrinsic problems: a) breast structures overlap, b) malignant masses absorb X-rays similarly to benign ones, and c) sensitivity is lower for masses or microcalcification clusters in denser breasts. Breast density is defined as the amount of fibroglandular parenchyma, or dense tissue, with respect to fat tissue as seen on a mammographic exam. To reach sufficient sensitivity in dense breasts, a higher radiation dose has to be delivered to the patient. Moreover, breast density is an intrinsic risk factor for developing cancer. The most widely used density standard was established by the American College of Radiology (ACR) in 2013 and is reported in the Breast Imaging Reporting and Data System (BI-RADS) Atlas. This standard defines four qualitative classes: almost entirely fatty (``A''), scattered areas of fibroglandular density (``B''), heterogeneously dense (``C'') and extremely dense (``D'').
Since mammographic density assessment by radiologists suffers from a non-negligible intra- and inter-observer variability, automatic methods have been developed to make the classification reproducible. The first problem in training machine learning models is the lack of large public mammogram datasets, which also makes it difficult to compare different methods. Furthermore, many previous approaches use a two-step classification, which means that the classification is not completely automatic: first, they extract features from the images or apply a segmentation method and, afterwards, they train a classifier such as a Support Vector Machine or another machine learning method. In my previous work, a deep learning technique was explored in order to build a breast density classifier based on a residual convolutional neural network (CNN), a class of neural network commonly used for image analysis. Thanks to the screening programs, large numbers of mammograms can be collected and used for the development of analysis software. In the last few years, deep learning-based methods have been applied successfully to a wide range of medical image analysis problems. Since deep learning methods need a large amount of data, the ``Azienda Ospedaliero-Universitaria Pisana'' (AOUP) collected about 2000 mammographic exams (each consisting of 4 images) from the Senology Department. The exams were selected by a physician specialized in mammography and a radiology technician. This dataset was extracted from the AOUP database and anonymized. We are also collecting a new longitudinal dataset of screening mammograms from the "Azienda ASL Toscana Nord-Ovest" (ATNO), which is made of both cancer and control cases along with histopathological reports and a questionnaire on the known breast cancer risk factors.
The latter dataset will include, for each woman with cancer, all the screening mammographic exams acquired before the diagnosis and, for each healthy woman, all her mammographic exams.
The main idea of my PhD research is to look for a signal that can distinguish women who will develop the disease from women who will not. To reach this goal, I will explore how the CNN-extracted features and other classes of features related to breast density evolve over each woman's lifetime. A breast density classifier based on convolutional neural networks has been trained and evaluated, and I extracted the features it computes to perform the classification. At the same time, I trained another classifier, a Support Vector Machine, on first-order statistical features. The results obtained with this second classifier are promising. I am going to build a classifier that takes as input both the CNN-extracted features and the statistical ones in order to improve the performance. Algorithms and protocols that help to understand the behavior of the classifiers, such as Class Activation Map analysis, will be studied in order to validate and control the performance not only in terms of accuracy. Afterwards, all the features will be computed on the longitudinal dataset and studied in order to find a significant trend that can distinguish women who will develop the disease from women who will not. This could allow a very early diagnosis and ensure the best possible prognosis for women. |