This project was undertaken over six months under the direct supervision of an assigned supervisor from Technological University Dublin. It focused on image classification: a Convolutional Neural Network was used to classify whether an unseen brain scan contained a brain tumour. A 12,000-word document detailing the process accompanied the project. Because a strict time limit was enforced, the project was split into five stages, as follows:
-
Data Collection: Because the subject area already contained a vast amount of research, the project was restricted from using an external dataset, meaning that no dataset already constructed for image classification could be used. This restriction limited the range of usable data; furthermore, DICOM files and other 3-dimensional medical imaging formats could not be used. Instead, a dataset of 250 healthy and 250 unhealthy brain scans was constructed by continually mining various online medical journals. The images came in various formats, such as PNG and JPG.
-
Pre-processing: This stage focused on preparing the dataset. The collected images were fed into a pre-processing pipeline that prepared them for the model selection and model performance stage. The pipeline contained several steps, such as image scaling, finding the mean image and adding a blur to each brain scan. This stage was accompanied by a subsection on image segmentation, in which I dedicated a small amount of time to investigating several ways to segment the tumour from the brain. Various methods were tried; however, one major issue was that in MRI brain scans both the tumour and the rim of the skull appear bright, so simple segmentation solutions such as thresholding cannot be used without first performing 'skull stripping'. Skull stripping was performed before work on this subsection had to be halted because of time constraints.
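The three pipeline steps named above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the use of NumPy, the function names, the 64-pixel target size, the nearest-neighbour resize, the 3 x 3 box blur and the decision to subtract the mean image from every scan are all assumptions made for the example.

```python
import numpy as np

def rescale(image, size):
    """Nearest-neighbour resize of a 2-D grayscale image to (size, size)."""
    h, w = image.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols]

def box_blur(image, k=3):
    """Blur with a k x k mean filter (k odd), replicating the border pixels."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

def preprocess(images, size=64):
    """Resize, scale intensities to [0, 1], blur, then centre on the mean image."""
    batch = np.stack([
        box_blur(rescale(img, size).astype(np.float64) / 255.0)
        for img in images
    ])
    mean_image = batch.mean(axis=0)      # the "mean image" of the dataset
    # Assumption: the mean image is subtracted so each pixel is zero-centred.
    return batch - mean_image, mean_image
```

Calling `preprocess(list_of_scans)` on a list of 2-D arrays returns the centred batch and the mean image; the mean image would be kept so that the same centring can be applied to unseen scans later.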
-
Model Selection & Model Performance: The majority of the project's time was spent on this stage. It contained several subsections with a similar goal: in each, a Convolutional Neural Network was created, then trained and tested using an 80:20 dataset split. Each subsection explored a set of parameter options for the network and examined the outcome of each result. After this investigation, Grid Search was used to search the remaining parameter combinations exhaustively. Grid Search was run in the background on AWS using a t2.micro instance.
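The 80:20 split and the exhaustive search can be sketched in plain Python as below. The function names, the parameter names (`learning_rate`, `batch_size`) and the `evaluate` callback are hypothetical; in the project the callback would train the network on the 80% portion and return its score on the 20% portion, which is replaced here by a stand-in scoring function.

```python
import itertools
import random

def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and split it 80:20 into train and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)         # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

if __name__ == "__main__":
    train, test = split_80_20(range(500))
    print(len(train), len(test))

    # Hypothetical grid; the real parameter options are not listed here.
    grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [16, 32]}

    def stand_in_score(params):
        # Placeholder for "train on `train`, return accuracy on `test`".
        return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 32)

    print(grid_search(grid, stand_in_score))
```

The number of runs is the product of the option counts for each parameter, so every parameter added to the grid multiplies the training time. This is why a search of this kind is usually left running in the background.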
-
Generalization: Once Grid Search had identified the best-performing model, time was dedicated to investigating possible solutions to overfitting. Because the dataset was very small for a Convolutional Neural Network, the results obtained at each stage were heavily overfitted. This stage therefore involved adding dropout, bias and noise to the network to try to prevent this. Once these had been added, I decided to use Data Augmentation to artificially increase the size of the dataset. After the images had been passed through an augmentor, the new dataset contained more than 1,000 images. Finally, further tests were conducted on the augmented dataset, after which the final model was ready for the prediction stage.
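Two of the ideas above, dropout and data augmentation, can be illustrated with a short NumPy sketch. The function names are hypothetical, and the choice of mirrored copies as the augmentation is an assumption: the transformations applied by the augmentor are not specified here, so this only shows the general principle of deriving extra training images from existing ones.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep   # True where a unit is kept
    return activations * mask / keep

def augment(images):
    """Return each image followed by its left-right and up-down mirrored copies."""
    augmented = []
    for img in images:
        augmented.extend([img, np.fliplr(img), np.flipud(img)])
    return augmented
```

Dropout is applied only during training. Because the kept units are rescaled by `1 / keep`, the expected activation is unchanged and the layer can be left untouched at prediction time. With this particular `augment`, each input image yields three images.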
-
Prediction: This stage first involved mining medical journals for 10 new unseen images. The final model was then loaded and used to perform a prediction on each image. Each prediction displayed the image being classified together with a result stating whether it contained a brain tumour. The results indicated that the network was a good predictor, as it correctly classified 8 of the 10 unseen images.
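How a score such as 8 out of 10 is derived can be sketched as follows. The sketch assumes the network outputs a single probability per image that is thresholded at 0.5. The function names, the threshold, the scores and the labels are all made-up placeholders chosen so that 8 of 10 agree; they are not the project's real predictions.

```python
def classify(probability, threshold=0.5):
    """Turn a predicted tumour probability into a class label."""
    return "tumour" if probability >= threshold else "healthy"

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

if __name__ == "__main__":
    # Placeholder model outputs for 10 unseen scans (illustrative only).
    scores = [0.91, 0.12, 0.77, 0.35, 0.66, 0.08, 0.58, 0.42, 0.83, 0.27]
    actual = ["tumour", "healthy", "tumour", "tumour", "tumour",
              "healthy", "healthy", "healthy", "tumour", "healthy"]
    predicted = [classify(s) for s in scores]
    print(accuracy(predicted, actual))
```

With only 10 test images, each misclassification moves the accuracy by 10 percentage points, so a figure of this kind is best read as an indication rather than a precise estimate.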