ELYAN 2020 Deep Learning
Abstract
∗ Corresponding author
Email address: [email protected] (Eyad Elyan)
Keywords: Deep Learning, YOLO, P&ID, Engineering Drawings, Symbols Recognition, GANs
1. Introduction
Large volumes of un-digitised and paper-based documents are still very common across different domains. Amongst this legacy, engineering drawings are known to be one of the most complex types of documents to process and analyse. They are widely used in different industries such as construction and city planning (i.e. floor plan diagrams [1]), Oil and Gas (i.e. P&IDs [2]), Mechanical Engineering [3], AutoCAD Drawing Exchange Format (DXF) files [4] and others. Interpreting these drawings requires highly skilled people and, in some cases, long hours of work.
In recent years, the digitisation of these drawings has become increasingly important. This is partly due to the urgent need to improve business practices such as inventory, asset management, risk analysis, safety checks and other types of applications, and also due to the recent advancements in the domain of machine vision and image understanding. Deep Learning (DL) [5], in particular, has significantly improved performance by orders of magnitude in many domains such as gaming and AI [6], Natural Language Processing [7], health [8], and others. One particular domain that has benefited hugely from DL is machine vision [9]. Convolutional Neural Networks (CNNs) [10] have made significant progress in recent years in many image-related tasks [11]. They have been successfully applied to several fields such as handwriting recognition [12], image classification [13, 14], face recognition and biometrics [15], and others. Before CNNs, improvements in image classification, segmentation, and object detection were marginal and incremental. However, the introduction of CNNs revolutionised this field. For example, DeepFace [16], a face recognition system first proposed by Facebook in 2014, achieved an accuracy of 97.35%, reducing the error of the then state of the art by 27%.
Core image processing tasks such as shape and object detection, recognition, and tracking have become much less challenging, even under varying conditions and in far less controlled environments. Faster Region-based CNN (Faster R-CNN) [17], Single Shot Detectors (SSD) [18], Region-based Fully Convolutional Networks (R-FCN) [19] and You Only Look Once (YOLO) [20] are all relatively recent methods that have shown superior performance in object detection, tracking, and classification. These methods and their extensions have significantly advanced this area of research and addressed some of the most challenging inherent vision problems, such as occlusion, lighting conditions and orientation, which were previously considered major challenges even for a specific vision task in a more controlled environment [21].
Significant advances have also been made in the area of generative models, which have been successfully applied in many applications. Among these, Generative Adversarial Networks (GANs) have proved to be one of the most established and commonly used methods for generating content. GANs were initially introduced by Ian Goodfellow in 2014 [22]. In the Methods section, we discuss our GAN-based method for handling the class imbalance problem. This is another challenging problem that is common across many domains [23], including engineering drawings, where one or more classes of symbols in the diagrams are either underrepresented or overrepresented in the dataset [24].
Despite this massive progress in the field of image processing and analysis, very little progress has been made in the area of digitising complex engineering drawings, and extracting information from these diagrams is still considered a challenging problem [25]. To date, a major limitation of most existing solutions is that they still follow a traditional image-processing approach, which requires extensive feature extraction and engineering and carefully designed heuristics [26]. These are often very domain-dependent, sensitive to noise and data distribution, and mostly dedicated to solving part of the problem (i.e. detecting symbols, separating graphics from text, and so on). As can be seen in Figure 1, not only is such an approach difficult to generalise across different scenarios, but the performance of any machine learning algorithm will also depend heavily on the quality and accuracy of the extracted features.
Figure 1: The traditional machine-vision pipeline: pre-processing, feature extraction, and machine learning algorithms.
• Thorough evaluation using a large collection of P&ID diagrams provided by an industry partner in the Oil and Gas sector.
Figure 2: Schematic diagram of the framework for processing and analysing engineering documents.
2. Related Work
[28, 29, 30, 31]. In recent years, due to the significant progress in machine vision research and computing power, and also due to the availability of large volumes of un-digitised data, the demand for a fully automated framework for digitising these drawings has grown considerably.
Examples of work aimed at extracting information from engineering documents include the analysis of musical notes [32], mechanical drawings [33], optical character recognition (OCR) [34, 35, 36], and extracting information from P&ID drawings [2, 37, 38]. It can be argued that most of the existing literature followed a traditional image processing approach [39], which requires some form of feature extraction from the image [28], feature representation [30], and classification to determine the class of objects (i.e. symbols, digits, etc.) [31].
The key limitation of traditional machine vision methods is that they require extensive feature engineering, depend heavily on the quality of the extracted features, and often do not generalise well to unseen examples. A recent extensive review showed that most of the existing literature focused on solving part of the problem rather than providing a fully automated framework for digitising an engineering diagram [26]. Examples include methods for recognising symbols and lines in a drawing [40], detecting and separating text from symbols and other graphic elements in a diagram [38], classifying symbols in engineering drawings [2], and so on. This is partly due to the complexity of the problem (i.e. localising every single element in the document), and also due to the limitations of traditional image processing and analysis methods and the inherent vision problems such as sensitivity to noise, image quality, the orientation of shapes and so on. Consider, for example, the work in [2], where the authors used a set of heuristic rules to localise symbols in the drawings, and a Random Forest [41] was then used to classify the symbols, achieving an average accuracy higher than 95%. Similar work was presented in [38], where a set of heuristics was also used to detect and separate text from graphic elements. However, such an approach is very dependent on the data distribution, and a slight variation in the diagrams or in the symbol representation might require adapting the existing heuristic rules or creating new ones.
In a closely related area, Rebelo et al. [42] presented a study on optical music recognition and classification methods for musical symbols. They suggested that adjoining staff lines, the presence of symbols in close proximity to music notes, broken symbols, overlapping symbols and areas with high symbol density all contributed to the complexity of optical music recognition. Four classification methods, namely a multi-layer perceptron neural network, a Hidden Markov Model, K-nearest neighbours and a Support Vector Machine (SVM), were evaluated on datasets of both synthetic and handwritten music scores. The highest performance was obtained with the SVM model; however, all approaches detected and removed the music staff lines and segmented the symbols prior to classification.
Khan et al. [43] used video image analysis as part of a flight deck warning system, which combined automated dial reading of flight instruments with domain knowledge. Experiments on a flight simulator and in real flight aimed to obtain the position of a white needle on the flight instrument using three image processing approaches: background subtraction, pattern matching and a convolution-based approach. Results showed that the convolution method obtained the highest accuracy, highest true positive rate and highest true negative rate.
In recent years, DL-based methods have been explored and successfully applied to tasks that are similar to engineering drawings analysis. Ziran et al. proposed a method based on Single Shot Detectors (SSD) [18] to detect and recognise furniture objects, doors, and windows in floor plan diagrams [44]. The results were encouraging. However, the datasets used were simple, with a limited number of furniture objects in each drawing (12). The performance also dropped under the imbalanced class distribution of objects in the images.
Faster R-CNN was used in [45] for the detection and recognition of handwritten characters. Although the work focused mainly on specific elements of the documents (mathematical expressions and flowcharts), promising results over other traditional methods were achieved.
Detection and recognition of musical notes in documents have also benefited from adopting Deep Learning-based methods [44, 46]. R-CNN, R-FCN, and SSD were applied successfully to detect and recognise handwritten music notes [46]. Results showed an improvement in symbol detection over other traditional, structured machine vision methods.
A framework for extracting information from P&ID drawings was presented very recently in [47]. The authors used a two-step approach. First, Deep Learning methods were used to localise symbols and text, and then heuristic-based methods were employed to detect other elements of the drawing (i.e. Euclidean metrics for associating tags and symbols with pipelines, a probabilistic Hough transform to detect pipelines, etc.). The methods for localising symbols were based on a fully convolutional neural network. A dataset of four sheets consisting of 672 flow diagrams was used. Results were an improvement over other traditional methods. However, accuracy was not consistent across all components: class accuracy ranged from 100% for some components to 64.0% for others (i.e. symbols of a certain class). Moreover, only a limited number of symbols were used in this study (10 different classes of symbols), and the P&ID sheets appear to be of very good quality, which is often not the case in the real world.
To summarise, the existing literature shows a clear gap between the current state of machine vision and image understanding, owing to the rapid development in this field, and the slow and incremental progress in a very important application domain across many industries.
ples that capture the underlying characteristics of the original data (replicating
original content). Figure 3 depicts the GAN model.
\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]   (1)

where p_data(x) is the probability distribution over the real data, x is a sample from the real training data, p_z is the probability distribution over the noise vector z, and G(z) is the output of the generator function G (i.e. the generated images). GANs are state-of-the-art in terms of the quality of the generated images.
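To make Equation 1 concrete, the following is a minimal PyTorch-style sketch (not the authors' implementation) of how the two expectation terms translate into discriminator and generator losses for one batch; the networks D and G are assumed to be defined elsewhere, with D producing a sigmoid-activated probability.

```python
import torch

def gan_losses(D, G, real_images, z_dim=100):
    """Losses corresponding to Equation 1 for a single batch.

    D and G are assumed to be torch.nn.Module instances; D outputs the
    probability that its input is real.
    """
    batch_size = real_images.size(0)
    z = torch.randn(batch_size, z_dim)   # z ~ p_z(z)
    fake_images = G(z)                   # G(z)

    # Discriminator maximises log D(x) + log(1 - D(G(z)))
    d_loss = -(torch.log(D(real_images) + 1e-8).mean()
               + torch.log(1.0 - D(fake_images.detach()) + 1e-8).mean())

    # Generator minimises log(1 - D(G(z)))
    g_loss = torch.log(1.0 - D(fake_images) + 1e-8).mean()
    return d_loss, g_loss
```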
GANs have been successfully applied to different problems including image generation [48, 49], segmentation and speech synthesis. In recent years they have also been successfully applied to handle class-imbalance problems [50, 51]. Class imbalance is common across different domains including health, security, and banking [23]. The problem arises when one or more classes are either underrepresented or overrepresented in the dataset. In such scenarios, a typical supervised learning algorithm tends to be biased towards the majority class when dealing with imbalanced datasets [24].
Supervised GANs extend the original GAN framework by introducing conditional probabilities in the value function. This allows more control over the generated samples and introduces diversity, which is needed when augmenting class-imbalanced datasets with synthetic input data. Typical examples include the vanilla GAN [22], CGAN [52] and AC-GAN [53]. However, the literature shows that these models can be hugely affected by class imbalance, especially in extreme cases [54].
Recent work in [51] introduced a new extension of GAN models. The authors trained the GAN at a fine-grained level by updating the discriminator objective not only to distinguish between fake and real instances, but also to classify the fake instances into different classes (i.e. Fake 1, Fake 2, etc.). Extensive experiments using four different datasets showed superior results over other GAN models. The generated samples proved to be of good quality and were successfully used to augment the dataset and improve the detection rate of minority class instances.
3. Methods
image. The prediction from a grid is presented as S × S × (B × 5 + C), where S is the size of the grid, B is the number of bounding boxes predicted per grid cell, and C is the number of class probabilities (i.e. the probability of a symbol being a gate valve, sensor, etc.). Figure 4 depicts this setting.
Figure 4: The method divides the P&ID Diagram into a grid, following the YOLO model
[55], and predicts the class probabilities of the bounding boxes. The figure shows the symbols
sensor, flange, DBBPV, DB&BBV and RS
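As a concrete illustration of the prediction tensor size, using the 25 classes adopted later in the experiments (the grid size and the number of boxes per cell below are assumed values, as they are not stated in the paper):

```python
# Size of the YOLO prediction tensor: S x S x (B * 5 + C)
S = 13   # assumed grid size (not stated in the paper)
B = 3    # assumed number of boxes predicted per grid cell
C = 25   # number of symbol classes used in the experiments

depth = B * 5 + C    # 5 values per box: x, y, w, h, objectness
print(S, S, depth)   # -> 13 13 40
```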
The YOLO model was chosen for two main reasons. First, it is a simple framework, which allows simultaneous prediction of multiple bounding boxes and class probabilities using a single convolutional neural network. Second, compared to other models, YOLO is considered extremely fast. For testing P&IDs that may contain on average 180 instances of various engineering symbols, this is very important in a practical context.
For the experiments in this paper, we chose to work with Piping and Instrumentation Diagrams (P&IDs) (Figure 5). A collection of 172 P&ID sheets was obtained from an Oil and Gas industrial partner for evaluation purposes. These diagrams contain different types of symbols, lines, and text (Figure 5).
Figure 5: Part of a P&ID Diagram
Additionally, the P&IDs are of different qualities, which makes the dataset suitable for evaluation purposes. A P&ID can be defined as a schematic diagram representing the different components of a process and their connectivity: it is a representation of equipment (often depicted as symbols) and process flow (depicted as different types of lines) [2].
Such diagrams are available across many industries in the form of paper or scanned documents. Interpreting and analysing these documents requires expert knowledge and is often time-consuming [26, 56]. Moreover, a misinterpretation of such documents can be very costly. For example, if a pipe needs to be replaced in an Oil and Gas installation, an engineer needs to check the corresponding P&ID diagram and identify the valves that must be closed before carrying out the task to ensure safety. In other words, accurate interpretation of these drawings is paramount.
The original P&ID sheets are large images of 7500 × 5250 pixels. To speed up the training process, we divided each sheet into a 6 × 4 grid, resulting in 24 sub-images (patches) that are much smaller than the original sheets (approximately 1250 × 1300 pixels each).
Training a Deep Learning model requires fully annotated images/diagrams. To do so, we used the Sloth tool¹ to annotate the collection of P&ID diagrams. In total, 29 different symbols were annotated in the whole dataset (Figure 7). The annotation process is simple and involves customising the Sloth tool to record each symbol's name (class) and its location in the diagram.
The resulting annotations are captured in a file covering the 29 symbols. The data recorded included the x, y coordinates of the centre of each bounding box and the width and height of the bounding box enclosing the symbol. In total, 13,327 symbols belonging to the 29 different classes were annotated. The dataset is hugely imbalanced, as can be seen in Figure 6.
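As an illustration only (the exact file layout produced with Sloth is not given, so the field and function names below are assumptions), a recorded bounding box can be normalised by the image dimensions in the way YOLO-style training expects:

```python
def to_normalised_box(x_center, y_center, width, height,
                      image_width, image_height):
    """Normalise a bounding box (pixel coordinates of its centre, plus its
    width and height) by the image dimensions."""
    return (x_center / image_width,
            y_center / image_height,
            width / image_width,
            height / image_height)

# Example: an annotation on a 1250 x 1300 patch (values purely illustrative)
print(to_normalised_box(625, 410, 38, 42, 1250, 1300))
```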
Only 25 of these symbols were used in the experiments; these are shown in Figure 7. Five symbols that were extremely under-represented in the whole dataset (i.e. only one or two instances of these symbols appear in the training and testing sets) were excluded from the first experiment.

¹ https://sloth.readthedocs.io/en/latest/
3.4. MFC-GANs
sigmoid activated.
Similar to most GAN models, the generator's input is a noise vector of size 100 combined with a symbol label encoding (see [51] for details). This label encoding is used to control class-specific generation, which is essential for our experiment.
The generator output is a 64 × 64 greyscale symbol image. For our experiments (following sections) we used a batch size of 100 and a learning rate of 0.001, which was chosen experimentally. Spectral normalisation was used in both the generator and the discriminator. The proposed model is trained using Equations 2, 3 and 4.
In these equations, L_s estimates the sampling loss, which represents the probability of a sample being real or fake; L_cd and L_cg estimate the classification losses for the discriminator and the generator, respectively; X_real represents the training data and X_fake is the set of generated images.
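Since Equations 2–4 are not reproduced here, the following is only a rough sketch of these objectives, under the assumption (based on the description of MFC-GAN in [51]) that the discriminator has a real/fake sampling head and a classification head over both real and fake classes; it is not the authors' code.

```python
import torch
import torch.nn.functional as F

def mfc_gan_losses(sample_logits_real, sample_logits_fake,
                   class_logits_real, class_logits_fake,
                   real_labels, fake_labels):
    """Sketch of the losses described in the text.

    Assumes N real classes with labels 0..N-1 and N matching fake classes
    with labels N..2N-1, so the classification head has 2N outputs.
    """
    # Sampling loss L_s: real samples -> 1, generated samples -> 0
    L_s = (F.binary_cross_entropy_with_logits(
               sample_logits_real, torch.ones_like(sample_logits_real))
           + F.binary_cross_entropy_with_logits(
               sample_logits_fake, torch.zeros_like(sample_logits_fake)))

    # Discriminator classification loss L_cd: real classes for X_real,
    # fake classes for X_fake
    L_cd = (F.cross_entropy(class_logits_real, real_labels)
            + F.cross_entropy(class_logits_fake, fake_labels))

    # Generator classification loss L_cg: generated samples should be
    # classified as the real class they were conditioned on
    L_cg = F.cross_entropy(class_logits_fake, real_labels)

    return L_s, L_cd, L_cg
```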
Two experiments were carried out. The first experiment was designed to evaluate an end-to-end solution for recognising symbols in engineering drawings. We assume here that locating and recognising these symbols will simplify subsequent tasks in a framework for analysing whole drawings (i.e. detecting text, pipelines, etc.), simply because the majority of these types of drawings are made up of symbols. The second experiment is separate and focuses on handling the class-imbalance problem using GAN-based methods.
4.1. Symbols Recognition
In our dataset, the P&ID sheets were approximately 7500 × 5250 pixels in size. Using images of this size for training is computationally expensive, and therefore each P&ID was split into 24 patches by dividing the original P&ID width by 6 and the height by 4. This gave a patch size of approximately 1250 × 1300 pixels. The annotation data for each patch was obtained from the annotations of the whole P&ID, as discussed in the previous section.
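A minimal sketch of this patching step is shown below (OpenCV-based; file handling, names and the coordinate-mapping helper are illustrative assumptions rather than the authors' code).

```python
import cv2

def split_into_patches(sheet_path, cols=6, rows=4):
    """Split a P&ID sheet into a cols x rows grid of patches, as described
    in the text (a 7500 x 5250 sheet gives 24 patches of roughly 1250 x 1300)."""
    sheet = cv2.imread(sheet_path, cv2.IMREAD_GRAYSCALE)
    h, w = sheet.shape[:2]
    patch_w, patch_h = w // cols, h // rows
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = sheet[r * patch_h:(r + 1) * patch_h,
                          c * patch_w:(c + 1) * patch_w]
            patches.append(((r, c), patch))
    return patches

def annotation_to_patch(x, y, col, row, patch_w=1250, patch_h=1312):
    """Shift whole-sheet symbol coordinates into a patch's local frame."""
    return x - col * patch_w, y - row * patch_h
```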
For the training phase, we excluded symbols that overlapped multiple patches. After extensive experiments, it was decided to use the third version of the YOLO framework, which improves the detection rate of small objects compared to the first and second YOLO models [55, 27]. It is worth pointing out that the sizes of the various engineering symbols in our dataset are relatively small compared to the image size.
The YOLO architecture was customised for the purpose of this experiment. First, the number of classes in each of the three YOLO layers was set to 25, and the number of filters was changed accordingly and set to 3 × (Class_no + 5), where Class_no denotes the number of classes in the dataset.
The dataset was split approximately 90%:10% into training (155 P&IDs) and test (16 P&IDs) sets. A pre-trained network was used, retrained on our dataset, and all layers were fine-tuned. The Darknet implementation of YOLO was used in this experiment². During the training process, the network input size of 416 × 416 was adjusted after every ten batches; adjusting the input size during training has been reported to improve object detection across different object scales [27]. The network was trained with a learning rate of 0.001, and training was stopped after 10,000 batches (batch size of 64).
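As an illustration, the sketch below picks a new square input size every ten batches; the 320–608 range of multiples of 32 follows the multi-scale scheme described in [27] and is an assumption here, since this paper only states the initial 416 × 416 size.

```python
import random

def maybe_resize_input(batch_index, current_size, every=10,
                       sizes=range(320, 609, 32)):
    """Pick a new square network input size every `every` batches.

    YOLO downsamples by a factor of 32, so candidate sizes are multiples
    of 32 (the 320-608 range is assumed from [27]).
    """
    if batch_index % every == 0:
        return random.choice(list(sizes))
    return current_size
```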
At testing time, the model input was adjusted from 416 × 416 to 2400 × 2400. In this way, we were able to test on the original P&ID images and simplify symbol detection across a whole P&ID diagram in one step, as opposed to combining detections from the P&ID patches. For evaluation, symbols were compared against the ground truth, and the Intersection over Union (IoU) threshold was set experimentally to 0.5. A simple front end was developed using Python libraries and OpenCV³ for visualisation and manual error analysis purposes.
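For reference, a straightforward implementation of the IoU criterion (a generic sketch with boxes given as corner coordinates, not the authors' evaluation code) is:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as correct when iou(prediction, ground_truth) >= 0.5
```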
4.1.1. Results
The training accuracy achieved was ∼96%. On the testing set, 1352 symbols out of 1424 were correctly located and recognised, giving a testing accuracy of ∼94.9%. A heatmap of the confusion matrix for the testing set is presented in Figure 8. As expected, majority class instances were accurately detected and recognised; in other words, symbols with enough examples in the training set were accurately recognised.
Figure 8: Heatmap of the Confusion matrix of the 25 symbols predictions (Testing Set)
³ https://opencv.org/
A typical output from the proposed method, in which different symbols are highlighted in different colours, is shown in Figure 9. The recognised symbols were numbered and the predicted labels recorded for further comparison against the ground truth. These symbols include inlets/outlets denoted by label to and label from, sensors, ball valves, reducers, gate valves, globe valves, and others.
Table 1 provides more details about the number of instances of each symbol in the training and testing sets, denoted No of Training Symbols and No of Testing Symbols respectively. It also shows the number of correctly recognised symbols in the testing set (Correctly Recognised) and the testing accuracy per class (Class Accuracy).
Table 1: Results of the proposed methods for symbols recognition (Testing Set)
Symbol | No of Training Symbols | No of Testing Symbols | Correctly Recognised (Testing) | Class Accuracy
Sensor | 2810 | 302 | 297 | 98%
Ball Valve | 1629 | 213 | 212 | 99%
Label From | 1347 | 103 | 103 | 100%
Label To | 1178 | 113 | 113 | 100%
Flange | 1110 | 158 | 121 | 77%
reducer | 821 | 91 | 90 | 99%
DB&BBV | 542 | 67 | 66 | 98%
Gate Valve | 535 | 110 | 104 | 94%
check valve | 396 | 42 | 42 | 100%
TOB/Butterfly Valve | 178 | 59 | 58 | 98%
Plug Valve | 173 | 8 | 8 | 100%
Globe Valve | 161 | 7 | 7 | 100%
Needle Valve | 160 | 10 | 10 | 100%
RS | 143 | 26 | 24 | 92%
PSV | 118 | 25 | 22 | 88%
eccentric reducer | 98 | 23 | 22 | 96%
POB valve | 84 | 16 | 16 | 100%
DBBPV | 83 | 15 | 15 | 100%
PRV | 32 | 8 | 8 | 100%
control valve globe | 30 | 6 | 6 | 100%
control valve | 22 | 5 | 5 | 100%
vent to atm | 19 | 8 | 2 | 25%
injection/sample point | 13 | 2 | 1 | 50%
Angle Valve | 11 | 2 | 0 | 0.0%
BPRV | 11 | 5 | 0 | 0.0%
The results show that the majority of instances were accurately detected and recognised (1352 of 1424). Figure 10 shows different symbols from various P&ID diagrams. Notice that symbols are accurately detected and recognised regardless of their orientation. For example, reducers, gate valves, check valves, and others appear in different orientations (Figure 10). Similarly, sensors are accurately detected and recognised regardless of text overlapping these instances. This clearly shows that, unlike traditional methods, the proposed method is robust to these inherent vision challenges (at least in this context).
As can be seen in Table 2, in total 72 instances of the P&ID symbols were either not recognised at all (missed) or incorrectly classified as different symbols. Of these, 8 instances were incorrectly classified (Table 2) and 64 symbols were completely missed. This can largely be attributed to the nature of the drawings: in these cases, the symbols have text and annotations covering almost their entirety. This is evident from the IoU values in Table 2, which are zero across all these missed symbols.
Table 2: Unrecognised and misclassified symbols in Engineering Drawings
Further visual analysis of the results presented in Table 2 showed that some symbols were incorrectly labeled. In particular, for the instance of the symbol Ball Valve, although the model appeared to predict the 'wrong' class, visualising the results showed that the model actually predicted the right class and the ground-truth label itself was wrong. This is illustrated in Figure 11. Here, we use a simple front end to visualise the recognised symbols alongside an item number that we assign to each of them. This has greatly facilitated the analysis and visualisation of the results.
Figure 11: Incorrectly labeled symbol
Similarly, consider the symbol of class BPRV, which was classified as PRV in all five instances in the testing set. First, it is worth noting that the number of training instances of this symbol is extremely low (11). Additionally, the symbol is very similar to the PRV class. It is anticipated that more training examples of this symbol would improve its detection rate, as is the case with most majority class symbols (i.e. Sensor, Ball Valve, Reducer, Gate Valve, Check Valve, Globe Valve and so on). Figure 12 shows samples of the BPRV symbols which were incorrectly classified, alongside the actual PRV symbol.
Figure 12: Incorrectly classified BPRV symbols (first four instances from left) as PRV symbol
(fifth instance)
In summary, it can also be seen from the results presented in Table 1 that the recognition rate of the symbols vent to atm, Angle Valve and BPRV was quite low. This is mainly due to the limited number of training samples representing these symbols. Overall, however, and excluding these three symbols, the average class accuracy of the remaining 22 symbols in the dataset is over 92%, which is a very encouraging result for such a challenging problem.
During training, the minority classes were resampled to encourage learning of the structure of minority instances. The trained MFC-GAN model was then used to generate symbols of the minority classes (the nine least represented symbols in the dataset). The original dataset was split into 70% training and 30% testing sets. Synthetic samples were then added to the training set: for each minority class, 5000 synthetic samples were added. This enabled us to rebalance the dataset by increasing the presence of the least represented symbols.
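A rough sketch of this rebalancing step is given below; generate_symbols is a hypothetical helper standing in for sampling from the trained, class-conditioned MFC-GAN generator, and is not part of the authors' code.

```python
import numpy as np

SAMPLES_PER_CLASS = 5000  # synthetic samples added per minority class

def rebalance(train_images, train_labels, minority_classes, generate_symbols):
    """Append synthetic minority-class symbols to the training set.

    generate_symbols(class_id, n) is assumed to return n generated
    64 x 64 greyscale images for the given class.
    """
    images, labels = [train_images], [train_labels]
    for class_id in minority_classes:
        synthetic = generate_symbols(class_id, SAMPLES_PER_CLASS)
        images.append(synthetic)
        labels.append(np.full(SAMPLES_PER_CLASS, class_id))
    return np.concatenate(images), np.concatenate(labels)
```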
In order to evaluate the quality of the generated symbols, we built a classification model to compare performance before and after adding the generated symbols to the training set.
The classification model chosen is a CNN with four layers. The first three layers are convolutional layers with 32, 64 and 128 filters respectively; each has a kernel size of 3 × 3, with 2 × 2 max-pooling in between. The fourth layer is a fully connected layer with 256 units that feeds into a 29-way softmax output representing the 29 symbol classes. The CNN was trained using SGD with a batch size of 64 and a learning rate of 0.001. Classification results were recorded using common metrics, namely true positive rate, balanced accuracy, G-mean and F1-score.
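A sketch of such a classifier in PyTorch is shown below; it is an illustration only, with the 64 × 64 greyscale input size inferred from the generator output described earlier and the padding choice assumed, since these details are not stated.

```python
import torch.nn as nn

class SymbolClassifier(nn.Module):
    """CNN with three convolutional layers (32, 64, 128 filters, 3x3 kernels),
    2x2 max-pooling in between, a 256-unit fully connected layer and a
    29-way softmax output, as described in the text."""

    def __init__(self, num_classes=29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # 64 x 64 input -> 8 x 8 feature maps after three 2x2 poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_classes),  # softmax applied inside the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training would use SGD (batch size 64, learning rate 0.001) with
# nn.CrossEntropyLoss, which applies the softmax internally.
```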
4.2.1. Results
Figure 13 compares the samples generated by the MFC-GAN model with the original symbols from the diagrams. We also report the symbol classification results in Table 3.
MFC-GAN generated far superior and more realistic samples. Visual inspection revealed distinct symbol features, and the required categories were generated in each instance. Moreover, the high-quality MFC-GAN samples had a positive effect on the performance of the classifier. For example, the G-mean and sensitivity improved from 0 to 100% on angle choke valve, as can be seen in Table 3, with just two instances of the class. This result is consistent in seven of the nine minority classes. However, we observed that the model did not improve on the baseline for the other two classes, control valve and PRV.

Figure 13: Comparing original P&ID samples with MFC-GAN generated samples.
A closer look at Figure 13 reveals a high similarity between symbols: there is extreme similarity between angle valve (fifth symbol from the top) and control valve globe (eighth symbol from the top), and between PRV (seventh symbol from the top) and BPRV (second symbol from the top). Although the symbols were distinctly generated, this similarity between symbols diminished the classification results in these classes. The low precision in the BPRV and control valve globe classes in Table 3 further supports this observation.
Table 3: Classification results for the nine minority classes before (Baseline) and after (MFC-GAN) augmentation

Metric | Model | angle choke valve | Angle Valve | BPRV | control valve | control valve globe | injection/sample point | PRV | PS Gate Valve | TCG valve
Sensitivity | Baseline | 0.00 | 0.50 | 0.60 | 0.88 | 1.00 | 0.80 | 1.00 | 1.00 | 0.89
Sensitivity | MFC-GAN | 1.00 | 1.00 | 0.80 | 0.88 | 1.00 | 0.88 | 0.77 | 1.00 | 0.91
Specificity | Baseline | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Specificity | MFC-GAN | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
Precision | Baseline | 0.00 | 1.00 | 1.00 | 1.00 | 0.85 | 1.00 | 0.72 | 1.00 | 1.00
Precision | MFC-GAN | 1.00 | 1.00 | 0.67 | 1.00 | 0.92 | 1.00 | 0.91 | 0.83 | 1.00
F1-score | Baseline | 0.00 | 0.67 | 0.75 | 0.93 | 0.92 | 0.89 | 0.84 | 1.00 | 0.94
F1-score | MFC-GAN | 1.00 | 1.00 | 0.73 | 0.93 | 0.96 | 0.93 | 0.83 | 0.91 | 0.95
Accuracy | Baseline | 0.50 | 0.75 | 0.80 | 0.94 | 1.00 | 0.90 | 1.00 | 1.00 | 0.95
Accuracy | MFC-GAN | 1.00 | 1.00 | 0.90 | 0.94 | 1.00 | 0.94 | 0.89 | 1.00 | 0.96
G-Mean | Baseline | 0.00 | 0.71 | 0.77 | 0.94 | 1.00 | 0.89 | 1.00 | 1.00 | 0.94
G-Mean | MFC-GAN | 1.00 | 1.00 | 0.89 | 0.94 | 1.00 | 0.93 | 0.88 | 1.00 | 0.95
quality of these samples was evaluated subjectively, by inspecting the resulting samples, and objectively, by measuring classifier performance before and after adding the generated samples to the training sets. The results clearly show that performance improved across several common evaluation metrics. However, it has to be said that MFC-GAN is only one method that can be used to handle the class imbalance problem; other possible methods can also be explored and utilised. Class imbalance is a very well researched problem, and there is a wide range of methods, from simple data augmentation and sampling to more advanced methods such as GANs [24]. For an extensive review of the different possible methods, the reader is referred to [57].
methods to allow full processing and analysis of engineering diagrams such as P&IDs. We hypothesise that the work presented in this paper will greatly simplify subsequent tasks such as text localisation and line detection.
Acknowledgement
This work was supported by the Scotland Data Lab Innovation Centre, the Oil and Gas Innovation Centre and DNV GL.
References
USA, 2018, pp. 67–71. doi:10.1145/3206157.3206174. URL http://doi.acm.org/10.1145/3206157.3206174
[9] R. Girshick, Fast R-CNN, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, IEEE Computer Society, Washington, DC, USA, 2015, pp. 1440–1448. doi:10.1109/ICCV.2015.169. URL http://dx.doi.org/10.1109/ICCV.2015.169
[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324. doi:10.1109/5.726791.
[15] U. Park, A. K. Jain, Face matching and retrieval using soft biometrics, IEEE Transactions on Information Forensics and Security 5 (3) (2010) 406–415. doi:10.1109/TIFS.2010.2049842.
[17] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, MIT Press, Cambridge, MA, USA, 2015, pp. 91–99. URL http://dl.acm.org/citation.cfm?id=2969239.2969250
[19] J. Dai, Y. Li, K. He, J. Sun, R-FCN: Object detection via region-based fully convolutional networks, CoRR abs/1605.06409. arXiv:1605.06409. URL http://arxiv.org/abs/1605.06409
[20] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Uni-
fied, real-time object detection, in: Proceedings of the IEEE conference on
computer vision and pattern recognition, 2016, pp. 779–788.
[26] C. F. Moreno-García, E. Elyan, C. Jayne, New trends on digitisation of complex engineering drawings, Neural Computing and Applications. doi:10.1007/s00521-018-3583-1. URL https://doi.org/10.1007/s00521-018-3583-1
[27] J. Redmon, A. Farhadi, Yolo9000: Better, faster, stronger, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2017,
pp. 6517–6525. doi:10.1109/CVPR.2017.690.
[30] D. Zhang, G. Lu, Review of shape representation and description techniques, Pattern Recognition 37 (1) (2004) 1–19. doi:10.1016/j.patcog.2003.07.008.
[34] C. R. Kulkarni, A. B. Barbadekar, Text Detection and Recognition: A Review, International Research Journal of Engineering and Technology (IRJET) 4 (6) (2017) 179–185.
[41] E. Elyan, M. M. Gaber, A genetic algorithm approach to optimising random forests applied to class engineered data, Information Sciences 384 (2017) 220–234. doi:10.1016/j.ins.2016.08.007. URL http://www.sciencedirect.com/science/article/pii/S0020025516305783
[43] W. Khan, D. Ansell, K. Kuru, M. Bilal, Flight guardian: Autonomous flight safety improvement by monitoring aircraft cockpit instruments, Journal of Aerospace Information Systems 15 (4) (2018) 203–214. doi:10.2514/1.I010570. URL https://doi.org/10.2514/1.I010570
[44] A. Pacha, J. Hajič, J. Calvo-Zaragoza, A baseline for general music object detection with deep learning, Applied Sciences 8 (9). doi:10.3390/app8091488. URL http://www.mdpi.com/2076-3417/8/9/1488
1901.11383. URL http://arxiv.org/abs/1901.11383
[49] A. Ali-Gombe, E. Elyan, C. Jayne, Multiple fake classes GAN for data augmentation in face image dataset, in: 2019 International Joint Conference on Neural Networks (IJCNN), 2019, pp. 1–8. doi:10.1109/IJCNN.2019.8851953.
[53] A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, in: International Conference on Machine Learning, 2017, pp. 2642–2651.
[55] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788. doi:10.1109/CVPR.2016.91.
2015 IEEE 20th Conference on Emerging Technologies Factory Automation
(ETFA), 2015, pp. 1–8. doi:10.1109/ETFA.2015.7301510.