1 Introduction
Transparent plastic bottles, which are widely used in the food and beverage industry, are mainly produced by the ISBM (Injection Stretch Blow Molding) process. In this process, defects can arise from inadequate heat treatment of the preform, dust entering the resin, and pressure control errors [1,2]. Bottle defects are currently inspected during production as follows. At the final stage of production, workers manually inspect the surface for defects, and fine dimensional deviations generated during molding are detected using laser sensors and similar instruments [3,4].
As the efficiency and reliability of vision-based inspection equipment and algorithms have improved in recent years, research on vision algorithms for diagnosing defects in plastic bottles has become active. These vision-based methods can be broadly classified into two types: those based on image processing techniques and those based on deep learning.
First, defect diagnosis algorithms based on image processing techniques have the advantage of enabling more precise diagnosis of products with a single shape than deep learning-based algorithms. Saad et al. [5] proposed a machine vision system and algorithm that removes the background and measures shape defects by repeatedly applying erosion and dilation to images of plastic bottles filled with beverages. Karmi et al. [6] developed a system and an image processing technique that extract the regions of interest containing the cap, dents, and label from images of a single plastic bottle photographed from three angles, and then diagnose defects in each region.
On the other hand, deep learning-based defect diagnosis algorithms can cope with varying external light sources and different products while maintaining consistent diagnostic performance. Horputra et al. [7] developed an algorithm that crops the bottle-lid region from the image with YOLOv3 and diagnoses lid defects with Inception-ResNet-v2. Komoto et al. [8] diagnosed defects artificially simulated on normal beverage bottles using a DAE-GAN algorithm, which combines a denoising autoencoder with generative adversarial networks. However, achieving sufficient performance with deep learning-based defect diagnosis algorithms requires substantial computing resources for the diagnostic processor and a large amount of image data for training and testing.
In this study, we developed a multi-defect diagnosis algorithm for plastic bottles that combines a CNN (Convolutional Neural Network) with simple image processing techniques. Chapter 2 describes how defects were simulated on plastic bottles and how a testbed was constructed to collect their images. Chapter 3 presents the image processing steps that highlight the defective areas in the images and facilitate training of the CNN. Chapter 4 describes the design of the CNN-based defect diagnosis model, the training process, and the performance evaluation results. The effectiveness of the algorithm is verified by comparing the accuracy of the model with and without the proposed image processing steps.
2 Defective Plastic Bottle Image Dataset
Plastic bottles can exhibit various types of defects during the manufacturing process. Among these, this study focused on three types, selected on the basis of Baldowska's research [9] on the types and frequencies of defects occurring in actual production processes: black spot, pearlescence, and shape abnormality. Pearlescence and shape abnormality were chosen because of their frequency of occurrence across the overall production process, whereas black spot was chosen because it occurs frequently when temperature control errors arise during production.
Since bottles with defects are difficult to obtain commercially, bottles with simulated defects were created, as shown in Fig. 1, based on actual photographs of each defect. To simulate shape abnormality, the bottle surface was deformed by briefly applying heat. To simulate pearlescence, candle wax was dripped onto specific locations and then scraped off, leaving opaque areas. To simulate black spots, dots with a diameter of 2 mm or less were drawn at random locations.
A simple testbed was built to capture images of bottles with each type of defect, and the quality of the captured images was assessed qualitatively. Based on a survey of lighting methods for imaging transparent objects [10], we built three types of testbeds for capturing images of transparent plastic bottles. Of these, the dark-room testbed completely eliminated the influence of external lighting conditions, but black spots were hard to identify against the black background, and light was reflected from the surface of the bottle.
To solve these problems, we applied methods that either reflect or transmit light. When white paper was used to reflect light onto the bottle, the unwanted reflection on the bottle surface did not occur. However, shadows caused by the aperture appeared in the image depending on the angle of the light source, and reproducibility was hard to ensure because external lighting conditions were difficult to control.
To make the images reproducible, a cuboid structure was made of paper, light was transmitted through it, and the bottle was photographed inside the structure. This eliminated light reflection on the surface of the bottle and minimized the effect of external lighting conditions, yielding images with reproducible brightness.
The transmission-based testbed described above gave the best image quality. A plastic bottle with simulated defects was placed inside the testbed, and 8-bit images were captured with a vision camera (Logitech C930e) at 1080p resolution, as shown in Fig. 2, to create the dataset. Each defect is located on either the upper or the lower part of the bottle. Specifically, one normal bottle and two bottles for each defect type were used, for a total of seven plastic bottles. Each bottle was rotated while 40 images were captured, giving 280 images in total. All captured images were then flipped horizontally, doubling the dataset to 560 images.
3 Image Processing for Defect Diagnosis
Generally, a convolutional neural network (CNN) classification model extracts feature maps from the image by repeatedly applying convolution and pooling operations at each layer. Max-pooling is a commonly used pooling operation that outputs the maximum value within each kernel window of the feature map [11]. Applying max-pooling therefore keeps only the maximum value in each window and discards all other values from the next feature map. Consequently, if a single-channel image is input to an untrained convolutional neural network, the feature maps that pass through the last layer will contain information only about the bright areas of the preceding feature maps.
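As a minimal illustration of the max-pooling operation, the NumPy sketch below applies a 2×2 window with stride 2 to a small single-channel feature map. The function name `max_pool_2x2` and the sample values are hypothetical. Each output value is the maximum of one window, and the other three values in that window do not reach the next feature map.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max-pooling with stride 2 on a single-channel map (even height and width assumed)."""
    h, w = x.shape
    # Split the map into non-overlapping 2x2 windows and keep each window's maximum.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 feature map.
fmap = np.array([[1, 5, 2, 0],
                 [3, 4, 8, 6],
                 [0, 1, 7, 2],
                 [9, 2, 3, 4]])
pooled = max_pool_2x2(fmap)
print(pooled)
```

Here `pooled` is `[[5, 8], [9, 7]]`: twelve of the sixteen input values are dropped.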
Meanwhile, because plastic bottle images have a white background, pixel values at defect locations are lower than those of the surrounding areas. Owing to this property of max-pooling, as shown in Fig. 3(a), if an image of a plastic bottle with a white background is input to an untrained CNN, information about the defects (the pixel values of the defective areas) is lost in the feature maps. Therefore, inverting the colors of the image, as shown in Fig. 3(b), is a more effective way to prevent the loss of defect information when training CNN-based classification models than inputting the image without any processing.
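The effect of color inversion on max-pooling can be shown with a toy example. The sketch below assumes an 8-bit patch with a white background (255) and a single dark defect pixel (40), and, as a simplification, applies 2×2 max-pooling directly to pixel values without any convolution; all values are hypothetical.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max-pooling with stride 2 (even height and width assumed)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 4x4 8-bit patch: white background with one dark defect pixel.
patch = np.full((4, 4), 255, dtype=np.uint8)
patch[1, 2] = 40

# Without inversion the background maximum (255) wins every window, so the defect vanishes.
pooled_raw = max_pool_2x2(patch)

# After inversion the background becomes 0 and the defect becomes 215, so pooling keeps it.
inverted = 255 - patch
pooled_inv = max_pool_2x2(inverted)
print(pooled_raw)
print(pooled_inv)
```

`pooled_raw` is 255 everywhere, while `pooled_inv` is `[[0, 215], [0, 0]]`, so the defect survives pooling only in the inverted image.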
The defect diagnosis algorithm passes each image through a sequence of processing steps, including color inversion, as shown in Fig. 4, before it is input to the CNN-based diagnostic model. Resizing the images and converting them to grayscale reduced the computing resources required for diagnosis, and cropping the top and bottom background regions of every image in the dataset prevented unnecessary image information from being input.
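The text does not specify the order of the steps in Fig. 4, the crop margins, or the target resolution, so the sketch below shows only one plausible arrangement: crop, grayscale, resize, invert. The helper name `preprocess`, the 10% crop fractions, the 128×64 output size, the ITU-R BT.601 luma weights, and nearest-neighbour resizing are all assumptions chosen for illustration.

```python
import numpy as np

def preprocess(rgb: np.ndarray, crop_top: float = 0.1, crop_bottom: float = 0.1,
               out_hw: tuple = (128, 64)) -> np.ndarray:
    """Hypothetical pipeline: crop, grayscale, resize, invert.

    crop_top, crop_bottom and out_hw are illustrative values, not the paper's settings.
    """
    h = rgb.shape[0]
    # 1) Crop away background bands at the top and bottom of the frame.
    cropped = rgb[int(h * crop_top): h - int(h * crop_bottom)]
    # 2) Convert to grayscale with BT.601 luma weights (an assumption).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = cropped[..., :3].astype(np.float32) @ weights
    # 3) Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.linspace(0, gray.shape[0] - 1, out_hw[0]).round().astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, out_hw[1]).round().astype(int)
    resized = gray[np.ix_(rows, cols)]
    # 4) Invert so that dark defects on the white background become bright.
    return np.rint(255.0 - resized).clip(0, 255).astype(np.uint8)

# Synthetic 1080p frame: white background with a dark rectangular "defect".
frame = np.full((1080, 1920, 3), 255, dtype=np.uint8)
frame[400:700, 800:1100] = 0
out = preprocess(frame)
print(out.shape, out.min(), out.max())
```

After processing, the white background maps to 0 and the dark region maps to 255 in a 128×64 single-channel image; in practice a library resize with interpolation would likely replace the index sampling used here.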
4 CNN-based Multi-Defect Diagnosis Model
The processing sequence described above was applied to all image data, and the dataset for model training and testing was constructed as shown in Table 1. The dataset contains 560 images in total: 80 images of normal bottles and 480 images of bottles with defects.
The dataset was split into training and test sets with the same ratio (0.2) applied to each class. For example, 32 of the 160 black spot images were randomly selected for the test set. As a result, the training set contained 448 images and the test set 112 images. In addition, to match the output of the classification model, the image labels for training and testing were one-hot encoded.
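The per-class split and one-hot labelling can be sketched in NumPy as follows. The class identifiers, the helper `stratified_split`, and the random seed are hypothetical; only the class counts (80 normal, 160 per defect type) and the 0.2 test ratio come from the text. scikit-learn's `train_test_split(..., stratify=labels)` offers comparable behaviour, but the text does not say which tool was used.

```python
import numpy as np

CLASSES = ["normal", "black_spot", "pearlescence", "shape_abnormality"]  # hypothetical identifiers
COUNTS = {"normal": 80, "black_spot": 160, "pearlescence": 160, "shape_abnormality": 160}

def stratified_split(labels: np.ndarray, test_ratio: float = 0.2, seed: int = 0):
    """Randomly assign test_ratio of each class to the test set; return (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_test = int(round(len(idx) * test_ratio))
        test_mask[rng.choice(idx, size=n_test, replace=False)] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Integer class label for each of the 560 images.
labels = np.concatenate([np.full(COUNTS[c], i) for i, c in enumerate(CLASSES)])
train_idx, test_idx = stratified_split(labels)

# One-hot encoding: row k of the identity matrix represents class k.
one_hot = np.eye(len(CLASSES))[labels]
print(len(train_idx), len(test_idx), np.bincount(labels[test_idx]))
```

With these counts the split yields 448 training and 112 test images, with 16 normal and 32 images of each defect type in the test set.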
The architecture of the CNN-based multi-defect diagnosis model is presented in Fig. 5. The classification model consists of three convolution and max-pooling layers followed by one dense layer. ReLU (Rectified Linear Unit) was used as the activation function for all convolution and dense layers to extract feature maps quickly. The output layer uses the softmax activation function, which calculates the probability of the input belonging to each class.
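For reference, the two activation functions named above can be written in a few lines of NumPy. This is a sketch of the standard definitions, not the framework implementation the authors used, and the logits are made-up values for a four-class output.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU passes positive values unchanged and sets negative values to zero.
    return np.maximum(x, 0.0)

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the row-wise maximum avoids overflow and does not change the result.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical output-layer logits for one image and four classes.
logits = np.array([[2.0, 0.5, -1.0, 0.0]])
probs = softmax(logits)
print(relu(np.array([-1.0, 2.0])), probs.round(3))
```

The softmax outputs are non-negative and sum to 1, so the largest one can be read as the predicted class.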
The number of filters was doubled from one convolution layer to the next, and the stride of the max-pooling layers was set to 2. As a result, with each successive layer the resolution of the feature maps is halved while their number doubles, so that more features are extracted at progressively lower resolution.
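The resolution and channel bookkeeping described above can be traced with a short helper. The 128×64 input size and the 16 initial filters are hypothetical, and the sketch assumes 'same' padding in the convolutions so that only the stride-2 pooling layers change the spatial size.

```python
def feature_map_shapes(h: int, w: int, base_filters: int = 16, n_blocks: int = 3):
    """Return (height, width, channels) after each conv + 2x2/stride-2 max-pooling block.

    Assumes 'same' padding, so each convolution preserves the spatial size.
    """
    shapes = []
    filters = base_filters
    for _ in range(n_blocks):
        h, w = h // 2, w // 2           # stride-2 max-pooling halves each spatial dimension
        shapes.append((h, w, filters))
        filters *= 2                    # the next convolution doubles the number of filters
    return shapes

shapes = feature_map_shapes(128, 64)
print(shapes)
```

For these assumed values the three blocks produce `[(64, 32, 16), (32, 16, 32), (16, 8, 64)]`.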
After designing the architecture, we searched for the best model with this structure by setting three hyperparameters to the values shown in Table 2 and comparing the training and validation accuracies of all configurations. To prevent overfitting, early stopping [12] was applied, halting training if the validation accuracy did not improve for a certain number of epochs; the validation data consisted of 20% of the training data, selected at random. Cross-entropy was used as the cost function for the output layer, and ADAM was used as the optimization algorithm [13].
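The cross-entropy cost and the early-stopping rule can be sketched as follows. The paper does not give the number of epochs to wait, so `patience` is a placeholder, and `early_stopping_epoch` is a hypothetical helper that replays a recorded validation-accuracy curve. In Keras, `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=...)` implements a comparable rule, although the text does not state which framework was used.

```python
import numpy as np

def categorical_cross_entropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-12) -> float:
    # Mean over samples of -sum_k y_k * log(p_k); clipping avoids log(0).
    return float(-np.mean(np.sum(y_true * np.log(np.clip(y_prob, eps, 1.0)), axis=1)))

def early_stopping_epoch(val_accuracies, patience: int = 5) -> int:
    """Return the 0-based epoch at which training stops.

    Training stops once validation accuracy has not improved for `patience` consecutive epochs.
    """
    best, wait = -np.inf, 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_accuracies) - 1

# Hypothetical validation-accuracy curve.
stop = early_stopping_epoch([0.50, 0.60, 0.70, 0.70, 0.69, 0.70, 0.72], patience=3)
loss = categorical_cross_entropy(np.array([[0, 1]]), np.array([[0.5, 0.5]]))
print(stop, round(loss, 4))
```

Here training stops at epoch 5, because epochs 3 to 5 do not beat the best accuracy of 0.70; the later improvement at epoch 6 is never reached.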
Under the conditions described above, 27 models were trained. The model with a kernel size of (3,3), 18 neurons, and a learning rate of 0.005 achieved the highest validation accuracy and was selected as the final model. The types and ranges of the hyperparameters were determined empirically, considering the model's convergence and the time required for training, so the selected model is unlikely to be globally optimal for the given dataset; hyperparameters outside the explored range may yield better performance. Note also that the results of this grid-search model selection may vary with the composition of the randomly split training, validation, and test datasets.
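The exhaustive search over three hyperparameters can be sketched with `itertools.product`. Since 27 = 3 × 3 × 3, the sketch assumes three candidate values per hyperparameter; only (3,3), 18, and 0.005 come from the text, and the other values in `GRID` are placeholders for Table 2. `train_and_validate` stands in for a hypothetical function that trains one model and returns its validation accuracy.

```python
from itertools import product

# Placeholder grid: only (3, 3), 18 and 0.005 are taken from the text.
GRID = {
    "kernel_size": [(3, 3), (5, 5), (7, 7)],
    "dense_units": [18, 36, 72],
    "learning_rate": [0.001, 0.005, 0.01],
}

def grid_search(train_and_validate, grid):
    """Evaluate every combination; return (best_config, best_val_acc, n_evaluated)."""
    keys = list(grid)
    best_cfg, best_acc, n = None, float("-inf"), 0
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        acc = train_and_validate(**cfg)   # hypothetical: trains one model, returns val accuracy
        n += 1
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc, n

# Stand-in scorer so the sketch runs without a deep learning framework.
def fake_train_and_validate(kernel_size, dense_units, learning_rate):
    return 1.0 if (kernel_size, dense_units, learning_rate) == ((3, 3), 18, 0.005) else 0.5

best_cfg, best_acc, n_models = grid_search(fake_train_and_validate, GRID)
print(n_models, best_cfg)
```

The loop evaluates all 27 combinations and keeps the one with the highest validation accuracy, which mirrors the selection rule described above.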
On the test dataset, the model achieved a diagnostic accuracy of 91.96% across the four classes, misclassifying only 9 of the 112 images. Fig. 6 shows the confusion matrix of the test results. All 16 normal images were correctly classified. Of the 32 black spot images, 2 were misclassified as pearlescence and 1 as shape abnormality. Of the 32 pearlescence images, 1 was misclassified as normal. Of the 32 shape abnormality images, 4 were misclassified as pearlescence and 1 as black spot.
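The reported counts can be assembled into a confusion matrix to check the overall accuracy. The matrix below is reconstructed from the text (rows are true classes and columns are predictions, in the order normal, black spot, pearlescence, shape abnormality) and may be laid out differently from Fig. 6.

```python
import numpy as np

# Rows: true class; columns: predicted class. Reconstructed from the counts in the text.
cm = np.array([
    [16,  0,  0,  0],   # normal
    [ 0, 29,  2,  1],   # black spot
    [ 1,  0, 31,  0],   # pearlescence
    [ 0,  1,  4, 27],   # shape abnormality
])

accuracy = np.trace(cm) / cm.sum()         # correctly classified / all test images
recall = np.diag(cm) / cm.sum(axis=1)      # per-class fraction correctly recognised
print(f"{accuracy:.2%}", recall)
```

The diagonal sums to 103 of 112 images, which reproduces the reported 91.96%; per-class recall is 1.0, 0.906, 0.969, and 0.844, consistent with black spot and shape abnormality being the weakest classes.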
Examination of the 9 misclassified images showed that, in most cases, the defects were difficult to identify even with the naked eye because of the angle at which the image was taken, as shown in Fig. 7. These results suggest that complete defect diagnosis is difficult with images taken from only one direction. Overall, the diagnosis model showed high accuracy, but accuracy for images with black spot and shape abnormality needs further improvement.
To evaluate the contribution of the color inversion preprocessing presented in Chapter 3 to the overall defect diagnosis algorithm, we trained and tested the model on the image dataset without color inversion, using the same model structure shown in Fig. 5. The hyperparameter settings and the training, validation, and test splits were kept identical. The resulting diagnostic accuracy was 66.07%, significantly lower than that obtained with color inversion. This suggests that color inversion preprocessing improves the diagnostic accuracy of the trained defect diagnosis model.
5 Conclusion
In this study, a multi-defect diagnosis algorithm for transparent plastic bottles was developed by combining simple image processing with a CNN and training it on a plastic bottle image dataset. First, a testbed using transmitted light was constructed for imaging plastic bottles. Three types of defects (black spot, pearlescence, and shape abnormality) were selected, and an image dataset was constructed by simulating these defects on normal products and photographing them.
In particular, color inversion was proposed as an image processing technique motivated by the behavior of the max-pooling operation in the CNN. After establishing the processing pipeline for the entire training and test dataset, the structure of the CNN-based multi-defect diagnosis model was designed, and the final model was selected through grid search. The final model achieved a test accuracy of 91.96% on the 4-class classification. To confirm the effectiveness of color inversion, a model trained under the same conditions without it was tested and showed significantly lower accuracy (66.07%). This confirmed that color inversion of the images enables more effective training of the CNN-based diagnosis model.
However, the algorithm presented in this study has difficulty diagnosing defects that are not visible in the image. Moreover, because images were captured and defects were diagnosed for only a single plastic bottle shape, the algorithm is limited when the bottle shape varies.
Additionally, the diagnostic accuracy of the final model is only slightly above 90%, which may be insufficient for defect inspection in actual industrial settings. However, this research focused on methods for improving the training efficiency of CNN-based deep learning inspection models, and the proposed approaches can readily be applied when training inspection models based on more advanced architectures in the future.
In future research, to enhance the applicability of the model presented in this study, the dataset should include a greater variety of bottles with different shapes and sizes, and the number of shooting angles per bottle should be increased at the same time. Alternatively, the dataset could be restructured to specify the locations of defects, allowing algorithms such as instance segmentation or object detection to be applied.