Gen 1: Background Modelling (before the age of AI)
Our first video analytics module for detecting unattended items uses two background texture models of the input video frames, updated at different speeds. The slow model holds the average frame before an item appears; the fast model holds the average frame after the item appears, while filtering out moving objects such as people and vehicles. Unattended items are located by analyzing the difference between the two models.
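The dual-model idea can be sketched as two exponential running averages with different learning rates. The class below is a minimal illustration, not the production code; the parameter names (`slow_alpha`, `fast_alpha`, `threshold`) and their values are assumptions chosen for clarity.

```python
import numpy as np

class DualBackgroundModel:
    """Toy dual-rate background model: the fast average absorbs a newly
    placed item within seconds, while the slow average lags behind, so
    their difference highlights static changes (candidate unattended items).
    Learning rates and threshold are illustrative assumptions."""

    def __init__(self, shape, slow_alpha=0.001, fast_alpha=0.05):
        self.slow = np.zeros(shape, dtype=np.float64)
        self.fast = np.zeros(shape, dtype=np.float64)
        self.slow_alpha = slow_alpha
        self.fast_alpha = fast_alpha
        self._initialised = False

    def update(self, frame):
        frame = frame.astype(np.float64)
        if not self._initialised:
            # Seed both models with the first observed frame.
            self.slow[:] = frame
            self.fast[:] = frame
            self._initialised = True
            return
        # Exponential running averages at two different speeds.
        self.slow += self.slow_alpha * (frame - self.slow)
        self.fast += self.fast_alpha * (frame - self.fast)

    def static_change_mask(self, threshold=25.0):
        """Pixels where the fast model has adopted a new appearance that
        the slow model has not yet absorbed: candidate static items."""
        return np.abs(self.fast - self.slow) > threshold
```

A real implementation would additionally mask out pixels with ongoing motion (people, vehicles) before updating the fast model, as described above.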
The Gen 1 detector was limited to sterile zones without frequent lighting changes; it generated too many false alerts in crowded indoor scenes or in any outdoor environment.
Gen 2: Background Modelling with an AI-based Patch Classifier (2018-2019)
Gen 2 introduces a binary classifier based on a convolutional neural network (CNN) to analyze the image patches found by the Gen 1 detector. We trained the CNN on tens of thousands of "target" and "noise" images. As a result, the number of false alerts dropped by more than a factor of ten. However, both the Gen 1 and Gen 2 detectors suffer from a variable response time: because of the background model, high-contrast items generate alerts much faster than low-contrast items, so it is not possible to guarantee a fixed response period after an item appears. In addition, the Gen 1 and Gen 2 background models were not stable in highly crowded scenes, where more people than floor surface are visible.
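To make the patch-classification step concrete, here is a toy forward pass of a binary "target vs noise" classifier in plain NumPy. The architecture (one convolution, ReLU, global average pooling, a linear head, sigmoid) and all names are illustrative assumptions, not the production network, which would be trained with a deep-learning framework.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2-D correlation of a single-channel image with a bank of kernels."""
    kh, kw = kernels.shape[1:]
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], h, w))
    for k, ker in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * ker)
    return out

def classify_patch(patch, kernels, weights, bias):
    """Toy CNN forward pass: returns the probability that a candidate
    patch is a real item ('target') rather than background noise."""
    feat = np.maximum(conv2d(patch, kernels), 0.0)  # convolution + ReLU
    pooled = feat.mean(axis=(1, 2))                 # global average pooling
    logit = float(pooled @ weights + bias)          # linear head
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid -> probability
```

In the pipeline described above, only patches scoring above a tuned probability threshold would be passed on as alerts; the rest are discarded as noise.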
Gen 3: Native AI-based Unattended Item Detector (2020 - present)
Our Gen 3 detector uses no background models. The entire video frame is processed by a CNN that detects known item types, such as bags and boxes, with no people nearby. The training dataset contains hundreds of thousands of images.
The detector may generate occasional false alarms on floor signs and rubbish bins, but repeated detections of the same object are suppressed.
Unlike Gen 1 and Gen 2, the new detector can operate in very busy environments and can be tuned to different response intervals (e.g. 30 seconds). The Gen 3 detector currently runs in production across more than 10 000 cameras in some of the world's largest metros.
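The configurable response interval and the suppression of repeated detections can be sketched as a small per-item timing policy. The class and names below are assumptions for illustration; in particular, `item_id` stands in for whatever tracking identity the real system assigns to a detected item.

```python
import time

class UnattendedItemAlerter:
    """Illustrative alert policy: an alert fires only after an item has
    persisted for `response_interval` seconds, and each item alerts at
    most once, so a static false positive (floor sign, rubbish bin)
    cannot generate a stream of repeated alarms."""

    def __init__(self, response_interval=30.0):
        self.response_interval = response_interval
        self.first_seen = {}   # item_id -> timestamp of first detection
        self.alerted = set()   # item_ids that have already raised an alert

    def observe(self, item_id, now=None):
        """Record a detection; return True exactly when an alert should fire."""
        now = time.monotonic() if now is None else now
        start = self.first_seen.setdefault(item_id, now)
        if item_id not in self.alerted and now - start >= self.response_interval:
            self.alerted.add(item_id)
            return True
        return False

    def forget(self, item_id):
        """Item left the scene or was collected: reset its state."""
        self.first_seen.pop(item_id, None)
        self.alerted.discard(item_id)
```

Because the CNN runs on every frame regardless of item contrast, the delay between an item appearing and the alert firing is governed by this interval rather than by background-model convergence, which is what makes the guaranteed response time possible.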