Object Detection in Live Video: Using The ODROID-XU4 With GStreamer

Deep learning has become an important topic in recent years, and many companies have invested in deep learning neural networks, both on the software and on the hardware side. Object detection has become one of the most popular applications of deep learning - for example, the photos taken with our phones are now automatically classified into categories using deep learning object detection.

In this article, we investigate a new use for the ODROID-XU4: creating a smart security camera that is able to detect objects of interest in the camera feed on-the-fly and act accordingly. We will be using the dnn module of OpenCV to load a pre-trained object detection network based on the MobileNets Single Shot Detector (SSD). The article was inspired by an excellent introductory series on object detection by Adrian Rosebrock on his blog, PyImageSearch. In Adrian’s tests, a low-power SBC such as the Raspberry Pi was not even able to achieve 1fps when doing real-time detection. Accordingly, I will not cover the basics of object detection and OpenCV, which you can read about in his posts, but will instead focus on optimizations for the ODROID-XU4 SBC in order to achieve the highest real-time detection framerate for a live stream.
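
For readers who have not followed Adrian’s series, the detection step itself boils down to a handful of OpenCV dnn calls. The sketch below shows the general shape of that step; the model file names and the detect_objects() helper are illustrative (they mirror the MobileNet-SSD files used in the PyImageSearch tutorials), not the exact code of the program linked at the end of this article:

import cv2
import numpy as np

# load the pre-trained Caffe model (file names are assumptions)
net = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt',
                               'MobileNetSSD_deploy.caffemodel')

def detect_objects(frame, confidence_threshold=0.2):
    # run the SSD over one frame and return (class_id, confidence, box) tuples
    (h, w) = frame.shape[:2]
    # the network expects a 300x300 image, mean-subtracted and scaled
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    results = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > confidence_threshold:
            class_id = int(detections[0, 0, i, 1])
            # scale the normalized box back to pixel coordinates
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            results.append((class_id, confidence, box.astype('int')))
    return results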

CPU vs GPU

The first thing to determine is whether the ODROID GPU can help speed up detection by using OpenCL. ARM maintains the ARM Compute Library, an optimized vision and machine learning library for the Mali GPUs. However, my findings are that the quad-core 2GHz Cortex-A15 cluster provides much better performance than the 6-core 700MHz Mali GPU on the ODROID-XU4. You can read more about these results in the forum post at https://forum.odroid.com/viewtopic.php?f=95&t=28177.

In my tests, using all 8 cores is also detrimental, since the slower Cortex-A7 LITTLE cores drag down the overall detection time. To make sure we use only the powerful A15 cores, we need to run our detection program with taskset 0xF0, which pins it to CPUs 4-7. Adequate cooling is also recommended to maintain the top frequency on the A15 cores.
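
The same pinning can also be done from inside the Python program itself. A minimal sketch, assuming (as the 0xF0 mask does) that the A15 cores are exposed as CPUs 4-7:

import os

# Linux-only: restrict this process and its threads to CPUs 4-7,
# which are the Cortex-A15 cores on the ODROID-XU4
# (equivalent to launching the program with `taskset 0xF0`)
os.sched_setaffinity(0, {4, 5, 6, 7})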

OpenCV optimizations

Next, we want to compile the latest version of OpenCV, which provides a deep learning module, and optimize it for the ODROID-XU4. For this, we update CPU_NEON_FLAGS_ON in cmake/OpenCVCompilerOptimizations.cmake to use -mfpu=neon-vfpv4 instead of -mfpu=neon, and we enable Threading Building Blocks (TBB) with the flags -DWITH_TBB=ON -DCMAKE_CXX_FLAGS="-DTBB_USE_GCC_BUILTINS=1". We also make sure the compile flags -mcpu=cortex-a15.cortex-a7 -mfpu=neon-vfpv4 -ftree-vectorize -mfloat-abi=hard are used, by setting C_FLAGS, CXX_FLAGS, -DOPENCV_EXTRA_C_FLAGS and -DOPENCV_EXTRA_CXX_FLAGS, and that the GStreamer library is available to OpenCV by passing -DWITH_GSTREAMER=ON. Prebuilt Ubuntu 18.04 packages for OpenCV and GStreamer are available from my repository at https://oph.mdrjr.net/memeka/bionic/.
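
Once the build is installed, it is worth checking that these options actually made it into the library. A quick sanity check from Python (the exact wording of the build report depends on the OpenCV version):

import cv2

info = cv2.getBuildInformation()
# the build report lists the parallel framework, CPU features and video backends
for feature in ('TBB', 'NEON', 'GStreamer'):
    print(feature, 'found' if feature in info else 'NOT found')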

With only the CPU and OpenCV optimizations, we can already achieve about 3fps, where the same code running on a Raspberry Pi obtains only ~0.9fps. But let’s try to do better.

GStreamer

Instead of using OpenCV to connect to the camera, we can use GStreamer. This allows us to do several things: connect to wireless cameras on the network, and use the ODROID hardware decoder, hardware encoder, and hardware scaler. We can use the hardware decoder to process H264 from a live stream or from a H264 camera, the hardware scaler to change the image resolution and pixel format, and the encoder to output an H264-encoded stream, either to save to a file or to stream. It also showed a small overall performance improvement. Some example GStreamer pipelines are:

Connect to H264 stream from camera:
$ v4l2src device=/dev/video1 do-timestamp=true ! video/x-h264, width=1280, height=720, framerate=15/1 ! v4l2h264dec ! v4l2video20convert ! appsink

Connect to MJPEG/YUV stream from camera:
$ v4l2src device=/dev/video0 do-timestamp=true ! video/x-raw, width=1280, height=720, framerate=15/1 ! v4l2video20convert ! appsink

Save output to mp4 file:
$ appsrc ! videoconvert ! v4l2h264enc extra-controls="encode,frame_level_rate_control_enable=1,video_bitrate=8380416" ! h264parse ! mp4mux ! filesink location=detected.mp4

Stream output on the web with HLS:
$ appsrc ! videoconvert ! v4l2h264enc extra-controls="encode,frame_level_rate_control_enable=1,video_bitrate=8380416" ! h264parse ! mpegtsmux ! hlssink max-files=8 playlist-root="http://0.0.0.0/hls" playlist-location="/var/www/html/hls/stream0.m3u8" location="/var/www/html/hls/fragment%06d.ts" target-duration=30
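
These pipelines are not run with gst-launch-1.0; the appsink end is handed to a cv2.VideoCapture and the appsrc end to a cv2.VideoWriter. A minimal sketch of how the two sides are wired together in Python (the resolution, framerate and file names simply mirror the examples above):

import cv2

# input: camera -> hardware converter -> appsink (read by OpenCV)
in_pipeline = ('v4l2src device=/dev/video0 do-timestamp=true ! '
               'video/x-raw, width=1280, height=720, framerate=15/1 ! '
               'v4l2video20convert ! appsink')
vin = cv2.VideoCapture(in_pipeline, cv2.CAP_GSTREAMER)

# output: appsrc (written by OpenCV) -> hardware H264 encoder -> mp4 file
out_pipeline = ('appsrc ! videoconvert ! '
                'v4l2h264enc extra-controls="encode,frame_level_rate_control_enable=1,video_bitrate=8380416" ! '
                'h264parse ! mp4mux ! filesink location=detected.mp4')
vout = cv2.VideoWriter(out_pipeline, cv2.CAP_GSTREAMER, 0, 15.0, (1280, 720))

ret, frame = vin.read()    # grab one frame from the camera pipeline
if ret:
    vout.write(frame)      # push it into the encoder pipeline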

Multithreaded batch processing

With these improvements and a multi-threaded model, where fetching the next frame runs in a separate thread, independent of the object detection, the ODROID-XU4 is able to achieve up to 4fps: in one second, it can detect objects in 4 images. Since detection is the main objective, 4fps is actually enough to alert us to objects of interest. So we can have an input stream with a higher framerate and selectively pick frames for object detection.

To maintain the illusion that each frame is processed, we do a simple trick: when an object is detected, we highlight its position both in the frame that was processed and in the subsequent frames, until the next detection. The position will lose accuracy when the object moves, but since we are capable of processing up to 4fps, the error will be quite small. We use a queue to read frames from the input stream and process n frames at a time: the first frame is used for detection, and the subsequent processing is done for all n frames based on the objects detected in the first frame. We choose n, the size of the batch, as a function of the input stream framerate and the processing capabilities of the ODROID-XU4.

For example, for an input with 15fps, we can use n=4 (run detection for 1 in 4 frames) to maximize utilization. The code in Python for this is quite simple:

# function to read frames and put them in queue
# (queue is a collections.deque, stream is the OpenCV/GStreamer VideoCapture)
def read_frames(stream, queue):
    global detect
    while detect is True:
        (err, frame) = stream.read()
        queue.appendleft(frame)

# start reader thread
detect = True
reader = threading.Thread(name='reader', target=read_frames, args=(vin, queue,))
reader.start()

# grab a batch of frames from the threaded video stream
frames = []
for f in range(n):
    while not queue:
        # wait for a frame to arrive
        time.sleep(0.01)
    frames.append(queue.pop())
frame_detect = frames[0]
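
After the batch is read, detection runs on frame_detect only, and the resulting boxes are drawn on every frame of the batch before it is written out, as described above. A sketch of that step, reusing the hypothetical detect_objects() helper from earlier and assuming CLASSES (the SSD class names) and COLORS (a list of BGR tuples) are defined elsewhere, with vout being the GStreamer output writer:

# run detection only on the first frame of the batch
objects = detect_objects(frame_detect)

# draw the same boxes on every frame in the batch; positions lag slightly
# for moving objects, but at up to 4 detections per second the error is small
for frame in frames:
    for (class_id, confidence, (x1, y1, x2, y2)) in objects:
        label = '{}: {:.0f}%'.format(CLASSES[class_id], confidence * 100)
        cv2.rectangle(frame, (x1, y1), (x2, y2), COLORS[class_id], 2)
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[class_id], 2)
    vout.write(frame)    # push the annotated frame to the output pipeline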

Objects of interest

We define the objects of interest from the classes the MobileNets SSD can detect. These classes include “person”, “bird”, “cat”, “dog”, “bicycle”, “car”, etc. We want to be able to assign a different detection confidence level to each object, and also a timeout for detection: e.g. if the same object of interest is detected in the next processed frame, we don’t want to receive a new notification (i.e. we don’t want to get 4 emails each second); instead, we use a timeout value and only get a new notification when the timeout expires. The code in Python is:

# check if it's an object we are interested in
# and if confidence is within the desired levels
timestamp = datetime.datetime.now()
detection_event = False
if prediction in config['detect_classes']:
    if not confidence > float(config['detect_classes'][prediction]):
        # confidence too low for desired object
        continue
    else:
        # we detected something we are interested in,
        # so we execute the action associated with the event,
        # but only if the object class was not already detected recently
        if prediction in DETECTIONS:
            prev_timestamp = DETECTIONS[prediction]
            duration = (timestamp - prev_timestamp).total_seconds()
            if duration > float(config['detect_timeout']):
                # detection event (timeout elapsed)
                detection_event = True
        else:
            # detection event (first occurrence)
            detection_event = True
else:
    if not confidence > float(config['base_confidence']):
        # confidence too low for object
        continue
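
The configuration values referenced above come from the JSON configuration file linked at the end of the article. Its exact contents are not reproduced here, but based on the keys used in the code it has roughly the following shape (the class names and threshold values are placeholders, and config.json is an assumed file name):

import json

# load the configuration; only the keys referenced in the code are shown
with open('config.json') as f:
    config = json.load(f)

# expected shape of the file, roughly:
# {
#     "base_confidence": 0.4,
#     "detect_timeout": 60,
#     "detect_classes": { "person": 0.6, "cat": 0.5, "dog": 0.5 }
# }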

Detection events and outputs

Lastly, we want two separate actions to be taken after a frame is processed: the first action is independent of the detection results, whereas the second action is taken only when objects of interest are detected. In my example code, all frames are modified by drawing a box and a label around every detected object (of interest or not). These frames are then saved to the output stream, which can be a streaming video. Thus, when connecting remotely to the security feed, you see the processed video instead, which includes color-coded boxes around the detected objects.

When objects of interest are detected, the frame is also saved as a jpeg file, and is made available to a user-defined script that is responsible for notifying the user. For example, the picture can be sent via email, used with IFTTT or sent directly to the user’s mobile phone.
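
Since delivery is delegated to a user-defined script, the detection loop only has to write the jpeg and invoke that script. A minimal sketch, where the script path and the jpeg location are illustrative placeholders rather than values from the article’s configuration:

import subprocess
import cv2

if detection_event:
    # save the annotated frame and hand it to the user-defined notification script
    snapshot = '/tmp/detection.jpg'
    cv2.imwrite(snapshot, frame_detect)
    subprocess.Popen(['/usr/local/bin/notify.sh', snapshot])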

The full example code is available at https://gist.github.com/mihailescu2m/d984d9fe3e3937573456c2b0423b4be9, and the configuration file, in JSON format, is at https://gist.github.com/mihailescu2m/42fdccd624dc91bb9e04b3adc39bc50f

Resources

https://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/
https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/
https://www.pyimagesearch.com/2017/10/16/raspberry-pi-deep-learning-object-detection-with-opencv/
