It’s been a long time since my last blog post.
I’ve been quite busy recently, playing around with Go, Haskell, Alex parser, Lex lexer, home-made Rubber Duckies and plenty of other interesting stuff I may blog about at some point.
This post is about setting up a home face recognition system using an IP camera, OpenCV and a Raspberry Pi.
I won’t focus on the face recognition part that much since the code I used was not written by me but instead was sourced from this awesome blog post that I highly recommend reading if you’re into ML with python.
My post will be mostly about gluing OpenCV together with my IP camera and making the former read its input from an RTSP source, which apparently works if you have the right OpenCV build (more on this later).
Aim of project
In my home automation journey, after setting up an audio system next thing on the list is to control the music played in some way that’s simple (basically making a home-made alexa thingy).
We spend a lot of time in our living room listening to music. Unfortunately I didn’t simplify the audio system enough for less tech-savvy people (Windows guys) to use it, so I wondered - wouldn’t it be cool if every time someone entered the room, a camera recognized them and played some music based on their taste?
Hell yeah, sounds like a fun pre-exam project to do while procrastinating on revision, right?
Doing reading on ML and face detection and recognition
Image processing has become quite a trending topic recently which is probably the reason why there is such an abundance of libraries for it.
Unfortunately (for me) most are written in C++ and people who know me know that I’m not a fan of writing excessive amounts of code for simple tasks (such as face detection lol).
Happily, there is Python for the lazy people, and what’s even better is that most of the C++ libraries (dlib, OpenCV, etc.) have Python bindings. Yay!
Even so, I was genuinely interested in the topic so I did some reading on using OpenCV with Python. This blog post was quite useful and shows the 101 of OpenCV with Python (loading images, drawing stuff onto images etc.).
The main source for my code was mentioned previously. Here’s the link again because it’s so great - do read the blog. They describe how the code they’ve posted works and how to train the support vector machine for your face from new images. As a bonus, they even added face recognition using a video source!
So… if the code is online what did I do exactly?
Getting the code to work is the easy part.
That’s fair and square, however, it’s using my laptop camera.
Next thing is to make it read the RTSP stream which shouldn’t be too hard right?
According to StackOverflow OpenCV does support RTSP, however, this was not the case for my setup.
It is possible that I’ve done something wrong so if anyone figures out how to make it work I’ll get him a beer.
When I run recognize_video.py, it reaches the point of reading from the camera and after a short timeout it errors out.
Yes, I’ve triple-checked the URI so the issue lies somewhere else.
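For anyone who wants to reproduce the problem, here is a minimal sketch of what opening the stream amounts to, assuming the source is opened with cv2.VideoCapture. The open_capture helper and its capture_factory hook are made up for illustration and are not part of the original recognize_video.py.

```python
def open_capture(source, capture_factory=None):
    """Open a video source and return the capture object.

    source is either an RTSP URI (str) or a local camera index (int).
    capture_factory is a hypothetical hook so the function can be tried
    without OpenCV installed; by default it is cv2.VideoCapture.
    """
    if capture_factory is None:
        import cv2  # imported lazily so this file loads without OpenCV
        capture_factory = cv2.VideoCapture
    capture = capture_factory(source)
    if not capture.isOpened():
        # Fail loudly instead of reading empty frames later on.
        capture.release()
        raise RuntimeError(f"could not open video source: {source!r}")
    return capture


# Usage (placeholder URI, not a real camera):
# capture = open_capture("rtsp://USER:PASS@CAMERA_IP:10554/udp/av0_0")
# ok, frame = capture.read()
```

With a working build the same call should accept either the RTSP URI or a local camera index such as 0.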
My build info that DOESN’T allow OpenCV to read from a RTSP source
Here’s my setup info if anyone fancies a try to debug with me:
Fedora 30 with latest updates
python --version -> Python 3.7.3
cv2.__version__ -> 4.1.0
cv2.getBuildInformation() (Video I/O section) ->
    DC1394: NO
    FFMPEG: YES
        avcodec: YES (58.47.106)
        avformat: YES (58.26.101)
        avutil: YES (56.26.100)
        swscale: YES (5.4.100)
        avresample: NO
    GStreamer: NO
    v4l/v4l2: YES (linux/videodev2.h)
The reason might be the missing GStreamer option, even though I tried compiling OpenCV manually with that option included - it still didn’t work.
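If you want to compare your own build against mine, the Video I/O section can be pulled out of the text returned by cv2.getBuildInformation(). This is a rough sketch: the video_io_backends helper is a made-up name, and it assumes the section starts with a "Video I/O:" line and ends at the next blank line, which may differ between OpenCV versions.

```python
def video_io_backends(build_info):
    """Return {name: value} for the 'Video I/O' section of the build info text."""
    backends = {}
    in_section = False
    for line in build_info.splitlines():
        stripped = line.strip()
        if stripped == "Video I/O:":
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # a blank line ends the section (assumed layout)
            name, _, value = stripped.partition(":")
            backends[name.strip()] = value.strip()
    return backends


# Usage (needs OpenCV installed):
# import cv2
# print(video_io_backends(cv2.getBuildInformation()).get("GStreamer"))
```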
Planning a workaround
I did look into a few libraries like this one, this one and a few others, but none of them seemed to do the trick and read from the IP cam. Furthermore, I had my doubts that even if I managed to read a frame from it, something would still go wrong when passing it to OpenCV (image format, pixel format, FPS, etc.).
Before jumping on me saying I’m an idiot and got the URI wrong, that’s not the case - adequate tools like VLC, FFMpeg and similar media playing software did read the stream correctly.
So the 2 options were either to keep looking for a library that manages to read the RTSP stream, or to do something hackier - I could read from a local webcam, so why not make the IP camera look like a local camera? After all, if powerful tools such as FFMpeg can read from it, surely they can do other magic as well.
The first step is to make another /dev/video device so that I could stream to it.
I admit, my first attempt was waaay off target:
$ sudo touch /dev/video5
Sure, I could write to it, but it was nowhere near a capture device that I could read from afterwards. I did try some bash-fu to make OpenCV read from a constantly changing file but couldn’t manage to trick it.
After a bit of googling on character devices and block devices, soon enough I arrived at the mknod command.
It turns out that /dev/videoX devices, as well as some others such as /dev/random, are all character devices.
The command to make one and let the kernel know it should treat it as a camera device is:
$ sudo mknod test_cam c 81 0
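c says it’s a character device, while 81 and 0 are the major and minor numbers that tell the kernel which driver to use for that specific device (see the list of all device major and minor numbers - major 81 is the video4linux character device and minor 0 is the first video capture device).
Running file on the new file confirms it is a character device: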
$ file test_cam
test_cam: character special (81/0)
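The same check can be done from Python with the standard library. This is a small sketch and char_device_numbers is a made-up helper name. It reports the major and minor numbers for a character device and None for anything else.

```python
import os
import stat


def char_device_numbers(path):
    """Return (major, minor) if path is a character device, else None."""
    info = os.stat(path)
    if not stat.S_ISCHR(info.st_mode):
        return None
    # st_rdev holds the device number of the node itself.
    return os.major(info.st_rdev), os.minor(info.st_rdev)


# Usage:
# print(char_device_numbers("test_cam"))
```

For the node created above this should report major 81 and minor 0, matching the output of file.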
That’s cool! Now I have a virtual camera as a file on my hard disk!
Let’s see how to write to it now.
Fighting character devices
My initial idea was to basically do something similar to sudo cat video.mp4 > test_cam and afterwards read from test_cam.
Well, surprise, surprise: writing to character devices isn’t that straightforward.
[root@yuhu]: cat kek.mp4 > test_cam
cat: write error: Invalid argument
[root@yuhu]: echo 1 > test_cam
-bash: echo: write error: Invalid argument
Character devices expect data as a stream of characters, as opposed to the blocks we are used to. So dumping a file into a character device doesn’t make much sense the way I was trying to do it. This answer explains it in greater detail.
I decided bothering with kernel IO would be too much for 1am so I started looking elsewhere.
At some point it started to get a bit depressing, since forwarding an RTSP video source to a local virtual camera isn’t something people do every day and therefore there isn’t much online about how to go about it.
On the edge of despair (opening sites that I’d already gone through) I found the v4l2loopback module. Lord and saviour!
This is a kernel module that lets us create virtual video devices that normal video4linux2 (v4l2) applications can read as capture devices, but that can also be written to, which is what I was after!
Next step was to start writing to it.
The tool of choice for me was the infamous ffmpeg, which is like a Swiss Army knife for media. It can do all kinds of crazy stuff like streaming the active X session (desktop) as a network stream, or converting the pixel formats and RGB values of input/output media, and plenty of other funky stuff.
Having hundreds of options is a double-edged sword though, especially for someone who doesn’t understand in great detail how media is converted, what the different codecs are and how they differ. I spent the next few hours trying to figure out all the correct input/output options to stream the IP cam to my virtual one.
The main issue was that I didn’t understand much about how moving pictures (aka videos) are seen from the computer’s point of view. There are quite a few moving parts that I basically had to brute-force to make the thing work, since it either works or you see no picture whatsoever. There is no other state.
By the end of the night this was the best output I ever got:
You can see some silhouettes here and there so there was light at the end of the tunnel.
Finally, on the following day, after more tweaking of ffmpeg’s parameters, I managed to read the stream with the correct settings. Running the face recognition Python app afterwards was as simple as changing the id it uses for the camera input.
This is me taking a photo with my phone of my laptop screen which is showing the output of the OpenCV face recognition app which gets its input from the virtual camera device which is getting its input from the video4linux loopback module which is being written to by ffmpeg’s RTSP input.
That ^^ describes quite well what the end result was.
Now I have a programmatic way to do whatever I want once a face is detected and even when a specific person is recognized.
The magic steps that made all this possible were:
- Firstly, install the v4l2loopback module and load it - it will create the virtual camera devices.
- Stream to the virtual camera (in my case /dev/video2) using ffmpeg:
$ ffmpeg -i rtsp://USER:PASS@CAMERA_IP:10554/udp/av0_0 -f v4l2 -pix_fmt yuv420p /dev/video2
- Finally, change the source code of the application to use camera id 2 instead of 0 (default) and it magically works!
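If you would rather start the forwarding from a script than from a shell, the ffmpeg step can be wrapped roughly like this. It is only a sketch: both function names are made up, it assumes ffmpeg is on the PATH and that the v4l2loopback module is already loaded (typically with modprobe v4l2loopback), and the ffmpeg options are exactly the ones from the command above.

```python
import subprocess


def ffmpeg_loopback_command(rtsp_url, device="/dev/video2", pix_fmt="yuv420p"):
    """Build the ffmpeg argument list that forwards an RTSP stream to a v4l2 device."""
    return ["ffmpeg", "-i", rtsp_url, "-f", "v4l2", "-pix_fmt", pix_fmt, device]


def start_forwarding(rtsp_url, device="/dev/video2"):
    """Start ffmpeg in the background and return the process handle."""
    return subprocess.Popen(ffmpeg_loopback_command(rtsp_url, device))


# Usage (placeholder credentials and address):
# process = start_forwarding("rtsp://USER:PASS@CAMERA_IP:10554/udp/av0_0")
# ... run the face recognition app against camera id 2 ...
# process.terminate()
```

Passing the arguments as a list avoids any shell quoting issues with the credentials in the URL.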
The next step is to put all this setup on the Raspberry Pi and run some scripts based on who the detected person is. I still have to gather all my housemates' consent but you know, people are easier to deal with than computers :D
Setup on the Raspberry Pi was tricky due to the limited resources.
pip install -r requirements.txt failed because of insufficient RAM.
All 4 virtual CPUs ran at 100% and ate up all of the 1GB of memory, rendering the Pi useless for the time being.
This meant I had to come up with an alternative way to pip install all the needed requirements. The main issue was the compilation of the Cython-related libraries like numpy and matplotlib.
pip install-ing them separately did the trick.
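Installing the packages one at a time can also be scripted. Treat this as a sketch under a few assumptions: the helper names are made up, the requirements file is assumed to contain one plain specifier per line, and inline comments or option lines such as -r are skipped rather than handled.

```python
import subprocess
import sys


def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines, comments and option lines."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements


def install_one_by_one(requirements_path="requirements.txt"):
    """Run a separate pip process per package so only one build runs at a time."""
    with open(requirements_path) as handle:
        for requirement in parse_requirements(handle.read()):
            subprocess.run(
                [sys.executable, "-m", "pip", "install", requirement],
                check=True,  # stop at the first package that fails to install
            )
```

Running pip once per package keeps each compilation separate, which is the point of installing them individually on a board with 1GB of memory.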
After installing the last few system packages like python-dev and the raspberry-kernel-headers for the v4l2loopback module, everything worked!
The compilation of the latter was trouble-free and the pi started recognizing images.
The next issue was performance - with full FPS, the pi was struggling quite hard to deal with the incoming frames, process them and produce some output.
So a hack was needed - I added an extra sleep before rendering each frame, whose purpose is to drop the FPS and give the Pi some air to breathe.
It took some tweaking to find the best balance between latency and performance, but soon enough I had a latency of around 1-2s at 50% load average, which is reasonable.
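The sleep hack can be sketched as a tiny generator wrapped around whatever produces the frames. The name throttled and the delay in the usage note are illustrative only, not the values used on the Pi, and the sleep parameter exists just so the pause can be swapped out when trying the code.

```python
import time


def throttled(frames, delay_s, sleep=time.sleep):
    """Yield frames one by one, pausing delay_s seconds before each of them."""
    for frame in frames:
        sleep(delay_s)  # give the CPU a break before handling the next frame
        yield frame


# Usage (read_frames is a hypothetical frame source):
# for frame in throttled(read_frames(), delay_s=0.2):
#     process(frame)
```

A larger delay lowers the load but increases the latency, which is the trade-off described above.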
Now obviously there are some privacy concerns with having an IP camera streaming all the time.
Apart from me, it is reasonable to consider the possibility of a third party watching the stream as well (I would be surprised if a Chinese IP camera wasn’t monitored).
It is disturbing to know that someone might be watching on the other side, so I took some precautions to ensure that I am the only one watching that live feed.
I simply put the camera on a network without an uplink and connected the Pi’s wlan interface to it. I also made sure that the Pi drops everything from the camera’s IP address except for the RTSP traffic.
So this is what the final setup looks like network-wise:
Yes, I could have made it simpler with a simple SOHO router running OpenWRT, unfortunately the devices I had weren’t supported so I had to physically block the camera’s internet access.
Currently, I am the only person that can be recognized by the application, and once it does see me, it says hi to me which is cute. I have to add some more pictures of myself and some other people to improve the accuracy of the SVM and that’s pretty much it.
P.S: Oh btw, it recognizes people in full darkness as well using its IR camera, which is pretty cool!