Modern Face Detection based on Deep Learning using Python and Mxnet

By Wassa

In this post, we’ll discuss and illustrate a fast and robust method for face detection using Python and Mxnet. At Wassa, some of our products rely on face detection. For example, Facelytics uses it to extract face attributes such as gender and age. It can also be used for face recognition, emotion recognition, some augmented reality applications or people counting. Depending on the purpose, a robust real-time face detector may be needed. We distinguish two kinds of applications with almost opposite requirements. People counting needs a fast and robust detector that finds faces from different angles and in different conditions. In contrast, attribute extraction loses accuracy when the detected face is turned too far away from the camera. In the following, we will show a fast and robust face detector that can be used for counting people in a crowd.

Mxnet is a flexible and efficient framework for deep learning applications. It has interfaces in many languages for execution and training (C/C++, Python, R, etc.). It also provides a compact, limited API that only supports execution, intended for embedded applications on smart devices. The community is active and friendly. Just as Tensorflow is supported by Google and Torch by Facebook, Mxnet is supported by Baidu and, more recently, by Amazon.

For many years, one of the most widely used methods for face detection has been the Viola-Jones method. It uses a cascade of Haar classifiers. One advantage of this method is that it is implemented in OpenCV. Thanks to this availability, it appears in many tutorials demonstrating face detection. It has been the reference in face detection since 2001, but computer vision has evolved a lot since then and newer methods outperform the Viola-Jones algorithm. A catalog of these methods is maintained by FDDB (the Face Detection Data Set and Benchmark). A large part of the top algorithms on this list use deep learning. Some of them are accurate but relatively slow, while others try to be as fast as possible.

In this tutorial, we will use an Mxnet implementation of the MTCNN algorithm designed by Zhang et al. This implementation can be found on GitHub (link).

The MTCNN algorithm works in three steps and uses one neural network for each. The first network is a proposal network. It predicts potential face positions and their bounding boxes, like the region proposal network in Faster R-CNN. The result of this step is a large number of face detections, many of them false. The second network takes the image and the outputs of the first prediction. It refines the result to eliminate most of the false detections and aggregates the bounding boxes. The last network refines the predictions even further and adds facial landmark predictions (in the original MTCNN implementation).
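The aggregation of overlapping bounding boxes described above is commonly done with non-maximum suppression (NMS). The sketch below is a minimal pure-Python illustration of that idea, not the code of the Mxnet implementation. The (x1, y1, x2, y2, score) box layout, the function names and the 0.5 overlap threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes laid out as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box of each group of overlapping boxes.

    detections: list of (x1, y1, x2, y2, score) tuples (assumed layout).
    """
    by_score = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    for candidate in by_score:
        # Keep the candidate only if it does not overlap a better box too much.
        if all(iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept


# Two proposals on the same face and one on a different face (made-up values).
candidates = [
    (10, 10, 50, 50, 0.90),
    (12, 12, 52, 52, 0.80),
    (100, 100, 140, 140, 0.70),
]
faces = non_max_suppression(candidates)
print(faces)
```

With these made-up proposals, the second box overlaps the first one heavily and is dropped, so two faces remain.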

Installation is often the main hurdle with this kind of deep learning code. To run our example, we need to install OpenCV, Mxnet and their Python 3 bindings.

If you can’t or fail to install Mxnet and OpenCV on your computer, and you know Docker, we have created Docker images with OpenCV and Mxnet installed.

First, if you don’t already have Docker installed, the official method is easy to use and well documented (link).

The second step is to clone our repo with some DockerFiles that will automatically generate the right Docker image. This repo is my personal cheat sheet for installing OpenCV and other libraries.

In your working directory, run the following commands:

git clone (link)
cd DockerFiles
make mxnet

These lines will download our DockerFiles and generate Docker images containing python3, OpenCV and Mxnet.

The third step creates the container and enters it. Go to your working directory and enter the following command:

If the installation worked correctly, you will get a command prompt like this one:

You will see your working directory, which only contains the DockerFiles folder for the moment:

Type the following commands inside the container to check that the installation worked. They print the OpenCV and Mxnet versions:

python3 -c 'import cv2; print(cv2.__version__)'

python3 -c 'import mxnet as mx; print(mx.__version__)'

At the time of writing (9 May 2017), we get Mxnet 0.9.5 and OpenCV 3.2.0. Newer versions will be fine too. We didn’t test this code with older versions. OpenCV versions above 3.0 may work. For Mxnet, it’s more complex: it is an actively developed framework, and differences between versions can break some functionality, mainly through operators that didn’t exist before. The API we use changed in version 0.9.3, so older versions will not work.
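The version constraints above can also be checked from Python before running the detector. The following is a minimal sketch: the helper names version_tuple and is_supported are hypothetical, and the minimum versions are simply the ones quoted above. Comparing tuples of integers avoids the trap of comparing version strings, where "0.10.0" would sort before "0.9.3".

```python
import re


def version_tuple(version):
    """Turn a version string such as '0.9.5' into a tuple of ints, here (0, 9, 5)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so that '3.0' compares like '3.0.0'
    return tuple(parts)


def is_supported(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


# Minimum versions taken from the text above.
MIN_MXNET = "0.9.3"
MIN_OPENCV = "3.0"

print(is_supported("0.9.5", MIN_MXNET))
print(is_supported("3.2.0", MIN_OPENCV))
```

In the container, the strings to check would be mx.__version__ and cv2.__version__, the same attributes printed by the two commands above.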

If you don’t get any error, the installation should be fine and you can move forward.

The Mxnet implementation of MTCNN comes with pre-trained models. We get them by cloning the GitHub repository. For this tutorial, and to keep using Python 3 instead of Python 2, we forked Xuan Lin’s repository to make the prediction code compatible with Python 3. From the container, type the following command:

You should now have the following tree:

[Directory tree lost in extraction; only a cpu entry survives.]

Running a prediction with a neural network usually takes little code. In practice, it depends on whether the whole process runs inside the network or whether part of it happens outside. For example, image classification with VGG16 needs only two lines: one to load the model and one to pass the image through it. The MTCNN algorithm combines three neural networks, and a large part of the logic lies in chaining them. Thanks to the author, the code is clear and usable almost as is. All the code for this illustration is available here.

In this example, we’ll use the default parameters, as follows.

The second model file contains the layer structure of the network. It is optional if we recreate the structure in Python code.

The context parameter defines on which device the code runs (CPU or GPU).

The last step creates the detectors from the raw networks.

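Now the whole deep learning part is set up. We still need to add the code to feed images to the face detector.

If you use the Docker version, you can’t use window output, so you’ll only write the result to a file. Otherwise, you can open a window that shows the annotated video in real time. The final code can be found here or here.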
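As an illustration of the processing step, here is a minimal sketch of a loop that feeds frames to a detector, keeps the confident detections and writes a per-frame face count to a file-like object. It is not the final code linked above. The detector is replaced by a stand-in function, and the (x1, y1, x2, y2, score) output layout, the 0.9 score threshold and all function names are assumptions made for the example.

```python
import csv
import io


def count_faces(frames, detect, min_score=0.9):
    """Run `detect` on every frame and return the number of faces kept per frame."""
    counts = []
    for frame in frames:
        boxes = detect(frame) or []  # treat "no detection" (None) as an empty list
        kept = [box for box in boxes if box[4] >= min_score]
        counts.append(len(kept))
    return counts


def write_counts(counts, stream):
    """Write one CSV row per frame: frame index and number of faces."""
    writer = csv.writer(stream)
    writer.writerow(["frame", "faces"])
    for index, faces in enumerate(counts):
        writer.writerow([index, faces])


def fake_detect(frame):
    """Stand-in for the real detector: each 'frame' already holds its boxes."""
    return frame


# Three made-up frames with precomputed (x1, y1, x2, y2, score) boxes.
frames = [
    [(10, 10, 50, 50, 0.99), (60, 20, 90, 60, 0.40)],
    [],
    [(5, 5, 40, 40, 0.95), (45, 5, 80, 40, 0.97)],
]
counts = count_faces(frames, fake_detect)
buffer = io.StringIO()  # a real run would use open("result.csv", "w", newline="")
write_counts(counts, buffer)
print(buffer.getvalue())
```

Writing to a stream rather than to a window matches the Docker case, where no display is available.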

The code can be run from the Python REPL:

or from the command line, if you got the complete code here, with:

in your working directory. We ran tests on several devices, on CPU with and without optimization and on GPU.

We tested with both a webcam and an IP camera. Both show equivalent computation times. This face detection is fast, processing 2 frames per second without any optimization, and it processes streams in near real time with NNPACK or in real time with a GPU.
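A frames-per-second figure like the one above can be measured with a simple timing loop. The sketch below is one possible way to do it, assuming a hypothetical process callable that stands in for the detector. Here it just sleeps for 10 ms, so the number it prints says nothing about the real detector.

```python
import time


def measure_fps(process, frames):
    """Return the number of frames handled per second by `process`."""
    start = time.perf_counter()
    handled = 0
    for frame in frames:
        process(frame)
        handled += 1
    elapsed = time.perf_counter() - start
    return handled / elapsed if elapsed > 0 else float("inf")


def fake_process(frame):
    """Stand-in for face detection on one frame: pretend it takes 10 ms."""
    time.sleep(0.01)


fps = measure_fps(fake_process, range(5))
print("%.1f frames per second" % fps)
```

Timing the whole loop instead of a single call averages out per-frame variations, which matters when the number of faces changes from frame to frame.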

The goal was to show that deep learning algorithms can be fast and accurate even on CPU. With this example, we got real-time face detection that works on CPU and GPU. This method offers good performance and is one of the best face detectors on FDDB. Deep learning can be easy to use and powerful.
