Combined FFmpeg, OpenCV, dlib and SciKit into one face recognition component using CUDA

Something is definitely wrong here:

  1. emb = self.face_detector.detect_align(pic, img, priors)[0]. Here, emb has Python length 512 and tensor size torch.Size([512]).

The Python length corresponding to this torch.Size output should be 1, not 512 (see the sketch below). For some odd reason, I am not seeing this at all. What versions of Python and PyTorch are you running?
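To make the shape issue concrete, here is a minimal sketch. The [1, 512] return shape of detect_align is an assumption based on the single-face case; len() on a tensor reports the size of its first dimension, so indexing with [0] changes what len() returns:

```python
import torch

# Assumed return shape for a single face: [num_faces, 512] -> [1, 512]
emb_batch = torch.randn(1, 512)
print(len(emb_batch))        # 1   -> len() is the first dimension

emb = emb_batch[0]           # indexing [0] drops the batch dimension
print(emb.shape)             # torch.Size([512])
print(len(emb))              # 512 -> matches the reported behavior
```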

Here are various versions of things:
torch.__version__ is 1.8.0+cpu
torchvision.__version__ is 0.9.0+cpu
numpy.__version__ is 1.19.5
python --version is 3.6.9
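For anyone wanting to reproduce this list, a quick environment report (a minimal sketch using only standard module attributes):

```python
import sys
import numpy
import torch
import torchvision

# Print the same environment details listed above
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("numpy:", numpy.__version__)
```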

I can try and build a newer environment and see what happens.

Yeah, try Python 3.8; 3.6 is very old. I have not tested any of this code on anything prior to Python 3.7. I am less worried about PyTorch.

Update… I built a new Docker environment with the following:

>>> torch.__version__ => '1.8.1+cpu'
>>> torchvision.__version__ => '0.9.1+cpu'
>>> numpy.__version__ => '1.20.2'
python3 --version => Python 3.8.5

Unfortunately I got the same results.

As a check, do any of the other steps mentioned above (for 1 face versus 2 faces) look correct?

EDIT/UPDATE
For what it's worth, I changed the line of code in step 6 by simply removing the [0] (sketched below).
This made the Python length 1, and it built a facebank. I then gave it a test image to match on, to see if something else bombed, but it ran without failure and the output actually said it found my face.
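For clarity, a sketch of the change, using the line quoted earlier (the exact return type of detect_align is an assumption):

```python
# Before: [0] collapses the [1, 512] batch dimension, so len(emb) == 512
emb = self.face_detector.detect_align(pic, img, priors)[0]

# After: keep the batch dimension; len(emb) == 1 for a single face
emb = self.face_detector.detect_align(pic, img, priors)
```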


Thank you! It looks like it was a mistake on my part. I must have changed the model along the way and made the modification in the inference code but not in the training part.

I wanted to provide a little more feedback.
I’ve run a few inference test cases, and while the results look promising, the CPU-based computation times are several tens of seconds. The time is mostly spent running RetinaFace detection, and it seems to be CPU-bound (an old 4-core i7).
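If a CUDA GPU is available, moving the TorchScript model off the CPU should cut this dramatically. A minimal timing sketch (the 640x640 input shape is an assumption; the file name is the one discussed below):

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.jit.load("RetinaFaceJIT.pth", map_location=device)
model.eval()

dummy = torch.randn(1, 3, 640, 640, device=device)  # assumed input shape
with torch.no_grad():
    start = time.time()
    model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for async GPU work before timing
    print(f"Inference on {device}: {time.time() - start:.3f}s")
```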

As a check, the RetinaFace model file I used was RetinaFaceJIT.pth, which was on a Google Drive you pointed to.
You also referenced Biubug6’s resnet50; on his Google Drive, that file is named Resnet50_Final.pth.
Should I have used Resnet50_Final.pth, and what are the differences?

This might be more than you want to know, but I will give it a shot.

The way PyTorch works, you save the weights (parameters) of a pretrained model in these files, and most of what you see on GitHub is code that builds the model from a well-known architecture (like resnet50) and then loads the weights onto it before running inference. The benefit is mostly for research: people can take the model and either retrain it or train it to recognize other things on other datasets.

The file I provided is the same pretrained model with weights, but torch-scripted for JIT (just-in-time) inference. That gives you the benefit of not having to rebuild the model from Python code, and it can run significantly faster because it uses the TorchScript just-in-time compiler.

That being said, I don’t really recommend running any of these models on CPUs. There is a reason why “neural processors” and “edge accelerators” exist and are getting more ubiquitous: CPUs are extremely inefficient for this type of workload.
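To make the distinction concrete, a sketch of the two loading paths; the RetinaFace import below is hypothetical and stands in for whatever model-building code the repo actually provides:

```python
import torch

# Path 1: a plain weights file (e.g. Resnet50_Final.pth).
# You must rebuild the architecture in Python first, then load the
# saved state dict onto it. The import is hypothetical.
from models.retinaface import RetinaFace  # hypothetical import
model = RetinaFace()
model.load_state_dict(torch.load("Resnet50_Final.pth", map_location="cpu"))
model.eval()

# Path 2: a TorchScript file (e.g. RetinaFaceJIT.pth).
# The model code is baked into the file, so no Python class is needed,
# and the JIT compiler can optimize execution.
jit_model = torch.jit.load("RetinaFaceJIT.pth", map_location="cpu")
jit_model.eval()
```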