Making a Facial Recognition Algorithm Using IoT and AI

Aryan Jha
7 min read · Nov 30, 2022

To humans, facial recognition seems pretty easy. Babies as young as 4 months old can already recognize faces, even though their other visual processing areas are still being developed. However, for computer programs, it isn’t that easy.

History

Ever since the 1960s, scientists and engineers have been trying to make computers recognize faces in images. Early approaches were very tedious, requiring facial features to be selected by hand, and they were further limited by the technology of the time.

Through the 80s and 90s, the techniques and technology improved, and the first truly automated facial recognition systems appeared, built on principal component analysis (PCA). This was much better than the manual methods of the 60s, but much was still left to be desired.

In the 2000s, these algorithms could process more photos at higher resolutions, and the datasets of the time contained far more information than ever before. The result was algorithms up to 100 times more accurate than those of the decade prior.

As we all know, facial recognition has progressed much further since then: Apple's FaceID (using the OverFeat architecture, which you can learn more about here) and Facebook's famous DeepFace (which you can learn more about here) have reached accuracies and dataset sizes that wouldn't even have been dreamed of only 10 years before.

However, you don’t need to be a huge company like Apple or Facebook to develop your own realtime facial recognition algorithm. In fact, using Python and the OpenCV library, anyone can actually do this. The setup is not that complicated, so I encourage you to try this out if you’re interested.

The realtime facial recognition algorithm that you will build today!

Requirements

You will need three libraries:

  • Numpy (pip install numpy)
  • OpenCV with the contrib modules (pip install opencv-contrib-python), since the LBPH face recognizer used later is part of the contrib package rather than the base opencv-python build
  • Pillow (pip install pillow), which provides the PIL module used to load the dataset images

You will also need to download the file at this Github link. This is the actual classifier (haarcascade_frontalface_default.xml), and the code will not work without it. It comes from the official OpenCV Github page, so don't worry.

Code Breakdown

First, we must import the libraries we will be using. We will use Numpy, OpenCV, and PIL. We import the os package because we will be saving the images and models to the disk.

import numpy as np
import cv2
from PIL import Image
import os

Then, we must define the network, which is a cascade classifier. A cascade classifier is trained on many images that contain faces and many that do not, and it extracts features from the faces. However, not all of these features are actually relevant to the classification. Therefore, the "weak" classifiers (classifiers that each use just one feature) are tested for wrong answers, and the final "strong" classifier is the sum of the best weak classifiers. This brings the total number of features (over 160,000 for a 24x24 window) down to only the most important ones (6,000 features in the paper published by Paul Viola and Michael Jones in 2001), giving an accurate model without wasting computational power. These features are applied to windows (small square areas) of the image.

To further increase efficiency, the features are grouped into stages of classifiers. The features of the first stage are applied to each window; if a window is judged to be a non-face region, it is discarded immediately. Only the windows that pass every stage are treated as faces, so computation is not wasted on irrelevant regions.
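
To make the window idea concrete, here is a minimal standalone sketch (assuming the cascade file is already downloaded, and using a hypothetical test image named face_test.jpg) that runs the cascade over a single still image and prints each window that survived every stage:

import cv2

cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
img = cv2.imread('face_test.jpg')              # hypothetical test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # Haar features work on grayscale

# scaleFactor shrinks the image 10% per step so fixed-size windows can match
# faces of many sizes; minNeighbors filters out lone, unsupported detections
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in boxes:
    print("face window at", x, y, "size", w, "x", h)

In the real project, we will run the same detector on live camera frames instead of a still image.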

faceCascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')  # the pretrained Haar cascade
cam = cv2.VideoCapture(0)  # open the default camera
cam.set(3, 640)   # frame width
cam.set(4, 480)   # frame height

The specific classifier we are using is trained on images that show the front of the subject’s face, so it will work best when you are facing the camera. It works fine when looking up or down, but when looking to either side, it does not recognize a face.

The second line defines the camera that we will be using. Since it’s the only camera connected to the computer, I am using the camera with index 0. If you have multiple cameras connected, then you might need to play with this a bit.

The third and fourth lines define the resolution of images you will be taking. I don’t recommend setting it too high, as it might use too many system resources.
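
If you are not sure which index your camera is on, a quick throwaway sketch like this can probe the first few indices and report which ones open (this is only an illustration, not part of the project code):

import cv2

for index in range(4):                 # try indices 0 through 3
    test_cam = cv2.VideoCapture(index)
    if test_cam.isOpened():
        print("camera found at index", index)
    test_cam.release()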

face_id = input('user id ')
os.makedirs('dataset', exist_ok=True)  # make sure the output folder exists

count = 0
while True:
    ret, img = cam.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x+w, y+h), (255, 0, 0), 2)
        count += 1

        # save the grayscale face crop as dataset/User.<id>.<count>.jpg
        cv2.imwrite("dataset/User." + str(face_id) + '.' +
                    str(count) + ".jpg", gray[y:y+h, x:x+w])

    cv2.imshow('image', img)

    k = cv2.waitKey(100) & 0xff
    if k == 27:  # ESC stops the capture
        break

cam.release()
cv2.destroyAllWindows()

This works by setting a user id, which will be different for each person. It then captures a series of images and tags them with that user id. Each face is saved as a grayscale crop, since grayscale is cheaper to process. In the video feed that pops up, a rectangle is drawn around whatever is recognized as a face. The key code 27 corresponds to the Escape key; pressing it stops the capture.
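
Because every capture is saved as dataset/User.<id>.<count>.jpg, a small hypothetical check like the one below can confirm how many samples exist for each user id before you move on to training:

import os
from collections import Counter

# the second dot-separated field of each filename is the user id
samples = Counter(f.split(".")[1] for f in os.listdir("dataset")
                  if f.startswith("User."))
for user_id, n in samples.items():
    print("user", user_id, "has", n, "images")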

path = 'dataset'
recognizer = cv2.face.LBPHFaceRecognizer_create()

def getImagesAndLabels(path):
    imagePaths = [os.path.join(path, f) for f in os.listdir(path)]
    faceSamples = []
    ids = []
    for imagePath in imagePaths:
        PIL_img = Image.open(imagePath).convert('L')  # grayscale
        img_numpy = np.array(PIL_img, 'uint8')
        id = int(os.path.split(imagePath)[-1].split(".")[1])  # user id from the filename
        faces = faceCascade.detectMultiScale(img_numpy)
        for (x, y, w, h) in faces:
            faceSamples.append(img_numpy[y:y+h, x:x+w])
            ids.append(id)
    return faceSamples, ids

faces, ids = getImagesAndLabels(path)
recognizer.train(faces, np.array(ids))
os.makedirs('trainer', exist_ok=True)  # make sure the output folder exists
recognizer.write('trainer/trainer.yml')

This block of code loads the saved images, builds arrays of face crops and their ids, trains the LBPH recognizer to recognize the faces based on those ids, and saves the model.
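
As a quick sanity check (purely illustrative, and assuming the dataset folder still contains at least one image), you can load the saved model back and run a prediction on one of the training crops; the second value returned is the LBPH distance, where lower means a closer match:

import os
import cv2
import numpy as np
from PIL import Image

check = cv2.face.LBPHFaceRecognizer_create()
check.read('trainer/trainer.yml')

# predict on the first saved crop; LBPH copes with crops of different sizes
sample_path = os.path.join('dataset', os.listdir('dataset')[0])
sample = np.array(Image.open(sample_path).convert('L'), 'uint8')

predicted_id, distance = check.predict(sample)
print("predicted id:", predicted_id, "with distance", round(distance, 1))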

recognizer.read('trainer/trainer.yml')
font = cv2.FONT_HERSHEY_SIMPLEX

id = 0
names = ['None', 'Aryan']  # index in this list corresponds to the user id

# reopen the camera, since it was released after the capture step
cam = cv2.VideoCapture(0)
cam.set(3, 640)
cam.set(4, 480)

# ignore detections smaller than 10% of the frame
minW = 0.1*cam.get(3)
minH = 0.1*cam.get(4)

while True:
    ret, img = cam.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    faces = faceCascade.detectMultiScale(gray, scaleFactor=1.2,
                                         minNeighbors=5, minSize=(int(minW), int(minH)))
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
        id, confidence = recognizer.predict(gray[y:y+h, x:x+w])

        # LBPH returns a distance: 0 would be a perfect match
        if confidence < 100:
            id = names[id]
            confidence = " {0}%".format(round(100 - confidence))
        else:
            id = "unknown"
            confidence = " {0}%".format(round(100 - confidence))

        cv2.putText(img, str(id), (x+5, y-5), font, 1, (255, 255, 255), 2)
        cv2.putText(img, str(confidence), (x+5, y+h-5), font, 1, (255, 255, 0), 1)

    cv2.imshow('camera', img)
    k = cv2.waitKey(10) & 0xff  # press ESC to exit the video
    if k == 27:
        break

cam.release()
cv2.destroyAllWindows()

This final block of code runs the program, resulting in the real-time facial recognition working on the camera feed. At the start, it relates the ids to names that you set. Since I trained the model with an id of 1, I set the name corresponding to id 1 as Aryan.

The detection loop works much like the capture loop from earlier: it watches for faces and draws a rectangle around them. The difference is that it also displays the name of the person along with a confidence percentage. Just like last time, you can press the Escape key to stop the program.
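
If you train on more than one person, the names list simply needs one entry per user id (index 0 is unused because ids start at 1). A dictionary with a fallback also works; the extra names below are made up for illustration:

names = ['None', 'Aryan', 'Priya', 'Sam']   # hypothetical users with ids 2 and 3

# or, as a dictionary with a fallback for ids the model never saw
names_by_id = {1: 'Aryan', 2: 'Priya'}

def label_for(user_id):
    return names_by_id.get(user_id, 'unknown')

print(label_for(1))   # Aryan
print(label_for(5))   # unknown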

Final Result

The confidence is quite low, but that is because I only trained the model for about 30 seconds.

Possibilities

It’s obviously not the best model, but it is a great beginner project, and it opens up so many possibilities. Advancements in IoT have led to cheaper and better microcontrollers and cameras. AI advancements have augmented this, making it possible for any beginner to create a working computer vision model with hardly any prior coding experience.

Instead of paying $8 a month for Google’s Nest Aware system to automatically recognize whoever’s at your doorbell, you can create a simple setup with a Raspberry Pi and a cheap camera to get this functionality without a monthly subscription.

Governments have already used this technology combined with many cameras to identify suspects in criminal investigations by comparing the pictures to a database of IDs, passports, and more.

Want to use this technology a bit differently? You can play a game by using hand gestures to control your character. How cool is that! A similar model could be trained to recognize any gesture and trigger any task, so people with accessibility needs (such as those missing one or both arms) could use gestures to navigate the internet.

If you’re a bit more experienced with computer vision, then you can even try to make your own autonomous robot, using computer vision to recognize things like walls, stairs, etc. Obviously, this is much more complicated, and includes lots of non-computer vision parts, like the movement, but it’s a great project if you want to get into the autonomous vehicles space.

Conclusion

This is a great project for beginners to get into computer vision or IoT. The power of either of these fields is much more than people may assume. They’re also much easier to get into than I thought, and they’re great hobbies that can turn into jobs for you down the line.

If you want to learn more about the applications of computer vision and my project, check out my video.
