IDeepify

IDeepify is a full end-to-end deep learning-based KYC (Know Your Customer) product. It consists of multiple stages (minimal, illustrative code sketches for each stage follow the list):

  • Face detection: faces are detected in the verification document (government ID) and in a live selfie, cropped, and then sent to the face matching component.
  • Face matching: the two detected faces are passed through a set of fine-tuned pretrained facial embedding networks; distance metrics computed between the resulting embeddings are fed into a classifier. This pipeline achieved 97% accuracy on the FaceScrub dataset.
  • ID localisation and segmentation: an attention-based segmentation network was trained on a tiny collected dataset (fewer than 10 samples) augmented with synthetic data. The network segments the verification ID from the background; the ID's four corners are then extracted and used to apply a perspective transform that prepares the card for text extraction.
  • Text segmentation and extraction: the rectified ID is then passed into another attention-based segmentation network, trained entirely on synthetic data, which segments the text from the background to simplify the OCR step.
  • Arabic OCR: at the time this model was trained, no publicly available Arabic OCR solution was robust enough for production use, so this network was trained entirely on synthetic data to extract Arabic letters and numbers. A final validity and correction check ensures the OCR results are consistent.
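
The description above does not name a specific face detector, so the sketch below is only an illustration of the detect-and-crop step, using OpenCV's bundled Haar cascade as a stand-in for whatever detector the product actually uses.

```python
import cv2

# Stand-in detector: OpenCV's bundled Haar cascade (the actual product's
# detector is not specified in this description).
_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def crop_largest_face(image_path, margin=0.2):
    """Detect faces in an image and return a crop of the largest one."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection (by area): the ID portrait or the selfie face.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    # Expand the box slightly so the crop keeps some context around the face.
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    x1, y1 = min(x + w + dx, img.shape[1]), min(y + h + dy, img.shape[0])
    return img[y0:y1, x0:x1]
```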
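
For face matching, the idea of "embeddings → distance metrics → classifier" can be sketched as below. The `embed_fns` callables are assumptions standing in for the fine-tuned pretrained embedding networks; the logistic regression classifier is likewise only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def distance_features(face_a, face_b, embed_fns):
    """Build a feature vector of distances between two face crops.

    `embed_fns` is assumed to be a list of callables, each wrapping one
    fine-tuned pretrained embedding network and returning a 1-D vector.
    """
    feats = []
    for embed in embed_fns:
        a, b = np.asarray(embed(face_a)), np.asarray(embed(face_b))
        a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
        feats.append(np.linalg.norm(a - b))      # Euclidean distance
        feats.append(np.abs(a - b).sum())        # L1 distance
        feats.append(1.0 - float(np.dot(a, b)))  # cosine distance
    return np.array(feats)

def train_matcher(X, y):
    """Fit a same/different-person classifier on distance features.

    X has shape (n_pairs, n_features); y is 1 for matching pairs, 0 otherwise.
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf
```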
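
The corner extraction and perspective transform applied after ID segmentation can be sketched with OpenCV, assuming the segmentation network outputs a reasonably clean binary mask of the card; the output size below is an arbitrary choice close to the ID-1 card aspect ratio.

```python
import cv2
import numpy as np

def rectify_id(image, mask, out_w=1000, out_h=630):
    """Warp the ID to a fronto-parallel view given a binary segmentation mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    card = max(contours, key=cv2.contourArea)
    # Approximate the card contour with a quadrilateral to get the four corners.
    peri = cv2.arcLength(card, True)
    quad = cv2.approxPolyDP(card, 0.02 * peri, True)
    if len(quad) != 4:
        raise ValueError("could not reduce the ID contour to four corners")
    pts = quad.reshape(4, 2).astype(np.float32)

    # Order corners as top-left, top-right, bottom-right, bottom-left.
    s, d = pts.sum(axis=1), np.diff(pts, axis=1).ravel()
    src = np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                    pts[np.argmax(s)], pts[np.argmax(d)]], dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (out_w, out_h))
```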
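
The segmentation networks are described only as "attention-based"; their exact architecture is not given here. As a generic illustration of the kind of building block such networks use, below is an additive attention gate in the style of Attention U-Net. It is not the project's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Additive attention gate over a skip connection (Attention U-Net style).

    Illustrative only: the source states the models are attention-based but
    does not specify the architecture.
    """
    def __init__(self, skip_channels, gating_channels, inter_channels):
        super().__init__()
        self.theta = nn.Conv2d(skip_channels, inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(gating_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, skip, gate):
        # `skip` is a high-resolution encoder feature map; `gate` is the
        # coarser decoder feature map used as the gating signal.
        g = F.interpolate(self.phi(gate), size=skip.shape[2:],
                          mode="bilinear", align_corners=False)
        attention = torch.sigmoid(self.psi(F.relu(self.theta(skip) + g)))
        return skip * attention  # suppress background, keep ID/text regions
```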
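
Since the text segmentation and OCR networks were trained entirely on synthetic data, here is a minimal sketch of generating labelled synthetic samples with Pillow, limited to Arabic-Indic digits. The font path is a hypothetical placeholder, and full Arabic words would additionally need glyph shaping and bidirectional handling (e.g. arabic_reshaper and python-bidi), which are omitted.

```python
import random
from PIL import Image, ImageDraw, ImageFont

# Arabic-Indic digits; full text would additionally need shaping/bidi handling.
ARABIC_DIGITS = "٠١٢٣٤٥٦٧٨٩"

# Hypothetical path: any TTF font covering Arabic-Indic digits works here.
FONT_PATH = "fonts/arabic.ttf"

def synth_digit_sample(length=8, size=(320, 48)):
    """Render a random digit string on a plain background; return (image, label)."""
    label = "".join(random.choice(ARABIC_DIGITS) for _ in range(length))
    img = Image.new("L", size, color=random.randint(200, 255))  # light background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(FONT_PATH, size=random.randint(28, 36))
    draw.text((random.randint(2, 10), random.randint(0, 6)), label,
              fill=random.randint(0, 60), font=font)
    return img, label
```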

  • Below are some results from the steps described above.

    This video shows real-time ID localisation and text segmentation across different angles, backgrounds, and positions of the ID within the frame.


    Implementation details can be discussed upon request.