CNIC Image Fields Detection and Identification through YOLOv8
Automating identity verification has become a key requirement across many industries, from finance to government services. One common task is extracting structured information from national identity cards, such as CNIC images. In this blog post, we share an approach that leverages YOLOv8, a state-of-the-art object detection model, to detect specific fields (like Name, CNIC Number, and dates) and extract them from scanned CNIC images.
What is YOLOv8?
YOLOv8 is the latest version of the YOLO (You Only Look Once) object detection model developed by Ultralytics. It offers faster performance, improved accuracy, and a more streamlined architecture compared to its predecessors. It has built-in support for tasks like detection, segmentation, and classification. Its lightweight variants make it ideal for edge devices, while still delivering strong results on complex datasets.
How to extract CNIC data using YOLOv8?
The objective is to automatically detect specific fields in a CNIC image, such as Name, CNIC Number, and Date of Birth, and accurately extract the text written under those fields. This involves identifying the correct regions on the CNIC and applying OCR to extract structured data reliably.
Steps to Extract CNIC Data
A complete pipeline was developed to detect and extract structured fields from CNIC images. The process began with dataset collection and annotation using Roboflow, followed by splitting the data into training, validation, and test sets. YOLOv8 was then fine-tuned on the annotated images to detect key fields such as Name and CNIC Number. After training, inference was performed to obtain bounding boxes and labels, which were used to crop regions of interest. OCR (using Tesseract) was subsequently applied to extract text from each cropped field. Through this end-to-end approach, accurate field-level text extraction from scanned CNICs was achieved.
We chose YOLOv8 due to its:
⦁ High speed-to-accuracy ratio
⦁ Improved performance on small text regions
⦁ Compatibility with diverse resolutions
⦁ Lightweight architecture (especially the yolov8n variant) for mobile and edge deployment
⦁ Well-maintained ecosystem and active community support
All of these make YOLOv8 an excellent choice for real-time object detection tasks such as CNIC field extraction.
Steps required:
Eight steps are required to perform this process successfully.
⦁ Dataset Extraction
⦁ Dataset Annotation
⦁ Dataset Splitting
⦁ YOLO Model Loading
⦁ Fine-tuning YOLO Model
⦁ Saving/Exporting Model
⦁ Inference on Final Model
⦁ Text Extraction
Since the pretrained YOLOv8 model is trained on the COCO dataset and has never seen CNIC images, we need to custom-train it for our own project.
1. Dataset Extraction
To begin with, a dataset containing CNIC images was needed. After exploring platforms such as GitHub, Kaggle, and Roboflow, a relevant Pakistani CNIC dataset was found on Roboflow. If identity cards from a different country are being used, a dataset specific to that region can be created or sourced accordingly.
Note: The Roboflow dataset contained annotations for all text regions, but we created custom annotations tailored to our use case.
Using a custom dataset gives much better results if you can make one available. In this example, a dataset containing CNIC images of different Pakistani individuals was used. It also came with annotations for all of the text in each image, which was not very useful for our use case.
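If the dataset lives on Roboflow, it can also be pulled programmatically instead of being downloaded by hand. Below is a minimal sketch using the roboflow Python package; the API key, workspace, project name, and version number are placeholders you would replace with your own:
pip install roboflow
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                              # placeholder API key
project = rf.workspace("your-workspace").project("cnic-fields")    # hypothetical workspace/project names
dataset = project.version(1).download("yolov8")                    # downloads images and YOLOv8-format labels
print(dataset.location)                                            # local folder where the dataset was saved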
2. Dataset Annotation
Secondly, we needed to annotate the dataset to suit our project. We used Roboflow to manually annotate the images accordingly.
Annotating the dataset comprises two steps. After uploading the dataset, we:
⦁ Draw Bounding Boxes
⦁ Label the Bounding Boxes
For each bounding box we draw, we label it with the respective field, e.g. Name or Identity Number. This is an example annotation for the CNIC images, with each annotation highlighted in a different colour.
The annotations follow the YOLO format for each image. The final format of each text field in the annotation file consists of:
⦁ Class ID number
⦁ Bounding box coordinates (box center (x, y), box width, and box height)
An example format is as follows:
<class_id> <x_center> <y_center> <width> <height>
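For illustration, a hypothetical label file for one CNIC image could look like the two lines below, where each line is one annotated field and all box values are normalized to the 0-1 range (the class IDs here, 0 for Name and 1 for CNIC Number, are just an assumed ordering):
0 0.512 0.338 0.295 0.048
1 0.498 0.552 0.310 0.052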
3. Dataset Splitting
After completing the annotations and extracting the dataset to our local system, we will manually split the dataset into train, val, and test folders. We can use the 70-20-10 rule for splitting the dataset, meaning 70% for training, 20% for validation, and 10% for testing. For this study we have organized the images in:
⦁ images/train/
⦁ images/val/
⦁ images/test/
We created annotation files (YOLO format) in:
⦁ labels/train/
⦁ labels/val/
This step is essential for fine-tuning and training our YOLOv8 model later on.
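If you prefer to script the 70-20-10 split instead of moving files by hand, a minimal sketch along these lines works; the source_images and source_labels folder names are placeholders for wherever the exported dataset sits:
import os, random, shutil

random.seed(42)  # make the split reproducible
files = [f for f in os.listdir("source_images") if f.lower().endswith((".jpg", ".png"))]
random.shuffle(files)

n = len(files)
splits = {"train": files[:int(0.7 * n)],
          "val": files[int(0.7 * n):int(0.9 * n)],
          "test": files[int(0.9 * n):]}

for split, names in splits.items():
    os.makedirs(f"images/{split}", exist_ok=True)
    os.makedirs(f"labels/{split}", exist_ok=True)
    for name in names:
        shutil.copy(os.path.join("source_images", name), f"images/{split}/{name}")
        label = os.path.splitext(name)[0] + ".txt"
        if os.path.exists(os.path.join("source_labels", label)):  # test images may have no labels
            shutil.copy(os.path.join("source_labels", label), f"labels/{split}/{label}")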
Next we will also create a data.yaml file. A ‘data.yaml’ file points to the project directory and helps the YOLO model locate the images and labels for each split of the dataset.
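A minimal data.yaml for this setup could look like the example below; the class count and class names are assumptions for illustration and must match whatever fields you actually annotated:
path: /content/drive/MyDrive/My First Project.v1i.yolov8   # dataset root used later during training
train: images/train
val: images/val
test: images/test
nc: 4                                                      # number of annotated field classes (assumed)
names: ["Name", "CNIC_Number", "Date_of_Birth", "Date_of_Expiry"]   # example field labels (assumed)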
4. YOLO Model Loading
Next we will install the ultralytics package and import YOLO from it.
pip install ultralytics
from ultralytics import YOLO
After that we will load the YOLOv8 nano variant (yolov8n). You can use any other variant as well, depending on your scope of work and requirements.
YOLOv8 has several versions (like n, s, m, l, x) designed for different performance needs. Pick the one that suits your project best. You can read more about them here.
model = YOLO("yolov8n.pt")
5. Fine-tuning YOLO Model
Now we will load the ‘data.yaml’ file for training. Please change the path if your YAML file is located elsewhere.
model.train(data='/content/drive/MyDrive/My First Project.v1i.yolov8/data.yaml', epochs=50, imgsz=640, batch=16)
You can adjust the number of epochs accordingly, increasing them for better results while keeping hardware constraints in mind.
This will effectively fine-tune the YOLO model on our personalized dataset.
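Once training completes, Ultralytics normally writes the run (weights, logs, and plots) to a runs/detect/ folder. As a quick sanity check, the fine-tuned model can be evaluated on the validation split defined in data.yaml, for example:
metrics = model.val()       # runs evaluation on the 'val' split from data.yaml
print(metrics.box.map50)    # mAP at IoU 0.50
print(metrics.box.map)      # mAP averaged over IoU 0.50-0.95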
6. Saving/Exporting Model
yolo_model = model.export(format='pb')
This will essentially export and save the fine-tuned model on our local system. This ensures we can use our model later on for multiple use cases.
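The fine-tuned PyTorch weights themselves are typically saved to runs/detect/train/weights/best.pt (the exact run folder name may differ on your machine), so reloading the model later is as simple as, for example:
model = YOLO("runs/detect/train/weights/best.pt")   # adjust the path to your actual training run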
7. Inference on Final Model
results = model('/content/drive/MyDrive/ID_CARD_TEXT.v3i.yolov8/test', stream=True)
This will store the inference output in the results variable (a generator, since stream=True). Each prediction in results will contain:
⦁ Bounding box coordinates
⦁ Class labels
⦁ Confidence scores
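Because stream=True yields the results lazily, we iterate over them to read these values. A minimal sketch of that loop, using the attribute names from the Ultralytics results API:
for result in results:                              # one result per test image
    for box in result.boxes:
        cls_id = int(box.cls[0])                    # predicted class index
        conf = float(box.conf[0])                   # confidence score
        x1, y1, x2, y2 = map(int, box.xyxy[0])      # bounding box corners in pixels
        print(model.names[cls_id], round(conf, 2), (x1, y1, x2, y2))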
8. Text Extraction (OCR)
For text extraction (OCR), we can use any engine such as Pytesseract or EasyOCR, depending on which gives higher accuracy on our dataset. Since the coordinates and labels for every prediction are stored in the ‘results’ variable, each detected region is already mapped to its field label, something that was not possible before fine-tuning and that the YOLO model now makes possible.
So for each image, we can use the predicted coordinates to crop out each region and pass the crop to the OCR engine.
import pytesseract

# 'image' is the original image as a NumPy array (e.g. result.orig_img or loaded with cv2.imread),
# and 'boxes' are result.boxes from the inference step above
for box in boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    cls_id = int(box.cls[0])
    crop = image[y1:y2, x1:x2]
    text = pytesseract.image_to_string(crop, config='--psm 6')
    print(f"{model.names[cls_id]}: {text.strip()}")
--psm 6 was ideal for single-line text.
You can clean the output to remove unwanted characters using post-processing techniques (e.g., regex, string cleaning).
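For example, a Pakistani CNIC number follows a fixed 5-7-1 digit pattern with dashes, so a small regex helper (the function name and sample string below are purely illustrative) can pull it out of noisy OCR output:
import re

def clean_cnic_number(raw_text):
    # Extract a 13-digit CNIC in the #####-#######-# pattern from noisy OCR text
    match = re.search(r"\d{5}-\d{7}-\d", raw_text)
    return match.group(0) if match else raw_text.strip()

print(clean_cnic_number("Identity Number 35202-1234567-8\n"))   # -> 35202-1234567-8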