We conducted a small study to evaluate Vision Language Models (VLMs) for structured information extraction from synthetically generated industrial product labels (IPLs). This application visualizes the results of the VLM Qwen3-VL-8B-Instruct. Please scroll down to view the individual results.
The evaluation includes only a few IPLs, all of which were generated using a synthetic data generation pipeline. The IPLs nevertheless differ in how they were prepared. Some were printed and photographed with a standard smartphone camera at various distances and under different conditions, including perspective distortion, reflections, and varying lighting; these samples are considered realistic. Others had artificial print defects, motion blur, and random lighting variations applied to them. The remaining IPLs were included in the evaluation without any modifications. In the “Evaluation” section for each IPL, a note indicates whether a modification was made.
The complete dataset, available on Zenodo, contains more than 1,000 images with ground-truth annotations that can be used for training and evaluation. This application visualizes the evaluation results for the small dataset.
From top to bottom, the application presents the VLM's predictions for each image, together with an evaluation using the common metrics Precision, Recall, and F1-Score. Green rectangles indicate correct predictions (true positives in a binary classification setting), while red rectangles indicate incorrect predictions (false negatives). At the bottom of the page, you will find the average Precision, Recall, and F1-Score across all images, as well as the average processing time per image and the total number of predicted fields.
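As a rough illustration of how the per-image Precision, Recall, and F1-Score and their averages across images can be computed, the following Python sketch scores predicted fields against ground-truth annotations. It is not the code used in the thesis. The names `score_fields` and `average_scores`, the example field names, the exact-match comparison of values, and the convention that a wrongly predicted field counts as both a false positive and a false negative are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FieldScores:
    precision: float
    recall: float
    f1: float


def score_fields(predicted: dict[str, str], truth: dict[str, str]) -> FieldScores:
    """Score one label by exact match of field values (assumed convention)."""
    # True positive: a predicted field whose value equals the ground truth.
    tp = sum(1 for key, value in predicted.items()
             if key in truth and truth[key] == value)
    # False positive: a predicted field that is wrong or absent from the ground truth.
    fp = len(predicted) - tp
    # False negative: a ground-truth field that is missing or predicted wrongly.
    fn = len(truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return FieldScores(precision, recall, f1)


def average_scores(scores: list[FieldScores]) -> FieldScores:
    """Unweighted mean of the per-image scores."""
    if not scores:
        return FieldScores(0.0, 0.0, 0.0)
    n = len(scores)
    return FieldScores(
        sum(s.precision for s in scores) / n,
        sum(s.recall for s in scores) / n,
        sum(s.f1 for s in scores) / n,
    )


# Hypothetical example: four annotated fields, three predicted, one of them wrong.
truth = {"manufacturer": "ACME", "serial_number": "SN-001",
         "voltage": "230 V", "weight": "5 kg"}
predicted = {"manufacturer": "ACME", "serial_number": "SN-007",
             "voltage": "230 V"}
image_scores = score_fields(predicted, truth)
print(f"Precision: {image_scores.precision:.1%}, "
      f"Recall: {image_scores.recall:.1%}, F1-Score: {image_scores.f1:.1%}")
```

In this example two of the three predicted fields are correct, so Precision is 2/3, while only two of the four annotated fields are recovered, so Recall is 1/2. Averaging the per-image scores without weighting treats every image equally regardless of how many fields it contains, which is one possible reading of "average across all images".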
In the footer of the application, links on the right-hand side point to publicly available datasets and to the Master’s thesis by Jannes Nitzsche on the Qucosa Publication Server. Jannes Nitzsche obtained the results in the course of his Master’s thesis during his employment at Telekom MMS. The work was carried out in the context of the European funding project IPCEI-CIS (8ra). The thesis was supervised by Thomas Burghardt.
Predicted Industrial Product Label (IPL):
Evaluation result single image ():
Precision : %
Recall : %
F1-Score : %
Evaluation result all images:
Avg. Precision : %
Avg. Recall : %
Avg. F1-Score : %
Avg. Processing Time : s
Total Predicted Files :