Best Comic Panel Detection

mosesb/best-comic-panel-detection

published Jun 2025 · updated Jun 2025

Best Comic Panel Detection is a detection model that identifies and localizes individual panels in comic book pages using YOLOv12x.

status

coming soon

API providers

downloads / mo

4.6K

license

apache-2.0

specs

Task	Object Detection (Comic Panel Detection)
Architecture	YOLOv12x (extra-large variant)
Licenses	MIT
Training Data	Custom Roboflow dataset (Custom-Workflow-3-Object-Detection-1)

about this model

mosesb/best-comic-panel-detection is a YOLOv12x object detection model fine-tuned to detect and localize individual panels on comic book pages. It outputs bounding boxes for each panel, enabling automated comic digitization, content extraction, and layout analysis.
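A minimal inference sketch follows, assuming the weights have been downloaded locally and load through the `ultralytics` package. The filename `best.pt` and the helper names `boxes_to_records` and `detect_panels` are illustrative assumptions, not part of the model card, and a checkpoint trained with a YOLOv12 fork may need that fork's package instead.

```python
from typing import Dict, List, Sequence


def boxes_to_records(
    xyxy: Sequence[Sequence[float]],
    confs: Sequence[float],
    min_conf: float = 0.25,
) -> List[Dict[str, float]]:
    """Pair each (x1, y1, x2, y2) box with its confidence and drop weak ones."""
    records = []
    for (x1, y1, x2, y2), conf in zip(xyxy, confs):
        if conf >= min_conf:
            records.append({"x1": x1, "y1": y1, "x2": x2, "y2": y2, "conf": conf})
    return records


def detect_panels(image_path: str, weights: str = "best.pt") -> List[Dict[str, float]]:
    """Run the detector on one page image (requires `pip install ultralytics`)."""
    # Imported lazily so boxes_to_records stays usable without the dependency.
    from ultralytics import YOLO

    model = YOLO(weights)  # "best.pt" is a placeholder path (assumption)
    result = model.predict(source=image_path, verbose=False)[0]
    # result.boxes.xyxy and result.boxes.conf are tensors; convert to plain lists.
    return boxes_to_records(result.boxes.xyxy.tolist(), result.boxes.conf.tolist())
```

Calling `detect_panels("page_001.jpg")` on a hypothetical page image would return one record per detected panel, in pixel coordinates of the input image.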

Architecture and Training

The model uses the extra-large YOLOv12x architecture, fine-tuned via transfer learning from a COCO-pretrained checkpoint. Training was performed on a custom Roboflow dataset ("Custom-Workflow-3-Object-Detection-1") with a single class: "Comic Panel". Hyperparameters included 640x640 image size, batch size 16, AdamW optimizer (lr=0.002), and up to 200 epochs with early stopping patience of 100.
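The hyperparameters listed above map onto Ultralytics-style training arguments roughly as sketched below. This is a reconstruction under assumptions, not the author's training script: the dataset path `data.yaml` and the base checkpoint name `yolo12x.pt` are placeholders, and any settings not stated in the card are left at library defaults.

```python
# Training arguments taken from the card; paths are hypothetical placeholders.
TRAIN_ARGS = {
    "data": "data.yaml",    # assumed path to the Roboflow export's dataset YAML
    "imgsz": 640,           # 640x640 input size
    "batch": 16,
    "optimizer": "AdamW",
    "lr0": 0.002,           # initial learning rate
    "epochs": 200,          # upper bound on epochs
    "patience": 100,        # stop after 100 epochs without validation improvement
}


def finetune(base_weights: str = "yolo12x.pt"):
    """Fine-tune from a pretrained checkpoint (requires `pip install ultralytics`)."""
    # Imported lazily so TRAIN_ARGS can be inspected without the dependency.
    from ultralytics import YOLO

    model = YOLO(base_weights)  # checkpoint name is an assumption
    return model.train(**TRAIN_ARGS)
```

With a patience of 100 against a 200-epoch budget, early stopping only triggers if the validation fitness fails to improve for 100 consecutive epochs.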

Performance

On the validation set, the model achieves near-perfect detection metrics:

Metric	Value	Description
mAP50	0.991	Mean average precision at IoU threshold 0.50
mAP50-95	0.985	Mean average precision averaged over IoU thresholds 0.50–0.95
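Both metrics rest on intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows the IoU computation and the ten thresholds conventionally used for mAP50-95 (0.50 to 0.95 in steps of 0.05); it illustrates the definitions only and is not the evaluation code used for the reported numbers.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Thresholds averaged over for mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Two 10x10 boxes offset by half their width overlap in a 5x10 strip,
# so IoU = 50 / (100 + 100 - 50) = 1/3, which counts as a miss even at 0.50.
half_overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

A prediction counts as a true positive at a given threshold only if its IoU with a ground-truth panel meets that threshold, so a high mAP50-95 implies tight localization, not just correct panel counts.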

The model correctly identifies panels across various sizes and layouts, as shown in validation predictions.

Validation predictions showing detected comic panel bounding boxes

Training and Evaluation Visualizations

Training and validation metrics over epochs are available, along with a confusion matrix and additional performance curves (F1, precision-recall, precision, recall) in the finetuning directory.

Confusion matrix for comic panel detection

Limitations

The model detects rectangular bounding boxes and may underperform on highly irregular or overlapping panel shapes.

best for

· Digitizing comic books into panel-by-panel format
· Extracting text or characters from individual panels
· Analyzing comic book layouts and artistic styles
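For panel-by-panel digitization, detected boxes usually need to be put into reading order before cropping. The heuristic below is one possible approach and not something the model provides: it groups boxes into rows by their top edge and sorts each row horizontally. The function name `reading_order` and the `row_tolerance` parameter are hypothetical.

```python
def reading_order(boxes, row_tolerance=0.5, right_to_left=False):
    """Order (x1, y1, x2, y2) boxes by row (top to bottom), then across each row.

    A box joins an existing row when its top edge lies within
    row_tolerance * its own height of the top edge of that row's first box.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # visit boxes top to bottom
        height = box[3] - box[1]
        for row in rows:
            if abs(box[1] - row[0][1]) <= row_tolerance * height:
                row.append(box)
                break
        else:
            rows.append([box])  # no row matched, so start a new one
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0], reverse=right_to_left))
    return ordered


# Two panels side by side on the top row, one wide panel underneath.
page = [(210, 0, 400, 100), (0, 5, 200, 100), (0, 110, 400, 200)]
ordered = reading_order(page)
```

Each ordered box can then be passed to an image library's crop call, for example Pillow's `Image.crop`, which accepts a `(left, upper, right, lower)` tuple. Set `right_to_left=True` for manga-style pages; irregular layouts will likely need a more careful ordering rule.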

FAQ

What is this model best for?

It is designed to detect and draw bounding boxes around individual panels in comic book pages, enabling structured digital reading, content extraction, and layout analysis.

How accurate is the model?

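It achieves an mAP50 of 0.991 and an mAP50-95 of 0.985 on its validation set. The small gap between the two indicates that predicted boxes are tightly localized, not just roughly placed. These figures apply to the validation split of the training dataset and may not carry over to comics with very different layouts.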

What architecture does it use?

It uses the YOLOv12x (extra-large) object detection architecture, fine-tuned from a COCO pre-trained checkpoint.

What license governs use of this model?

The model and associated training code are licensed under the MIT License, allowing free use, modification, and distribution.

How can I call this model via the gigarouter API?

The hosted endpoint is not yet live. Once it is, you will call the gigarouter OpenAI-compatible endpoint with your API key, send an image URL or a base64-encoded image, and receive bounding box coordinates and confidence scores in the response.

not yet live

We're benchmarking and onboarding Best Comic Panel Detection as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo