Training Details (v1)

This page provides an overview of the training process and performance benchmarks for the v1 model.

Training Process

The model was developed using a two-stage training approach:

1. Pretraining

The model was pretrained on approximately 1 million synthetic images rendered from cleaned and filtered text, drawn from three sources:

  • 60% Anime: Specialized corpus (non-public)
  • 20% Webnovel: Japanese webnovel text
  • 20% CC100: CC100 Japanese dataset
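
The 60/20/20 mixture above can be implemented as weighted sampling over the source corpora. The sketch below is illustrative only; the corpus names are stand-ins and the actual generation pipeline is not public.

```python
import random

# Hypothetical mixture weights matching the pretraining data ratios.
CORPUS_WEIGHTS = {
    "anime": 0.6,      # specialized corpus (non-public)
    "webnovel": 0.2,   # Japanese webnovel text
    "cc100": 0.2,      # CC100 Japanese dataset
}

def sample_corpus(rng: random.Random) -> str:
    """Pick a source corpus according to the mixture weights."""
    names = list(CORPUS_WEIGHTS)
    weights = list(CORPUS_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]

# Sanity check: over many draws the empirical mix approaches 60/20/20.
rng = random.Random(0)
counts = {name: 0 for name in CORPUS_WEIGHTS}
for _ in range(10_000):
    counts[sample_corpus(rng)] += 1
```

Each sampled line would then be rendered to a synthetic image with varied fonts and layouts before being fed to the model.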

2. Fine-Tuning

The model was fine-tuned on the Manga109s dataset using a random 90% split for training.
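
A random 90/10 split like the one used for fine-tuning can be sketched as follows. The page IDs here are placeholders, not the actual Manga109s file names.

```python
import random

def train_test_split(items, train_frac=0.9, seed=42):
    """Shuffle a list of sample IDs and split it into train/eval subsets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Example: placeholder IDs standing in for Manga109s text crops.
pages = [f"page_{i:04d}" for i in range(1000)]
train, test = train_test_split(pages)
```

Fixing the seed keeps the held-out 10% stable across runs, so the benchmark numbers below are measured on data the model never saw during fine-tuning.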

Model Conversion

The model was trained in PyTorch and converted to TFLite with AI Edge Torch, enabling high-performance inference on mobile and edge devices.
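
A conversion along these lines uses AI Edge Torch's `convert`/`export` API; the model class, input shape, and file name below are assumptions for illustration, not the actual export script.

```python
import torch
import ai_edge_torch  # pip install ai-edge-torch

# Placeholder: the trained PyTorch OCR model (architecture not public).
model = MyOcrModel()  # hypothetical class name
model.eval()

# Example input matching an assumed text-line crop shape (N, C, H, W).
sample_inputs = (torch.randn(1, 3, 48, 320),)

# Trace the model and convert it to a TFLite-compatible edge model.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("model.tflite")
```

The exported `.tflite` file can then be run with the standard TFLite runtime on mobile and edge devices.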

Benchmarks

The following results were measured on the held-out 10% of the Manga109s dataset (the portion not used for fine-tuning):

  • Character Error Rate (CER): ~7.4%
  • Exact-Match Accuracy: ~73%
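
Both metrics above are standard: CER is the total character-level edit distance divided by the total number of reference characters, and exact-match accuracy is the fraction of predictions identical to their references. A minimal self-contained sketch:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def cer(refs, hyps):
    """Character Error Rate: total edit distance / total reference chars."""
    dist = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return dist / sum(len(r) for r in refs)

def exact_match(refs, hyps):
    """Fraction of predictions that match the reference exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)
```

Note that a prediction can have a low CER yet still fail exact match, which is why the two numbers are reported separately.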

Comparison

On these metrics the v1 model is comparable to, and in this evaluation slightly ahead of, PaddleOCR-VL-For-Manga, which typically achieves:

  • CER: ~10%
  • Exact-Match Accuracy: ~70%

Known Issues

The model currently has some difficulty accurately recognizing:

  • English letters
  • Various punctuation marks

Released under the Apache License 2.0.