# SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images

<a href='https://arxiv.org/abs/2412.17606'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> <a href='https://omron-sinicx.github.io/SBSFiguresPage/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>

The official PyTorch implementation for the following paper:

> [**SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images**](https://arxiv.org/abs/2412.17606),
>
> [Risa Shinoda](https://sites.google.com/view/risashinoda/home), [Kuniaki Saito](https://ksaito-ut.github.io/), [Shohei Tanaka](https://shohei-ta-ds7.github.io/), [Tosho Hirasawa](https://toshohirasawa.github.io/), [Yoshitaka Ushiku](https://yoshitakaushiku.net/index.html)

## Abstract

Building a large-scale figure QA dataset requires a considerable amount of work, from gathering and selecting figures to extracting attributes like text, numbers, and colors, and generating QAs. Although recent developments in LLMs have led to efforts to synthesize figures, most of these focus primarily on QA generation. Additionally, creating figures directly using LLMs often encounters issues such as code errors, similar-looking figures, and repetitive content in figures. To address these issues, we present SBSFigures (Stage-by-Stage Synthetic Figures), a dataset for pre-training figure QA. Our proposed pipeline enables the creation of chart figures with complete annotations of the visualized data and dense QA annotations without any manual annotation process. Our stage-by-stage pipeline makes it possible to efficiently create figures with diverse topics and appearances while minimizing code errors. SBSFigures demonstrates a strong pre-training effect, making it possible to achieve efficient training with a limited amount of real-world chart data starting from our pre-trained weights.

## Model

We release four models through Hugging Face.

Please refer to the [GitHub code](https://github.com/omron-sinicx/SBSFigures) for model usage.

| Task | Model | Checkpoint Path |
| ------| ------- | ------------- |
| Pretrained | Donut | [omron-sinicx/sbsfigures-pretrain-donut](https://huggingface.co/omron-sinicx/sbsfigures-pretrain-donut) |
| Fine-tuned (ChartQA) | Donut | [omron-sinicx/sbsfigures-chartqa-donut](https://huggingface.co/omron-sinicx/sbsfigures-chartqa-donut) |
| Pretrained | Pix2Struct | [omron-sinicx/sbsfigures-pretrain-pix2struct](https://huggingface.co/omron-sinicx/sbsfigures-pretrain-pix2struct) |
| Fine-tuned (ChartQA) | Pix2Struct | [omron-sinicx/sbsfigures-chartqa-pix2struct](https://huggingface.co/omron-sinicx/sbsfigures-chartqa-pix2struct) |

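
The checkpoint IDs in the table can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. The `CHECKPOINTS` mapping and the helpers `checkpoint_id` and `download_checkpoint` are hypothetical names introduced here for illustration and are not part of the official repository. The sketch only downloads the files; the GitHub code remains the reference for loading the models and running inference.

```python
# Repository IDs copied from the checkpoint table; the dictionary layout is an assumption.
CHECKPOINTS = {
    ("pretrain", "donut"): "omron-sinicx/sbsfigures-pretrain-donut",
    ("chartqa", "donut"): "omron-sinicx/sbsfigures-chartqa-donut",
    ("pretrain", "pix2struct"): "omron-sinicx/sbsfigures-pretrain-pix2struct",
    ("chartqa", "pix2struct"): "omron-sinicx/sbsfigures-chartqa-pix2struct",
}


def checkpoint_id(task: str, model: str) -> str:
    """Return the Hugging Face repo ID for a (task, model) pair (hypothetical helper)."""
    key = (task.lower(), model.lower())
    if key not in CHECKPOINTS:
        raise KeyError(f"no released checkpoint for task={task!r}, model={model!r}")
    return CHECKPOINTS[key]


def download_checkpoint(task: str, model: str) -> str:
    """Download all files of the selected checkpoint and return the local directory."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_id(task, model))
```

For example, `download_checkpoint("chartqa", "donut")` would fetch the ChartQA fine-tuned Donut weights into the local Hugging Face cache and return that directory path.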

## Citation

If you find our work useful for your research, please consider citing our paper:

```bibtex
@article{shinoda2024sbsfigurespretrainingfigure,
      title={SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images},
      author={Risa Shinoda and Kuniaki Saito and Shohei Tanaka and Tosho Hirasawa and Yoshitaka Ushiku},
      year={2024},
      journal={arXiv preprint arXiv:2412.17606},
      url={https://arxiv.org/abs/2412.17606}
}
```