pix2struct-base-table2html

Turn table images into HTML!

Demo app

Try the demo app, which combines table detection and table recognition!

About

This model takes an image of a table and outputs HTML: it performs optical character recognition (OCR) and structure recognition on the image and writes the result as HTML markup.

The model expects an image containing only a table. If the table is embedded in a document, first use a table detection model to extract it (e.g. Microsoft's Table Transformer model).
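A detector typically returns a bounding box in pixel coordinates, and a tight box can clip the table borders. The following is a minimal sketch of turning such a box into a crop region. It assumes an (x_min, y_min, x_max, y_max) box format and a margin of a few pixels; `expand_box` and its `padding` parameter are illustrative names, not part of any library.

```python
import math


def expand_box(box, image_size, padding=10):
    """Expand a detected box by `padding` pixels, clamped to the image bounds.

    box: (x_min, y_min, x_max, y_max) in pixels (assumed detector output format).
    image_size: (width, height), the same order as PIL's Image.size.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    return (
        max(0, math.floor(x_min) - padding),
        max(0, math.floor(y_min) - padding),
        min(width, math.ceil(x_max) + padding),
        min(height, math.ceil(y_max) + padding),
    )


# Hypothetical detection on a 1000x800 page image
crop_box = expand_box((120.4, 95.0, 880.9, 410.2), (1000, 800))
print(crop_box)  # (110, 85, 891, 421)
```

The resulting tuple can be passed to PIL's `Image.crop`, and the cropped image is then given to the processor as in the usage example.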

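The model is fine-tuned from the Pix2Struct base model with a max_patch_length of 1024 (passed to the processor as max_patches in the example below) and a maximum generation length of 1024 tokens. The patch setting should likely stay at 1024 for inference, since the model was trained with it, but the generation length (max_new_tokens) can be adjusted.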
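To give a feel for what a patch budget of 1024 means: Pix2Struct rescales each image, keeping its aspect ratio, so that the grid of fixed-size patches fits within the budget. The sketch below is a rough reconstruction of that calculation, assuming square 16x16 pixel patches; `patch_grid` is an illustrative helper, not a library function, and the actual image processor may differ in details.

```python
import math


def patch_grid(image_width, image_height, max_patches=1024, patch_size=16):
    """Return (rows, cols) of the patch grid for an image under a patch budget.

    Assumes square patches of `patch_size` pixels. The image is scaled so that
    rows * cols stays at or below `max_patches` while keeping the aspect ratio.
    """
    scale = math.sqrt(
        max_patches * (patch_size / image_height) * (patch_size / image_width)
    )
    rows = max(min(math.floor(scale * image_height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_width / patch_size), max_patches), 1)
    return rows, cols


rows, cols = patch_grid(800, 600)
print(rows, cols, rows * cols)  # 27 36 972
```

Under these assumptions an 800x600 image is mapped to a 27 x 36 grid (972 patches), so large or very wide tables are downscaled to fit, which can make small text harder to read.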

The model has been trained using two datasets: MMTab and PubTabNet.

Usage

Below is a complete example of loading the model and performing inference on an example table image (example from the MMTab dataset):

import torch
from transformers import AutoProcessor, Pix2StructForConditionalGeneration
from PIL import Image
import requests
from io import BytesIO

# Load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("KennethTM/pix2struct-base-table2html")
model = Pix2StructForConditionalGeneration.from_pretrained("KennethTM/pix2struct-base-table2html")
model.to(device)
model.eval()

# Load example image from URL
url = "https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
image = Image.open(BytesIO(response.content))

# Run model inference
encoding = processor(image, return_tensors="pt", max_patches=1024)
with torch.inference_mode():
    flattened_patches = encoding.pop("flattened_patches").to(device)
    attention_mask = encoding.pop("attention_mask").to(device)
    predictions = model.generate(flattened_patches=flattened_patches, attention_mask=attention_mask, max_new_tokens=1024)

predictions_decoded = processor.tokenizer.batch_decode(predictions, skip_special_tokens=True)

# Show predictions as text
print(predictions_decoded[0])
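Because generation stops after max_new_tokens, the output for a very large table may be cut off before the closing tag. A minimal sketch of a completeness check using only the standard library follows. It assumes the output contains only paired tags such as table, tr, th and td; `TagBalanceChecker` and `is_complete_table` are illustrative names.

```python
from html.parser import HTMLParser


class TagBalanceChecker(HTMLParser):
    """Track open tags to detect output that stops before the table is closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.mismatch = False

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.mismatch = True


def is_complete_table(html):
    """Return True if every opened tag is closed and the markup ends with </table>."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return (
        not checker.mismatch
        and not checker.open_tags
        and html.strip().endswith("</table>")
    )


print(is_complete_table("<table><tr><td>1</td></tr></table>"))    # True
print(is_complete_table("<table><tr><td>1</td></tr><tr><td>2"))  # False
```

If the check fails for a decoded prediction, one option is to rerun generation with a larger max_new_tokens value.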

Example image:

Model HTML output for example image:

<table border="1" cellspacing="0">
 <tr>
  <th>
   Rank
  </th>
  <th>
   Lane
  </th>
  <th>
   Name
  </th>
  <th>
   Nationality
  </th>
  <th>
   Time
  </th>
  <th>
   Notes
  </th>
 </tr>
 <tr>
  <td>
  </td>
  <td>
   4
  </td>
  <td>
   Michael Phelps
  </td>
  <td>
   United States
  </td>
  <td>
   51.25
  </td>
  <td>
   OR
  </td>
 </tr>
 <tr>
  <td>
  </td>
  <td>
   3
  </td>
  <td>
   Ian Crocker
  </td>
  <td>
   United States
  </td>
  <td>
   51.29
  </td>
  <td>
  </td>
 </tr>
 <tr>
  <td>
  </td>
  <td>
   5
  </td>
  <td>
   Andriy Serdinov
  </td>
  <td>
   Ukraine
  </td>
  <td>
   51.36
  </td>
  <td>
   EU
  </td>
 </tr>
 <tr>
  <td>
   4
  </td>
  <td>
   1
  </td>
  <td>
   Thomas Rupprath
  </td>
  <td>
   Germany
  </td>
  <td>
   52.27
  </td>
  <td>
  </td>
 </tr>
 <tr>
  <td>
   5
  </td>
  <td>
   6
  </td>
  <td>
   Igor Marchenko
  </td>
  <td>
   Russia
  </td>
  <td>
   52.32
  </td>
  <td>
  </td>
 </tr>
 <tr>
  <td>
   6
  </td>
  <td>
   2
  </td>
  <td>
   Gabriel Mangabeira
  </td>
  <td>
   Brazil
  </td>
  <td>
   52.34
  </td>
  <td>
  </td>
 </tr>
 <tr>
  <td>
   7
  </td>
  <td>
   8
  </td>
  <td>
   Duje Draganja
  </td>
  <td>
   Croatia
  </td>
  <td>
   52.46
  </td>
  <td>
  </td>
 </tr>
 <tr>
  <td>
   8
  </td>
  <td>
   7
  </td>
  <td>
   Geoff Huegill
  </td>
  <td>
   Australia
  </td>
  <td>
   52.56
  </td>
  <td>
  </td>
 </tr>
</table>

And the rendered HTML table:

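| Rank | Lane | Name               | Nationality   | Time  | Notes |
|------|------|--------------------|---------------|-------|-------|
|      | 4    | Michael Phelps     | United States | 51.25 | OR    |
|      | 3    | Ian Crocker        | United States | 51.29 |       |
|      | 5    | Andriy Serdinov    | Ukraine       | 51.36 | EU    |
| 4    | 1    | Thomas Rupprath    | Germany       | 52.27 |       |
| 5    | 6    | Igor Marchenko     | Russia        | 52.32 |       |
| 6    | 2    | Gabriel Mangabeira | Brazil        | 52.34 |       |
| 7    | 8    | Duje Draganja      | Croatia       | 52.46 |       |
| 8    | 7    | Geoff Huegill      | Australia     | 52.56 |       |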
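For downstream use, the HTML output can be turned into plain rows of cell text. The sketch below does this with the standard library only. It ignores rowspan and colspan attributes, so it is suitable for simple grids like the example; `TableRowParser` and `html_table_to_rows` are illustrative names.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is dropped
        if self._in_cell:
            self.rows[-1][-1] += data


def html_table_to_rows(html):
    """Return the table as a list of rows, each a list of stripped cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


# A shortened version of the model output shown earlier
html = """
<table border="1" cellspacing="0">
 <tr><th>Rank</th><th>Lane</th><th>Name</th></tr>
 <tr><td></td><td>4</td><td>Michael Phelps</td></tr>
</table>
"""
print(html_table_to_rows(html))
# [['Rank', 'Lane', 'Name'], ['', '4', 'Michael Phelps']]
```

Empty cells are preserved as empty strings, so column positions stay aligned across rows.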