Microsoft Table Transformer for table structure recognition, trained on PubTables1M and FinTabNet

If your deepdoctection installation does not yet include a profile for this model, register it as follows:

import deepdoctection as dd

dd.ModelCatalog.register("deepdoctection/tatr_tab_struct_v2/pytorch_model.bin", dd.ModelProfile(
    name="deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
    description="Table Transformer (DETR) model trained on PubTables1M. It was introduced in the paper "
                "Aligning benchmark datasets for table structure recognition by Smock et "
                "al. This model is devoted to table structure recognition and expects a slightly cropped "
                "table as input. It will predict rows, columns and spanning cells. Use a padding of around 5 pixels.",
    size=[115511753],
    tp_model=False,
    config="deepdoctection/tatr_tab_struct_v2/config.json",
    preprocessor_config="deepdoctection/tatr_tab_struct_v2/preprocessor_config.json",
    hf_repo_id="deepdoctection/tatr_tab_struct_v2",
    hf_model_name="pytorch_model.bin",
    hf_config_file=["config.json", "preprocessor_config.json"],
    categories={
        "1": dd.LayoutType.table,
        "2": dd.LayoutType.column,
        "3": dd.LayoutType.row,
        "4": dd.CellType.column_header,
        "5": dd.CellType.projected_row_header,
        "6": dd.CellType.spanning,
    },
    dl_library="PT",
    model_wrapper="HFDetrDerivedDetector",
))
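
The profile description recommends a padding of around 5 pixels around the cropped table. As a rough illustration of what such padding does to the crop, the sketch below adds a constant border to a grayscale image stored as nested lists. The helper `pad_with_border` and the white fill value are assumptions made for this example; it is not deepdoctection's own implementation.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (0-255)


def pad_with_border(
    image: Image, top: int, right: int, bottom: int, left: int, fill: int = 255
) -> Image:
    """Return a copy of `image` surrounded by a constant-valued border.

    Hypothetical helper for illustration only.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    new_width = left + len(image[0]) + right
    # Border rows above the original content.
    padded = [[fill] * new_width for _ in range(top)]
    # Original rows with border pixels on the left and right.
    for row in image:
        padded.append([fill] * left + list(row) + [fill] * right)
    # Border rows below the original content.
    padded.extend([fill] * new_width for _ in range(bottom))
    return padded


# A tiny 2 x 3 "table crop" of black pixels, padded by 5 pixels on each side.
crop = [[0, 0, 0], [0, 0, 0]]
padded = pad_with_border(crop, top=5, right=5, bottom=5, left=5)
print(len(padded), len(padded[0]))  # height and width after padding
```

With a 5 pixel border on every side, the height and width each grow by 10 pixels, so table borders that touch the edge of the crop get some margin.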

When running the model within the deepdoctection analyzer, adjust the padding and segmentation parameters to get better predictions:

    import deepdoctection as dd

    analyzer = dd.get_dd_analyzer(
        reset_config_file=True,
        config_overwrite=[
            # use the Table Transformer structure recognition weights
            "PT.ITEM.WEIGHTS=deepdoctection/tatr_tab_struct_v2/pytorch_model.bin",
            "PT.ITEM.FILTER=['table']",
            # pad the cropped table by 5 pixels on each side
            "PT.ITEM.PAD.TOP=5",
            "PT.ITEM.PAD.RIGHT=5",
            "PT.ITEM.PAD.BOTTOM=5",
            "PT.ITEM.PAD.LEFT=5",
            "SEGMENTATION.THRESHOLD_ROWS=0.9",
            "SEGMENTATION.THRESHOLD_COLS=0.9",
            "SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS=0.3",
            "SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS=0.3",
            "WORD_MATCHING.MAX_PARENT_ONLY=True",
        ],
    )
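
The `SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS` and `SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS` values in the analyzer configuration above are intersection-over-union (IoU) thresholds. To give a feeling for what a value of 0.3 means, the sketch below computes the IoU of row boxes and greedily drops any box that overlaps an already kept, higher-scoring box by more than the threshold. The function names and the greedy strategy are assumptions chosen for illustration; deepdoctection's segmentation service may prune boxes differently.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.3
) -> List[int]:
    """Return the indices of the boxes kept after greedy overlap removal.

    Hypothetical helper: boxes are visited in descending score order and a box
    is kept only if its IoU with every kept box is at most `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Three candidate rows; the second one overlaps the first heavily.
rows = [(0, 0, 100, 20), (0, 5, 100, 25), (0, 30, 100, 50)]
scores = [0.95, 0.92, 0.97]
print(remove_overlapping(rows, scores, iou_threshold=0.3))
```

In this toy input the first two rows have an IoU well above 0.3, so only the higher-scoring one of the pair survives. Lowering the threshold removes more near-duplicates, while raising it tolerates more overlap between neighbouring rows or columns.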