Question regarding the training set

#1
by verbrannter - opened

Hello,

You said you trained with your own proprietary dataset, but can you talk about how your dataset was structured?
I want to train something similar to yours, maybe with more fields, but I'm not sure how to structure my training dataset.

Kind regards
lars
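For readers with the same question: the dataset used here was not disclosed, but the public Donut repository documents a layout in which each split folder holds the images plus a `metadata.jsonl` file, one JSON object per line, with a `file_name` and a `ground_truth` string that wraps the target fields under `gt_parse`. A minimal sketch of writing such a file follows. The invoice field names and file names are hypothetical and only illustrate the shape.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(samples, out_dir):
    """Write a metadata.jsonl file with one line per image."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metadata.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for file_name, fields in samples:
            # ground_truth is itself a JSON string; the fields sit under "gt_parse".
            ground_truth = json.dumps({"gt_parse": fields}, ensure_ascii=False)
            line = {"file_name": file_name, "ground_truth": ground_truth}
            f.write(json.dumps(line, ensure_ascii=False) + "\n")
    return path


# Hypothetical samples: (image file name, fields to extract).
samples = [
    ("invoice_0001.png", {"invoice_number": "INV-001", "total": "119.00"}),
    ("invoice_0002.png", {"invoice_number": "INV-002", "total": "42.50"}),
]

# Written to a temporary directory here; in practice this would be e.g. the train split folder.
out_path = write_metadata(samples, tempfile.mkdtemp())
```

Adding more fields then only means adding more keys to the per-image dictionary, as long as every sample uses the same key names.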

to-be changed discussion status to closed

Hello @to-be. Thank you for the demo. Since the dataset is confidential, I have two questions about the training you did:

  • What hyperparameters did you choose for training? (I have a similar dataset, judging by your 3 test invoices.)
  • How were you able to use a different input size? (I get a mismatch error when I change the default input size used in Donut.)

Thank you in advance.

  1. train_batch_sizes: [1]
     val_batch_sizes: [2]
     input_size: [1600, 1280]
     max_length: 256
     align_long_axis: False
     num_nodes: 1
     seed: 2022
     lr: 3e-05
     warmup_steps: 300
     num_training_samples_per_epoch: 1200
     max_epochs: 100
     max_steps: -1
     num_workers: 4
     val_check_interval: 1.0
     check_val_every_n_epoch: 3
     gradient_clip_val: 1.0
  2. from transformers import VisionEncoderDecoderConfig

     max_length = 768
     image_size = [1920, 1280]
     # image_size = [1280, 960]

     # update image_size of the encoder
     # during pre-training, a larger image size was used
     config = VisionEncoderDecoderConfig.from_pretrained("naver-clova-ix/donut-base")
     config.encoder.image_size = image_size  # (height, width)

     # update max_length of the decoder (for generation)
     config.decoder.max_length = max_length
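The snippet in point 2 only edits the config. A possible next step, sketched here as an assumption rather than the exact code used for this model, is to pass that config when loading the checkpoint and to give the processor the same image size. The helper names `to_processor_size` and `load_donut` are made up for this example, and the `image_processor` attribute assumes a reasonably recent transformers release (older releases expose it under a different name).

```python
def to_processor_size(image_size):
    """Convert [height, width] into the dict form used by the image processor."""
    height, width = image_size
    return {"height": height, "width": width}


def load_donut(image_size, max_length, checkpoint="naver-clova-ix/donut-base"):
    # Imported lazily so the helper above works without transformers installed.
    from transformers import (
        DonutProcessor,
        VisionEncoderDecoderConfig,
        VisionEncoderDecoderModel,
    )

    config = VisionEncoderDecoderConfig.from_pretrained(checkpoint)
    config.encoder.image_size = image_size  # (height, width)
    config.decoder.max_length = max_length

    # Pass the edited config so the encoder is built for the new image size.
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint, config=config)

    processor = DonutProcessor.from_pretrained(checkpoint)
    # Assumption: this transformers version exposes `image_processor` on the processor.
    processor.image_processor.size = to_processor_size(image_size)
    processor.image_processor.do_align_long_axis = False
    return model, processor
```

Calling `load_donut([1920, 1280], 768)` downloads the base checkpoint, so it needs network access on first use. Keeping the processor size and the encoder `image_size` in sync is the point of the sketch.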
