What advantage does this have over normal algorithmic ways of turning HTML into Markdown?
I don't understand why I would use this instead of going directly to a simple tool that converts my HTML to Markdown. What advantages will I see here?
I hope this post answers your question: https://jina.ai/news/readerlm-v2-frontier-small-language-model-for-html-to-markdown-and-json
TL;DR: the structure of the HTML is preserved well, and the model excels at generating complex elements like code fences, nested lists, tables, and LaTeX equations.
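For contrast, here is a minimal sketch of what a purely rule-based converter can look like. It is only an illustration and not how any particular tool works: the class `TinyMarkdownConverter` and the function `html_to_markdown` are hypothetical names, and the sketch assumes only headings, paragraphs, and flat lists need handling.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Hypothetical rule-based converter: handles h1-h3, p, and flat li only."""

    BLOCK_TAGS = ("h1", "h2", "h3", "li", "p")

    def __init__(self):
        super().__init__()
        self.parts = []     # finished Markdown lines
        self.prefix = None  # Markdown prefix of the block being read, or None
        self.buffer = []    # text collected for the current block

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self.prefix = "- "
        elif tag == "p":
            self.prefix = ""
        else:
            return  # unknown tags (b, ul, table, ...) are ignored
        self.buffer = []  # a new block starts; any open block's text is discarded

    def handle_data(self, data):
        if self.prefix is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and self.prefix is not None:
            # Collapse runs of whitespace inside the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.parts.append(self.prefix + text)
            self.prefix = None


def html_to_markdown(html: str) -> str:
    parser = TinyMarkdownConverter()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


simple = html_to_markdown("<h2>Title</h2><p>Hello   <b>world</b></p><ul><li>a</li><li>b</li></ul>")
nested = html_to_markdown("<ul><li>a<ul><li>b</li></ul></li></ul>")
print(simple)
print(nested)
```

On the simple input the sketch produces a heading, a paragraph, and two list items. On the nested list it emits only `- b`: the outer item's text and the nesting are lost, because every structure needs its own hand-written rule. Real converters cover far more cases than this sketch, but it shows the kind of structure handling the TL;DR refers to.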
I think it's a great model to use in the future. I understand that, for now, the algorithmic way of extracting HTML wins, but I think they are demonstrating what an LLM can do without the algorithm.
I liked the model. Do you plan to extract the dataset from HTML to Markdown and JSON?
Thank you very much.