Update README.md
Browse files
README.md
CHANGED
@@ -37,7 +37,7 @@ It supports multiple languages (29 in total) and is specialized for tasks involv
|
|
37 |
|
38 |
- **Better Markdown Generation**: Generates cleaner, more readable Markdown output.
|
39 |
- **JSON Output**: Can produce JSON-formatted text, enabling structured extraction for further downstream processing.
|
40 |
-
- **Longer Context Handling**: Can handle up to 512K tokens, which is beneficial for large HTML documents
|
41 |
- **Multilingual Support**: Covers 29 languages for broader application across international web data.
|
42 |
|
43 |
---
|
@@ -50,10 +50,12 @@ For a more hands-on experience in a hosted environment, see the [Google Colab No
|
|
50 |
## On Google Colab
|
51 |
|
52 |
The easiest way to experience `ReaderLM-v2` is by running our [Colab notebook](https://colab.research.google.com/drive/1FfPjZwkMSocOLsEYH45B3B4NxDryKLGI?usp=sharing),
|
53 |
-
The notebook
|
|
|
|
|
|
|
|
|
54 |
|
55 |
-
• For simple HTML-to-Markdown tasks, you only need to provide the raw HTML (no special instructions).
|
56 |
-
• For JSON output and instruction-based extraction, use the prompt formatting guidelines in the notebook.
|
57 |
|
58 |
## Local Usage
|
59 |
|
|
|
37 |
|
38 |
- **Better Markdown Generation**: Generates cleaner, more readable Markdown output.
|
39 |
- **JSON Output**: Can produce JSON-formatted text, enabling structured extraction for further downstream processing.
|
40 |
+
- **Longer Context Handling**: Can handle up to 512K tokens, which is beneficial for large HTML documents.
|
41 |
- **Multilingual Support**: Covers 29 languages for broader application across international web data.
|
42 |
|
43 |
---
|
|
|
50 |
## On Google Colab
|
51 |
|
52 |
The easiest way to experience `ReaderLM-v2` is by running our [Colab notebook](https://colab.research.google.com/drive/1FfPjZwkMSocOLsEYH45B3B4NxDryKLGI?usp=sharing),
|
53 |
+
The notebook demonstrates HTML-to-markdown conversion, JSON extraction, and instruction-following using the HackerNews frontpage as an example.
|
54 |
+
The notebook is optimized for Colab's free T4 GPU tier and requires `vllm` and `triton` for acceleration and running.
|
55 |
+
Feel free to test it with any website.
|
56 |
+
For HTML-to-markdown tasks, simply input the raw HTML without any prefix instructions.
|
57 |
+
However, JSON output and instruction-based extraction require specific prompt formatting as shown in the examples.
|
58 |
|
|
|
|
|
59 |
|
60 |
## Local Usage
|
61 |
|