GEM benchmark

https://gem-benchmark.com

AI & ML interests

We develop infrastructure for the evaluation of generated text.

Recent Activity

Krystalan authored a paper 4 days ago

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

fladhak authored a paper 15 days ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

gentaiscool authored a paper 19 days ago

Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision

GEM's activity

prithivMLmods

posted an update 3 days ago

Triangulum Catalogued 🔥💫

🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF

Krystalan

authored a paper 4 days ago

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

Paper • 2412.17498 • Published 12 days ago • 21

lewtun

posted an update 5 days ago

This paper (HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream with smooth transitions such as "hmm, wait", like those one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement.
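The sample-and-verify part of the recipe above can be sketched roughly as follows. This is a toy illustration, not the paper's code: `build_trace`, `toy_sample`, `toy_verify`, the strategy names, and the alias table are all hypothetical stand-ins (a real pipeline would call the policy model and GPT-4o instead), and stopping at the first verified answer is an assumption. The reformatting, SFT, and RL steps are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    strategy: str  # which search strategy prompted this attempt
    cot: str       # the sampled chain of thought
    answer: str    # the final answer extracted from the CoT


def build_trace(
    question: str,
    reference: str,
    sample: Callable[[str, str, List[Attempt]], Attempt],
    verify: Callable[[str, str], bool],
    strategies: List[str],
    max_attempts: int = 4,
) -> Optional[List[Attempt]]:
    """Sample attempts until the verifier accepts one.

    Returns the full trail of attempts (failed ones included), which a
    later step could merge into a single reasoning stream. Returns None
    if no attempt is verified, so the problem can be dropped.
    """
    attempts: List[Attempt] = []
    for i in range(max_attempts):
        strategy = strategies[i % len(strategies)]
        attempt = sample(question, strategy, attempts)
        attempts.append(attempt)
        if verify(attempt.answer, reference):
            return attempts
    return None


# --- Toy stand-ins (assumptions, for illustration only) ---------------
ALIASES = {"myocardial infarction": {"heart attack", "mi"}}


def toy_verify(answer: str, reference: str) -> bool:
    """Alias-aware check; plain exact match would reject 'heart attack'."""
    a, r = answer.strip().lower(), reference.strip().lower()
    return a == r or a in ALIASES.get(r, set())


CANNED_ANSWERS = ["angina", "heart attack"]


def toy_sample(question: str, strategy: str, history: List[Attempt]) -> Attempt:
    """Pretend model: returns a canned answer based on the attempt count."""
    answer = CANNED_ANSWERS[len(history) % len(CANNED_ANSWERS)]
    return Attempt(strategy, f"[{strategy}] thinking about {question!r}", answer)


trace = build_trace(
    "Chest pain with ST elevation?",
    "Myocardial infarction",
    toy_sample,
    toy_verify,
    strategies=["initial", "backtrack", "verify"],
)
dropped = build_trace(
    "Chest pain with ST elevation?",
    "Stroke",
    toy_sample,
    toy_verify,
    strategies=["initial", "backtrack", "verify"],
)
```

The verifier is passed in as a callable so that an LLM judge can replace the alias table without touching the loop.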

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!

prithivMLmods

posted an update 13 days ago

Sketchify 😉🎨

+ strangerzonehf/Flux-Sketch-Smudge-LoRA
+ strangerzonehf/Flux-Sketch-Sized-LoRA
+ strangerzonehf/Sketch-Paint

- strangerzonehf/sketch-fav-675ba869c7ceaec7e652ee1c

fladhak

authored a paper 15 days ago

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 17 days ago • 116

prithivMLmods

posted an update 16 days ago

Qwen2VL Models: Vision and Language Processing 🍉

📍Fine-tuned for: [ LaTeX OCR, Math Parsing, Text Analogy OCRTest ]

Colab Demo: prithivMLmods/Qwen2-VL-OCR-2B-Instruct

❄️Demo : prithivMLmods/Qwen2-VL-2B. The demo includes the Qwen2VL 2B Base Model.

🎯The Space extracts document content from the input image and outputs it as standardized plain text. It includes formatting tools with over 30 font styles, export to PDF and DOCX, text alignment, font size adjustment, and line spacing controls.

📄PDFs are rendered with the ReportLab toolkit.

🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct

🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing

📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f

@prithivMLmods 🤗

prithivMLmods

posted an update 17 days ago

🎄 Here Before - Xmas🎅✨

🧑🏻‍🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA

❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e

🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf

@prithivMLmods 🤗

lewtun

posted an update 18 days ago

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
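To make the idea of pairing a step-wise reward model with tree search concrete, here is a toy sketch. It is not the search-and-learn implementation: `propose`, `score`, `beam_search`, and `dvts` are hypothetical names, the "reasoning steps" are just small integers, and the "reward model" is a hand-written function. The `dvts` function reflects one plausible reading of the diversity idea (independent subtrees, each expanded greedily) and is an assumption, not the authors' code.

```python
from typing import Callable, List, Tuple

Path = Tuple[int, ...]  # a partial solution: the sequence of steps so far

TARGET = 10


def propose(path: Path) -> List[int]:
    """Toy step generator: an LLM would sample candidate next steps here."""
    return [1, 2, 3]


def score(path: Path) -> float:
    """Toy step-wise reward: higher when the running sum is nearer TARGET."""
    return -abs(TARGET - sum(path))


def beam_search(
    propose: Callable[[Path], List[int]],
    score: Callable[[Path], float],
    n_beams: int,
    width: int,
    depth: int,
) -> List[Path]:
    """Expand every beam, score all children, keep the global top n_beams."""
    beams: List[Path] = [()]
    for _ in range(depth):
        children = [p + (s,) for p in beams for s in propose(p)[:width]]
        children.sort(key=score, reverse=True)
        beams = children[:n_beams]
    return beams


def dvts(
    propose: Callable[[Path], List[int]],
    score: Callable[[Path], float],
    n_subtrees: int,
    width: int,
    depth: int,
) -> List[Path]:
    """Grow one independent subtree per first step, greedily within each."""
    finals: List[Path] = []
    for first in propose(())[:n_subtrees]:
        path: Path = (first,)
        for _ in range(depth - 1):
            children = [path + (s,) for s in propose(path)[:width]]
            path = max(children, key=score)  # best child in this subtree only
        finals.append(path)
    return finals


beams = beam_search(propose, score, n_beams=2, width=3, depth=4)
subtree_paths = dvts(propose, score, n_subtrees=3, width=3, depth=4)
```

In this toy setup the surviving beams share the same first step, while the subtree variant keeps one finished path per first step, which is the kind of diversity the post alludes to.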

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!

gentaiscool

authored 12 papers 19 days ago

Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision

Paper • 1805.12307 • Published May 31, 2018

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

Paper • 2309.10661 • Published Sep 19, 2023 • 1

IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

Paper • 2009.05387 • Published Sep 11, 2020

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

Paper • 2311.12405 • Published Nov 21, 2023

Are Multilingual Models Effective in Code-Switching?

Paper • 2103.13309 • Published Mar 24, 2021

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

Paper • 2104.08200 • Published Apr 16, 2021

LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization

Paper • 2401.06034 • Published Jan 11, 2024

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

Paper • 2109.06762 • Published Sep 14, 2021 • 1

Few-Shot Bot: Prompt-Based Learning for Dialogue Systems

Paper • 2110.08118 • Published Oct 15, 2021

Team members 83