Hands on Tutorial Gen AI for Scientific Computing - SciPy 2024, Tacoma, WA, Jul 9th

Opened by atambay37 (Scientific Software Engineering Center @ University of Washington), edited Jun 11, 2024

Announcing "Generative AI Copilot for Scientific Software – a RAG-Based Approach" at SciPy 2024, Tacoma, WA, July 9th, 2024, 1:30 PM. https://cfp.scipy.org/2024/talk/BZGQMC/

The Scientific Software Engineering Center (SSEC) at the University of Washington's eScience Institute will present a hands-on tutorial, "Generative AI Copilot for Scientific Software – a RAG-Based Approach", at SciPy 2024. The tutorial teaches attendees how to use open language models for scientific exploration with diverse input data, both public and private. A description of the workshop follows.

Generative AI systems built on large language models (LLMs) have shown great promise as tools that let people access information through natural conversation. Scientists can use these breakthroughs to build advanced tools that help accelerate their research. This tutorial will cover: (1) the basics of language models, (2) setting up an environment for running open-source LLMs without the expensive compute resources that training or fine-tuning requires, (3) using a technique such as Retrieval-Augmented Generation (RAG) to improve LLM output, and (4) building a "production-ready" app that demonstrates how researchers can turn disparate knowledge bases into special-purpose, AI-powered tools. The tutorial is aimed at scientists and research engineers who want to use LLMs in their work.

The tutorial uses the Allen Institute for AI (AI2) Open Language Model (OLMo), an LLM with open data, code, weights, and evaluation benchmarks. OLMo is purpose-built for scientific discovery: it was trained on Dolma, an open dataset of 3 trillion tokens drawn from diverse web content, academic publications, code, books, and encyclopedic materials. LangChain is a Python and JavaScript framework for developing applications powered by LLMs. Using LangChain, we'll create a context-aware question-answering agent by implementing a RAG chain. With a simple example from the astronomy community, we show that the tool answers correctly with RAG-enabled context and incorrectly without it. By the end of the tutorial, attendees will have built an AI-powered question-answering application that they can use to advance their research.
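The core idea of the RAG chain described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the tutorial's implementation: it does not use LangChain or OLMo, it swaps in bag-of-words cosine similarity for the embedding-based retrieval a real pipeline would use, and the function names (`retrieve`, `build_prompt`) and toy documents are hypothetical. It shows the two steps that make the agent context-aware: retrieving the passages most relevant to a question, then adding them to the prompt that is sent to the LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(d))), d) for d in documents]
    # Sort by score only, highest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Hypothetical prompt template: prepend retrieved context to the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Toy knowledge base standing in for a researcher's private documents.
    docs = [
        "The Vera C. Rubin Observatory is located in Chile.",
        "Dolma is an open dataset of 3 trillion tokens.",
        "OLMo is an open language model released by AI2.",
    ]
    question = "Where is the Rubin Observatory located?"
    # In a real RAG chain, this prompt would be passed to the LLM for generation.
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

Without the retrieval step, the model would have to answer from its training data alone; with it, the answer can be grounded in the supplied documents. A production setup would typically replace the word-count vectors with an embedding model and a vector store.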

To improve the experience for people who want to chat remotely with our OLMo model, we are requesting an Nvidia T4 medium or 1x Nvidia L4 GPU, along with persistent storage. A faster chat interface will encourage wider community adoption and let people try more use cases and engage more fully with the model.

Vani1 changed discussion title from Apply for community grant: Academic project (gpu and storage) to Hands on Tutorial Gen AI for Scientific Computing - SciPy 2024, Tacoma, WA, Jul 9th
