"We need digital sobriety." @sasha challenges Big Tech's race for nuclear energy on BBC AI Decoded. Instead of pursuing more power, shouldn't we first ask if we really need AI everywhere?
**Exploring Realistic Emotional Depth in AI Language Models**
Language models, particularly proprietary ones, often grapple with censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered less restrained language models that offer more candid interactions. Even these models, however, tend to maintain a veneer of neutrality or overly positive responses, which may not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.
To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:
- **Suicide**
- **Depression**
- **Anxiety**
Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.
The "Dark Sentience" dataset is now available for review and use at: Locutusque/Dark-Sentience. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.
Websites slam doors on AI data harvesting
New study "Consent in Crisis: The Rapid Decline of the AI Data Commons" reveals a rapid decline in open web access.
Key findings from an audit of 14,000 web domains:
- 5%+ of the data in three common datasets (C4, RefinedWeb and Dolma) is now fully restricted
- 25%+ of the highest-quality sources are now fully restricted
- 45% of C4 is restricted by Terms of Service
Noteworthy trends:
- OpenAI's crawlers are banned 2x more often than any other company's
- News sites lead the restrictions: 45% of their tokens are off-limits
Two quotes in the NYT piece to ponder:
"Unsurprisingly, we're seeing blowback from data creators after the text, images and videos they've shared online are used to develop commercial systems that sometimes directly threaten their livelihoods." — @yjernite
"Major tech companies already have all of the data. Changing the license on the data doesn't retroactively revoke that permission, and the primary impact is on later-arriving actors, who are typically either smaller start-ups or researchers." — @stellaathena
When building applications with LLMs, writing effective prompts is a long process of trial and error. Often, if you switch models, you also have to change the prompt. What if you could automate this process?
That's where DSPy comes in: a framework designed to algorithmically optimize prompts for language models. By applying classical machine learning concepts (training and evaluation data, metrics, optimization), DSPy generates better prompts for a given model and task.
Recently, I explored combining DSPy with the robustness of Haystack Pipelines.
Here's how it works:
- Start from a Haystack RAG pipeline with a basic prompt
- Define a goal (in this case, get correct and concise answers)
- Create a DSPy program, define data and metrics
- Optimize and evaluate -> improved prompt
- Build a refined Haystack RAG pipeline using the optimized prompt
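The core idea behind the steps above can be illustrated without DSPy's machinery. This minimal sketch (plain Python, not DSPy's actual API; `fake_llm` and all other names are hypothetical stand-ins) shows what metric-driven prompt optimization boils down to: score candidate prompts against a small labeled dev set and keep the best one.

```python
# Illustrative sketch of metric-driven prompt selection, the idea DSPy automates.
# All names are hypothetical; a real setup would call an actual LLM.

def exact_match(predicted: str, gold: str) -> float:
    """Metric: 1.0 if the answer matches the gold label (case-insensitive)."""
    return float(predicted.strip().lower() == gold.strip().lower())

def fake_llm(prompt: str, question: str) -> str:
    """Stand-in for a real LLM call: gives a short answer only when asked to be concise."""
    if "concise" in prompt:
        return {"Capital of France?": "Paris"}.get(question, "")
    return "The answer to your question is Paris, a lovely city."

def evaluate(prompt: str, dev_set: list[tuple[str, str]]) -> float:
    """Average metric score of one prompt over a labeled dev set."""
    scores = [exact_match(fake_llm(prompt, q), gold) for q, gold in dev_set]
    return sum(scores) / len(scores)

def optimize(candidates: list[str], dev_set: list[tuple[str, str]]) -> str:
    """Pick the candidate prompt with the best dev-set score."""
    return max(candidates, key=lambda p: evaluate(p, dev_set))

dev_set = [("Capital of France?", "Paris")]
candidates = [
    "Answer the question.",
    "Answer the question in one concise word or phrase.",
]
best = optimize(candidates, dev_set)
print(best)  # the concise variant wins on the exact-match metric
```

DSPy goes much further (it can rewrite instructions and select few-shot demonstrations automatically), but the loop is the same: data, metric, search, evaluate.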