cheesyFishes committed "Update README.md"

README.md CHANGED
@@ -10,49 +10,154 @@ pinned: false
-- GPT Index (duplicate): https://pypi.org/project/gpt-index/.
-- LlamaHub (community library of data loaders)
-- LlamaLab (cutting-edge AGI projects using LlamaIndex)
-To build a simple vector store index:
-os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
-query_engine.query("
@@ -62,12 +167,12 @@ index.storage_context.persist()
-from llama_index import StorageContext, load_index_from_storage
-storage_context = StorageContext.from_defaults(persist_dir=

# 🗂️ LlamaIndex 🦙

[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-index)](https://pypi.org/project/llama-index/)
[![GitHub contributors](https://img.shields.io/github/contributors/jerryjliu/llama_index)](https://github.com/jerryjliu/llama_index/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
[![Ask AI](https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg?&logo=)](https://www.phorm.ai/query?projectId=c5863b56-6703-4a5d-87b6-7e6031bf16b6)

LlamaIndex (GPT Index) is a data framework for your LLM application. Building with LlamaIndex typically involves working with LlamaIndex core and a chosen set of integrations (or plugins). There are two ways to start building with LlamaIndex in
Python:

1. **Starter**: [`pip install llama-index`](https://pypi.org/project/llama-index/). A starter Python package that includes core LlamaIndex as well as a selection of integrations.

2. **Customized**: [`pip install llama-index-core`](https://pypi.org/project/llama-index-core/). Install core LlamaIndex and add your chosen LlamaIndex integration packages on [LlamaHub](https://llamahub.ai/)
   that are required for your application. There are over 300 LlamaIndex integration
   packages that work seamlessly with core, allowing you to build with your preferred
   LLM, embedding, and vector store providers.

### Important Links

LlamaIndex.TS [(Typescript/Javascript)](https://github.com/run-llama/LlamaIndexTS)

[Documentation](https://docs.llamaindex.ai/en/stable/)

[Twitter](https://twitter.com/llama_index)

[Discord](https://discord.gg/dGcwcsnxhU)

### Ecosystem

- LlamaHub [(community library of data loaders)](https://llamahub.ai)
- LlamaLab [(cutting-edge AGI projects using LlamaIndex)](https://github.com/run-llama/llama-lab)

## 🚀 Overview

**NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

### Context

- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- How do we best augment LLMs with our own private data?

We need a comprehensive toolkit to help perform this data augmentation for LLMs.

### Proposed Solution

That's where **LlamaIndex** comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:

- Offers **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
- Provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.
- Provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
- Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, or anything else).

LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in
5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules)
to fit their needs.

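To make the "retrieved context and knowledge-augmented output" idea above concrete, here is a toy sketch in plain Python. It does not use LlamaIndex and is not how LlamaIndex is implemented: the word-overlap scoring, the `tokenize`, `retrieve`, and `build_prompt` helpers, and the sample documents are all invented for illustration.

```python
# Toy illustration of retrieval-augmented prompting (not LlamaIndex code).
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Prepend the retrieved context to the question before calling an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"


# Hypothetical "private data" the LLM was never trained on.
docs = [
    "The cafeteria opens at 8am on weekdays.",
    "Expense reports are due on the last Friday of each month.",
]
question = "When are expense reports due?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)
print(prompt)
```

A real system would likely swap the word-overlap score for embedding similarity and send `prompt` to an LLM; the overall shape (retrieve, then augment the prompt) is the part this sketch is meant to show.
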
## 📄 Documentation

Full documentation can be found [here](https://docs.llamaindex.ai/en/latest/).

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

## 💻 Example Usage

The LlamaIndex Python library is namespaced such that import statements which
include `core` imply that the core package is being used. In contrast, those
statements without `core` imply that an integration package is being used.

```python
# typical pattern
from llama_index.core.xxx import ClassABC  # core submodule xxx
from llama_index.xxx.yyy import (
    SubclassABC,
)  # integration yyy for submodule xxx

# concrete example
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
```

To get started, we can install llama-index directly using the starter dependencies (mainly OpenAI):

```sh
pip install llama-index
```

Or we can do a more custom installation:

```sh
# custom selection of integrations to work with core
pip install llama-index-core
pip install llama-index-llms-openai
pip install llama-index-llms-ollama
pip install llama-index-embeddings-huggingface
pip install llama-index-readers-file
```

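The package names and import paths shown above appear to follow a pattern: `llama-index-llms-openai` is imported as `llama_index.llms.openai`. The helper below is a hypothetical sketch of that mapping for the packages listed above; `import_path` is not part of LlamaIndex, and packages outside this README may not follow the same pattern.

```python
# Hypothetical helper: derive the import path from a pip package name.
def import_path(package: str) -> str:
    """Map e.g. 'llama-index-llms-openai' to 'llama_index.llms.openai'."""
    prefix = "llama-index-"
    if not package.startswith(prefix):
        raise ValueError(f"not a llama-index package: {package}")
    # Everything after the prefix becomes dotted submodules.
    parts = package[len(prefix):].split("-")
    return ".".join(["llama_index", *parts])


for name in [
    "llama-index-core",
    "llama-index-llms-openai",
    "llama-index-llms-ollama",
    "llama-index-embeddings-huggingface",
    "llama-index-readers-file",
]:
    print(f"{name} -> {import_path(name)}")
```

This naive split would not handle a category or integration name that itself contains a hyphen, so treat it as a reading aid for the examples here rather than a general rule.
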
To build a simple vector store index using OpenAI:

```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
```

To build a simple vector store index using non-OpenAI models, we can leverage Ollama and HuggingFace. This assumes you've already installed Ollama and have pulled the model you want to use.

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# set the LLM
Settings.llm = Ollama(
    model="llama3.1:latest",
    temperature=0.1,
    request_timeout=360.0,
)

# set the embed model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    embed_batch_size=2,
)

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(
    documents,
)
```

To query:

```python
query_engine = index.as_query_engine()
query_engine.query("YOUR_QUESTION")
```

Or chat:

```python
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
chat_engine.chat("YOUR MESSAGE")
```

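A multi-turn exchange is just repeated calls to `chat` on the same engine. The sketch below wraps that in a hypothetical `run_conversation` helper that accepts any object with a `chat(message)` method; `EchoEngine` is a stand-in invented here so the snippet runs without an index or an API key.

```python
# Sketch: drive a multi-turn conversation through any chat-engine-like object.
def run_conversation(chat_engine, messages):
    """Send each message in order and collect the replies as strings."""
    replies = []
    for message in messages:
        response = chat_engine.chat(message)
        replies.append(str(response))  # convert the response object to text
    return replies


class EchoEngine:
    """Stand-in for a real chat engine; invented for this example."""

    def __init__(self):
        self.turns = 0

    def chat(self, message):
        self.turns += 1
        return f"[turn {self.turns}] you said: {message}"


replies = run_conversation(EchoEngine(), ["Hello", "Tell me more"])
for reply in replies:
    print(reply)
```

With a real index, the `chat_engine` created above would be passed in place of `EchoEngine()`.
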
By default, data is stored in-memory.
To persist to disk (under `./storage`):

```python
index.storage_context.persist()
```

To reload from disk:

```python
from llama_index.core import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# load index
index = load_index_from_storage(storage_context)
```
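
The persist and reload steps can be combined so that the index is only rebuilt when nothing has been saved yet. The following is a minimal sketch under stated assumptions: `has_persisted_index` and `load_or_build_index` are hypothetical helper names, and "the directory exists and is non-empty" is a simplistic check chosen for illustration, not something LlamaIndex prescribes.

```python
import os


def has_persisted_index(persist_dir: str = "./storage") -> bool:
    """Return True if persist_dir exists and contains at least one entry."""
    return os.path.isdir(persist_dir) and len(os.listdir(persist_dir)) > 0


def load_or_build_index(data_dir: str, persist_dir: str = "./storage"):
    """Reload the index from disk if present, otherwise build and persist it."""
    # Imported lazily so has_persisted_index works without llama-index installed.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        # rebuild storage context and load the saved index
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # nothing saved yet: build from documents, then persist for next time
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Calling `load_or_build_index("YOUR_DATA_DIRECTORY")` would then pay the embedding cost only on the first run, assuming the source documents have not changed in between.
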