Commit 74f745b (verified) by cheesyFishes · 1 parent: d58dec6

Update README.md

Files changed (1): README.md (+124 -19)
@@ -10,49 +10,154 @@ pinned: false
  # 🗂️ LlamaIndex 🦙

- LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data.

- PyPI:
- - LlamaIndex: https://pypi.org/project/llama-index/.
- - GPT Index (duplicate): https://pypi.org/project/gpt-index/.

- Documentation: https://gpt-index.readthedocs.io/.

- Twitter: https://twitter.com/llama_index.

- Discord: https://discord.gg/dGcwcsnxhU.

  ### Ecosystem

- - LlamaHub (community library of data loaders): https://llamahub.ai
- - LlamaLab (cutting-edge AGI projects using LlamaIndex): https://github.com/run-llama/llama-lab


  ## 💻 Example Usage

  ```
  pip install llama-index
  ```

- Examples are in the `examples` folder. Indices are in the `indices` folder (see list of indices below).

- To build a simple vector store index:
  ```python
  import os
- os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

- from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
- documents = SimpleDirectoryReader('data').load_data()
- index = GPTVectorStoreIndex.from_documents(documents)
  ```


  To query:
  ```python
  query_engine = index.as_query_engine()
- query_engine.query("<question_text>?")
  ```


  By default, data is stored in-memory.
  To persist to disk (under `./storage`):
@@ -62,12 +167,12 @@ index.storage_context.persist()
  ```

  To reload from disk:
  ```python
- from llama_index import StorageContext, load_index_from_storage

  # rebuild storage context
- storage_context = StorageContext.from_defaults(persist_dir='./storage')
  # load index
  index = load_index_from_storage(storage_context)
  ```
-
@@ -10,49 +10,154 @@ pinned: false

  # 🗂️ LlamaIndex 🦙

+ [![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-index)](https://pypi.org/project/llama-index/)
+ [![GitHub contributors](https://img.shields.io/github/contributors/jerryjliu/llama_index)](https://github.com/jerryjliu/llama_index/graphs/contributors)
+ [![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
+ [![Ask AI](https://img.shields.io/badge/Phorm-Ask_AI-%23F2777A.svg?&logo=)](https://www.phorm.ai/query?projectId=c5863b56-6703-4a5d-87b6-7e6031bf16b6)

+ LlamaIndex (GPT Index) is a data framework for your LLM application. Building with LlamaIndex typically involves working with LlamaIndex core and a chosen set of integrations (or plugins). There are two ways to start building with LlamaIndex in
+ Python:

+ 1. **Starter**: [`pip install llama-index`](https://pypi.org/project/llama-index/). A starter Python package that includes core LlamaIndex as well as a selection of integrations.

+ 2. **Customized**: [`pip install llama-index-core`](https://pypi.org/project/llama-index-core/). Install core LlamaIndex and add your chosen LlamaIndex integration packages on [LlamaHub](https://llamahub.ai/)
+ that are required for your application. There are over 300 LlamaIndex integration
+ packages that work seamlessly with core, allowing you to build with your preferred
+ LLM, embedding, and vector store providers.

+ ### Important Links
+
+ LlamaIndex.TS [(Typescript/Javascript)](https://github.com/run-llama/LlamaIndexTS)
+
+ [Documentation](https://docs.llamaindex.ai/en/stable/)
+
+ [Twitter](https://twitter.com/llama_index)
+
+ [Discord](https://discord.gg/dGcwcsnxhU)

  ### Ecosystem

+ - LlamaHub [(community library of data loaders)](https://llamahub.ai)
+ - LlamaLab [(cutting-edge AGI projects using LlamaIndex)](https://github.com/run-llama/llama-lab)
+
+ ## 🚀 Overview
+
+ **NOTE**: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!
+
+ ### Context
+
+ - LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
+ - How do we best augment LLMs with our own private data?
+
+ We need a comprehensive toolkit to help perform this data augmentation for LLMs.

+ ### Proposed Solution
+
+ That's where **LlamaIndex** comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:
+
+ - Offers **data connectors** to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
+ - Provides ways to **structure your data** (indices, graphs) so that this data can be easily used with LLMs.
+ - Provides an **advanced retrieval/query interface over your data**: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
+ - Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, or anything else).
+
+ LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in
+ 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules),
+ to fit their needs.
+
+ ## 📄 Documentation
+
+ Full documentation can be found [here](https://docs.llamaindex.ai/en/latest/)
+
+ Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

  ## 💻 Example Usage

+ The LlamaIndex Python library is namespaced such that import statements which
+ include `core` imply that the core package is being used. In contrast, those
+ statements without `core` imply that an integration package is being used.
+
+ ```python
+ # typical pattern
+ from llama_index.core.xxx import ClassABC  # core submodule xxx
+ from llama_index.xxx.yyy import (
+     SubclassABC,
+ )  # integration yyy for submodule xxx
+
+ # concrete example
+ from llama_index.core.llms import LLM
+ from llama_index.llms.openai import OpenAI
  ```
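
The namespacing rule described above can be illustrated with a small standard-library sketch. It is not part of LlamaIndex: `classify_import` and `guess_pip_package` are hypothetical helpers, and the package-name mapping (`llama_index.<submodule>.<integration>` to `llama-index-<submodule>-<integration>`) is only an assumption inferred from the `pip install` lines in this README, so check PyPI or LlamaHub before relying on it.

```python
# Hypothetical helpers illustrating the "core vs. integration" import rule.
# Nothing here is LlamaIndex API; the naming convention is an assumption.


def classify_import(module_path: str) -> str:
    """Return "core" if the path goes through llama_index.core, else "integration"."""
    parts = module_path.split(".")
    if len(parts) < 2 or parts[0] != "llama_index":
        raise ValueError(f"expected llama_index.<submodule>..., got: {module_path}")
    return "core" if parts[1] == "core" else "integration"


def guess_pip_package(module_path: str) -> str:
    """Guess the pip package that provides a llama_index module path."""
    if classify_import(module_path) == "core":
        return "llama-index-core"
    # Keep llama_index + submodule + integration, e.g. llama_index.llms.openai
    parts = module_path.split(".")
    return "-".join(parts[:3]).replace("_", "-")


for path in (
    "llama_index.core.llms",
    "llama_index.llms.openai",
    "llama_index.embeddings.huggingface",
):
    print(f"{path}: {classify_import(path)} ({guess_pip_package(path)})")
```

Under these assumptions, an import from `llama_index.llms.ollama` would point at an integration package that has to be installed separately from `llama-index-core`.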
+
+ To get started, we can install llama-index directly using the starter dependencies (mainly OpenAI):
+
+ ```sh
  pip install llama-index
  ```

+ Or we can do a more custom installation:
+
+ ```sh
+ # custom selection of integrations to work with core
+ pip install llama-index-core
+ pip install llama-index-llms-openai
+ pip install llama-index-llms-ollama
+ pip install llama-index-embeddings-huggingface
+ pip install llama-index-readers-file
+ ```
+
+ To build a simple vector store index using OpenAI:

  ```python
  import os

+ os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
+
+ from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
+
+ documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
+ index = VectorStoreIndex.from_documents(documents)
  ```

+ To build a simple vector store index using non-OpenAI models, we can leverage Ollama and HuggingFace. This assumes you've already installed Ollama and have pulled the model you want to use.
+
+ ```python
+ from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
+ from llama_index.embeddings.huggingface import HuggingFaceEmbedding
+ from llama_index.llms.ollama import Ollama
+
+ # set the LLM
+ llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
+ Settings.llm = Ollama(
+     model="llama3.1:latest",
+     temperature=0.1,
+     request_timeout=360.0,
+ )
+
+ # set the embed model
+ Settings.embed_model = HuggingFaceEmbedding(
+     model_name="BAAI/bge-small-en-v1.5",
+     embed_batch_size=2,
+ )
+
+ documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
+ index = VectorStoreIndex.from_documents(
+     documents,
+ )
+ ```

  To query:
+
  ```python
  query_engine = index.as_query_engine()
+ query_engine.query("YOUR_QUESTION")
  ```

+ Or chat:
+
+ ```python
+ chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
+ chat_engine.chat("YOUR MESSAGE")
+ ```

  By default, data is stored in-memory.
  To persist to disk (under `./storage`):

@@ -62,12 +167,12 @@ index.storage_context.persist()
  ```

  To reload from disk:
+
  ```python
+ from llama_index.core import StorageContext, load_index_from_storage

  # rebuild storage context
+ storage_context = StorageContext.from_defaults(persist_dir="./storage")
  # load index
  index = load_index_from_storage(storage_context)
  ```
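
The persist and reload steps above are often combined into a "load if present, otherwise build" pattern. The following is a minimal sketch under stated assumptions rather than code from this commit: `has_persisted_index` and `load_or_build_index` are hypothetical helper names, and treating any non-empty `./storage` directory as a valid persisted index is a simplification.

```python
from pathlib import Path


def has_persisted_index(persist_dir: str = "./storage") -> bool:
    """Return True if persist_dir exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def load_or_build_index(data_dir: str, persist_dir: str = "./storage"):
    """Reload the index from disk if present; otherwise build and persist it."""
    # Imported lazily so has_persisted_index works without llama-index installed.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_persisted_index(persist_dir):
        # rebuild storage context, then load the index from it
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    # nothing on disk yet: build from documents and persist for the next run
    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Calling `load_or_build_index("YOUR_DATA_DIRECTORY")` would embed the documents on the first run only, assuming the configured LLM and embedding model are available; later runs reuse what was written to `./storage`.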