skyoneliu committed on
Commit f666676 · 1 Parent(s): fec50f4

init the application files


Signed-off-by: skyoneliu <[email protected]>

Files changed (7)
  1. README.md +113 -8
  2. app.py +36 -0
  3. llm.py +55 -0
  4. prompts.py +86 -0
  5. requirements.txt +3 -0
  6. self_discover.py +62 -0
  7. task_example.py +42 -0
README.md CHANGED
@@ -1,14 +1,119 @@
1
  ---
2
- title: Self Discover
3
- emoji: 🏆
4
- colorFrom: pink
5
- colorTo: yellow
6
  sdk: streamlit
7
- sdk_version: 1.41.1
8
- app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: Self-Discover
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: SELF-DISCOVER
3
+ emoji: 🔍
4
+ colorFrom: gray
5
+ colorTo: pink
6
  sdk: streamlit
 
 
7
  pinned: false
8
  license: apache-2.0
 
9
  ---
10
 
11
+ ## SELF-DISCOVER FRAMEWORK
12
+
13
+ [Try Hugging Face Demo from Here](https://huggingface.co/spaces/kailashsp/SELF-DISCOVER)
14
+
15
+ ## Paper Overview [link](https://arxiv.org/pdf/2402.03620.pdf)
16
+ This project implements the paper titled "Self-Discover: Large Language Models Self-Compose Reasoning Structures," submitted on February 6, 2024, by Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, and Huaixiu Steven Zheng. The paper introduces SELF-DISCOVER, a framework designed to enhance the performance of Large Language Models (LLMs) on complex reasoning tasks by enabling them to self-discover task-intrinsic reasoning structures.
17
+
18
+
19
+ ## Functionality (as given in paper)
20
+ - **Self-Discovery Process:** The system engages in a self-discovery process where it selects atomic reasoning modules and composes them into an explicit reasoning structure.
21
+ - **Performance Improvement:** SELF-DISCOVER significantly enhances the performance of LLMs on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, achieving up to a 32% improvement compared to conventional prompting methods like Chain of Thought (CoT).
22
+ - **Efficiency:** SELF-DISCOVER requires 10-40 times less inference compute than inference-intensive methods such as CoT-Self-Consistency.
23
+ - **Universality:** The self-discovered reasoning structures are applicable across model families and share commonalities with human reasoning patterns.
24
+
25
+
26
+
27
+ ## Project Overview
28
+
29
+ This project consists of a Python script (`self_discover.py`) along with supporting modules and prompts. Given a task, it prompts an LLM to select, adapt, and implement reasoning modules, producing a reasoning structure for that task.
30
+
31
+ ## Implementation Details
32
+
33
+ - **Model Used:** The implementation uses a Large Language Model (LLM), either "gemini-pro" or "gpt-3.5-turbo".
34
+ - **Tasks:** The system can generate a reasoning structure for a variety of tasks.
35
+ - **Actions:** The system performs three main actions: SELECT, ADAPT, and IMPLEMENT.
36
+ - **SELECT:** This action involves selecting several reasoning modules crucial for solving the given task.
37
+ - **ADAPT:** The selected reasoning modules are rephrased and specified to better suit the task at hand.
38
+ - **IMPLEMENT:** The reasoning modules are operationalized into a step-by-step reasoning plan in JSON format, providing a structured approach for solving the task.
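The three actions chain together: the output of each stage is substituted into the prompt of the next. A minimal sketch of that flow follows. The helper `run_self_discover`, the shortened prompt templates, and the stub `echo_llm` are all hypothetical stand-ins for illustration and are not part of the repository.

```python
from typing import Callable

# Shortened stand-ins for the templates in prompts.py (assumption: wording abbreviated).
SELECT = "Task:\n{Task}\nSelect crucial modules from:\n{modules}"
ADAPT = "Task:\n{Task}\nAdapt these modules:\n{selected_modules}"
IMPLEMENT = "Task:\n{Task}\nTurn into a JSON plan:\n{adapted_modules}"


def run_self_discover(task: str, modules: str, llm: Callable[[str], str]) -> dict:
    """Run SELECT -> ADAPT -> IMPLEMENT, feeding each output into the next prompt."""
    selected = llm(SELECT.replace("{Task}", task).replace("{modules}", modules))
    adapted = llm(ADAPT.replace("{Task}", task).replace("{selected_modules}", selected))
    structure = llm(IMPLEMENT.replace("{Task}", task).replace("{adapted_modules}", adapted))
    return {"selected": selected, "adapted": adapted, "structure": structure}


def echo_llm(prompt: str) -> str:
    """Stub model: wraps the last line of the prompt in brackets."""
    return "[" + prompt.splitlines()[-1] + "]"


result = run_self_discover("2+2?", "38 Let's think step by step.", echo_llm)
print(result["structure"])
```

Because the stub only wraps the last prompt line, the three nested brackets in the printed value show that the module text passed through all three stages in order.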
39
+
40
+
41
+ ## Prerequisites
42
+
43
+ - Python 3.10 or later (`llm.py` uses `match` statements)
44
+ - Libraries: google-generativeai, openai, python-dotenv
45
+ - Enter the task for which you want to generate a reasoning structure in `task_example.py`
46
+
47
+ ## Installation
48
+
49
+ 1. Clone this repository:
50
+
51
+ ```bash
52
+ git clone https://github.com/kailashsp/SelfDiscover.git
53
+ ```
54
+
55
+ 2. Install the required libraries:
56
+
57
+ ```bash
58
+ pip install -r requirements.txt
59
+ ```
60
+ 3. Create a `.env` file.
61
+
62
+ 4. Open the `.env` file in a text editor.
63
+
64
+ 5. Add the following line to the `.env` file:
65
+
66
+ ```
67
+ GOOGLE_API_KEY=your_google_api_key_here
68
+ ```
69
+
70
+ Replace `your_google_api_key_here` with your actual Google API key, obtained from [Google MakerSuite](https://makersuite.google.com/app/apikey).
71
+ You can also set `OPENAI_API_KEY` instead; `self_discover.py` in this commit uses the OpenAI backend.
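Since either key may be present, a caller could choose the backend from whichever key is set. The sketch below assumes a hypothetical helper `pick_model_name`, which is not in the repository. The returned strings match the `case` labels in `llm.py`.

```python
import os


def pick_model_name(env=os.environ) -> str:
    """Return the LLM backend name matching whichever API key is set."""
    if env.get("OPENAI_API_KEY"):
        return "OpenAI"
    if env.get("GOOGLE_API_KEY"):
        return "gemini-pro"
    raise RuntimeError("Set OPENAI_API_KEY or GOOGLE_API_KEY in .env")


# Passing a plain dict keeps the example independent of the real environment.
print(pick_model_name({"GOOGLE_API_KEY": "example-key"}))
```

Preferring the OpenAI key when both are set is an arbitrary choice made for this sketch.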
72
+
73
+
74
+ ## Usage
75
+
76
+ 1. Initialize a `SelfDiscover` object with a task:
77
+
78
+ ```python
79
+ from self_discover import SelfDiscover
80
+ from task_example import task1
81
+
82
+ result = SelfDiscover(task=task1)
83
+ ```
84
+
85
+ 2. Call the `SelfDiscover` object:
86
+
87
+ ```python
88
+ result()
89
+ ```
90
+
91
+ 3. Access the selected modules, the adapted modules, and the implemented reasoning structure:
92
+
93
+ ```python
94
+ print(f"SELECTED_MODULES : {result.selected_modules}")
95
+ print(f"ADAPTED_MODULES : {result.adapted_modules}")
96
+ print(f"REASONING_STRUCTURE : {result.reasoning_structure}")
97
+ ```
98
+
99
+ ## Customization
100
+
101
+ - Modify the `reasoning_modules` variable in `prompts.py` to add, remove, or modify reasoning modules.
102
+ - Adjust the prompts in `prompts.py` to customize how each stage instructs the LLM.
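`reasoning_modules` is a single numbered string, so adding a module means appending a line with the next number. A minimal sketch, assuming a hypothetical helper `add_module` (not in the repository) and a two-entry excerpt of the real string:

```python
import re

# Two entries copied from prompts.py; the real string holds 39 numbered modules.
reasoning_modules = """
1 How could I devise an experiment to help solve that problem?
38 Let's think step by step."""


def add_module(modules: str, description: str) -> str:
    """Append a module numbered one past the highest existing number."""
    numbers = [int(n) for n in re.findall(r"^(\d+) ", modules, flags=re.MULTILINE)]
    next_number = max(numbers, default=0) + 1
    return f"{modules}\n{next_number} {description}"


updated = add_module(reasoning_modules, "Work backwards from the desired result.")
print(updated)
```

The regular expression only matches lines that start with a number, so wrapped continuation lines in the real string would be ignored when numbering.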
103
+
104
+ ## How to use the reasoning JSON structure
105
+
106
+ - As mentioned in the paper
107
+ ```markdown
108
+ For Stage 2, where we use the self-discovered structure to solve the task instances, we start with the prompt: “Follow the
109
+ step-by-step reasoning plan in JSON to correctly solve the task. Fill in the values following the keys by reasoning specifically
110
+ about the task given. Do not simply rephrase the keys.”, followed by the reasoning structure, and finally the task instance.
111
+ ```
112
+ You can then send the above prompt, followed by the reasoning structure and the task instance, to the model.
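The Stage 2 ordering quoted above (instruction, then reasoning structure, then task instance) can be assembled as follows. This is a sketch only: `build_stage2_prompt` and the sample structure are hypothetical, and the repository itself does not implement Stage 2.

```python
import json

STAGE2_INSTRUCTION = (
    "Follow the step-by-step reasoning plan in JSON to correctly solve the task. "
    "Fill in the values following the keys by reasoning specifically about the task given. "
    "Do not simply rephrase the keys."
)


def build_stage2_prompt(reasoning_structure: dict, task_instance: str) -> str:
    """Concatenate the instruction, the reasoning structure, and the task instance."""
    plan = json.dumps(reasoning_structure, indent=2)
    return f"{STAGE2_INSTRUCTION}\n\n{plan}\n\n{task_instance}"


# Hypothetical structure with empty values for the model to fill in.
structure = {"Identify the quantities": "", "Compute the answer": ""}
prompt = build_stage2_prompt(structure, "What is 12 * 7?")
print(prompt)
```

If `result.reasoning_structure` is already a JSON string rather than a dict, it can be placed in the prompt directly without `json.dumps`.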
113
+
114
+ ## Contributing
115
+
116
+ Contributions are welcome! Feel free to open issues or pull requests with any improvements or suggestions.
117
+
118
+ ---
119
+
app.py ADDED
@@ -0,0 +1,36 @@
+ import os
+ import streamlit as st
+ from self_discover import SelfDiscover
+
+ st.set_page_config(
+     page_title="SELF-DISCOVER",
+     page_icon="🔍",
+     layout="wide",
+     initial_sidebar_state="expanded",
+ )
+ st.title("SELF-DISCOVER")
+
+ # Mask the key in the UI; it is only placed in the process environment.
+ api_key = st.text_input("Enter OpenAI API key", type="password")
+ task = st.text_area("Enter the task example you want to generate a reasoning structure for")
+
+ if st.button("Generate Reasoning Structure"):
+     if api_key and task:
+         os.environ["OPENAI_API_KEY"] = api_key
+         result = SelfDiscover(task)
+         result()  # runs SELECT, ADAPT, IMPLEMENT in order
+         tab1, tab2, tab3 = st.tabs(["SELECTED_MODULES", "ADAPTED_MODULES", "REASONING_STRUCTURE"])
+         with tab1:
+             st.header("SELECTED_MODULES")
+             st.write(result.selected_modules)
+
+         with tab2:
+             st.header("ADAPTED_MODULES")
+             st.write(result.adapted_modules)
+
+         with tab3:
+             st.header("REASONING_STRUCTURE")
+             st.write(result.reasoning_structure)
+     else:
+         # Validate only after the button is pressed, so no error shows on first load.
+         st.error("Please provide both your API key and a task example.")
llm.py ADDED
@@ -0,0 +1,55 @@
+ import os
+ import google.generativeai as genai
+ from openai import OpenAI
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ generation_config = {
+     "temperature": 0,
+     "top_k": 1,
+     "max_output_tokens": 4000,
+ }
+
+
+ class LLM:
+     def __init__(self, model_name) -> None:
+         self.model_name = model_name
+         self.model = self.create_model(model_name)
+
+     def create_model(self, model_name):
+         match model_name:
+             case "gemini-pro-vision":
+                 genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
+                 return genai.GenerativeModel(model_name)
+             case "gemini-pro":
+                 genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
+                 return genai.GenerativeModel(
+                     model_name, generation_config=generation_config)
+             case "OpenAI":
+                 return OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
+             case _:
+                 raise NotImplementedError(f"Unsupported model: {model_name}")
+
+     def __call__(self, prompt, image=None):
+         if self.model_name == "gemini-pro-vision":
+             response = self.model.generate_content(
+                 [image, prompt]
+             )
+             return response.text  # previously this branch returned None
+         elif self.model_name == "gemini-pro":
+             response = self.model.generate_content(
+                 prompt)
+             return response.text
+         elif self.model_name == "OpenAI":
+             res = self.model.chat.completions.create(
+                 model="gpt-3.5-turbo-1106",
+                 # response_format={"type": "json_object"},
+                 messages=[
+                     # {"role": "system", "content": "You are a helpful assistant."},
+                     {"role": "user", "content": f"{prompt}"},
+                 ],
+                 # seed=10,
+                 temperature=0
+             )
+             return res.choices[0].message.content
prompts.py ADDED
@@ -0,0 +1,86 @@
1
+ reasoning_modules = """
2
+ 1 How could I devise an experiment to help solve that problem?
3
+ 2 Make a list of ideas for solving this problem, and apply them one by one to the problem to see if any progress can be made.
4
+ 3 How could I measure progress on this problem?
5
+ 4 How can I simplify the problem so that it is easier to solve?
6
+ 5 What are the key assumptions underlying this problem?
7
+ 6 What are the potential risks and drawbacks of each solution?
8
+ 7 What are the alternative perspectives or viewpoints on this problem?
9
+ 8 What are the long-term implications of this problem and its solutions?
10
+ 9 How can I break down this problem into smaller, more manageable parts?
11
+ 10 Critical Thinking: This style involves analyzing the problem from different perspectives, questioning assumptions, and evaluating
12
+ the evidence or information available. It focuses on logical reasoning, evidence-based decision-making, and identifying
13
+ potential biases or flaws in thinking.
14
+ 11 Try creative thinking, generate innovative and out-of-the-box ideas to solve the problem. Explore unconventional solutions,
15
+ thinking beyond traditional boundaries, and encouraging imagination and originality.
16
+ 12 Seek input and collaboration from others to solve the problem. Emphasize teamwork, open communication, and leveraging the
17
+ diverse perspectives and expertise of a group to come up with effective solutions.
18
+ 13 Use systems thinking: Consider the problem as part of a larger system and understanding the interconnectedness of various elements.
19
+ Focuses on identifying the underlying causes, feedback loops, and interdependencies that influence the problem, and developing holistic
20
+ solutions that address the system as a whole.
21
+ 14 Use Risk Analysis: Evaluate potential risks, uncertainties, and tradeoffs associated with different solutions or approaches to a
22
+ problem. Emphasize assessing the potential consequences and likelihood of success or failure, and making informed decisions based
23
+ on a balanced analysis of risks and benefits.
24
+ 15 Use Reflective Thinking: Step back from the problem, take the time for introspection and self-reflection. Examine personal biases,
25
+ assumptions, and mental models that may influence problem-solving, and being open to learning from past experiences to improve
26
+ future approaches.
27
+ 16 What is the core issue or problem that needs to be addressed?
28
+ 17 What are the underlying causes or factors contributing to the problem?
29
+ 18 Are there any potential solutions or strategies that have been tried before? If yes, what were the outcomes and lessons learned?
30
+ 19 What are the potential obstacles or challenges that might arise in solving this problem?
31
+ 20 Are there any relevant data or information that can provide insights into the problem? If yes, what data sources are available,
32
+ and how can they be analyzed?
33
+ 21 Are there any stakeholders or individuals who are directly affected by the problem? What are their perspectives and needs?
34
+ 22 What resources (financial, human, technological, etc.) are needed to tackle the problem effectively?
35
+ 23 How can progress or success in solving the problem be measured or evaluated?
36
+ 24 What indicators or metrics can be used?
37
+ 25 Is the problem a technical or practical one that requires a specific expertise or skill set? Or is it more of a conceptual or
38
+ theoretical problem?
39
+ 26 Does the problem involve a physical constraint, such as limited resources, infrastructure, or space?
40
+ 27 Is the problem related to human behavior, such as a social, cultural, or psychological issue?
41
+ 28 Does the problem involve decision-making or planning, where choices need to be made under uncertainty or with competing
42
+ objectives?
43
+ 29 Is the problem an analytical one that requires data analysis, modeling, or optimization techniques?
44
+ 30 Is the problem a design challenge that requires creative solutions and innovation?
45
+ 31 Does the problem require addressing systemic or structural issues rather than just individual instances?
46
+ 32 Is the problem time-sensitive or urgent, requiring immediate attention and action?
47
+ 33 What kinds of solution typically are produced for this kind of problem specification?
48
+ 34 Given the problem specification and the current best solution, have a guess about other possible solutions.
49
+ 35 Let's imagine the current best solution is totally wrong, what other ways are there to think about the problem specification?
50
+ 36 What is the best way to modify this current best solution, given what you know about these kinds of problem specification?
51
+ 37 Ignoring the current best solution, create an entirely new solution to the problem.
52
+ 38 Let's think step by step.
53
+ 39 Let's make a step by step plan and implement it with good notation and explanation"""
54
+
55
+
56
+ select_prompt = """
57
+ In order to solve the given task:
58
+ <Task>
59
+ {Task}
60
+ </Task>
61
+ Select several modules that are crucial for solving the tasks above
62
+ from all the reasoning module description given below:
63
+ {resonining_modules}
64
+ """
65
+
66
+ adapt_prompt = """
67
+ Rephrase and specify each reasoning module so that it better helps solving the task:
68
+ <Task>
69
+ {Task}
70
+ </Task>
71
+ SELECTED module descriptions:
72
+ {selected_modules}
73
+ Adapt each reasoning module description to better solve the task:
74
+ """
75
+
76
+ implement_prompt = """
77
+ Operationalize the reasoning modules into a step-by-step reasoning plan in JSON format
78
+ Example task:
79
+ <Task>
80
+ {Task}
81
+ </Task>
82
+ ADAPTED module descriptions:
83
+ {adapted_modules}
84
+
85
+ Implement a reasoning structure that generalises to similar tasks, for solvers to follow step-by-step and arrive at correct answers
86
+ """
requirements.txt ADDED
@@ -0,0 +1,3 @@
1
+ openai==1.12.0
2
+ python-dotenv==1.0.1
3
+ google-generativeai==0.3.2
self_discover.py ADDED
@@ -0,0 +1,62 @@
+ from prompts import (
+     select_prompt,
+     reasoning_modules,
+     adapt_prompt,
+     implement_prompt,
+ )
+
+ from llm import LLM
+ from task_example import task1
+
+ import logging
+
+ def setup_logging():
+     logger = logging.getLogger(__name__)  # module name, not the literal string "__name__"
+     logger.setLevel(logging.INFO)
+
+     handler = logging.FileHandler("prompt_log.txt")
+     handler.setLevel(logging.INFO)
+
+     formatter = logging.Formatter('%(levelname)s - %(message)s')
+     handler.setFormatter(formatter)
+
+     logger.addHandler(handler)
+     return logger
+
+ logger = setup_logging()
+
+ class SelfDiscover:
+     def __init__(self, task) -> None:
+         self.llm = LLM(model_name="OpenAI")
+         self.actions = ["SELECT", "ADAPT", "IMPLEMENT"]
+         self.task = task
+
+     def __call__(self):
+         for action in self.actions:
+             print(action)
+             if action == "SELECT":
+                 prompt = select_prompt.replace("{Task}", self.task)
+                 prompt = prompt.replace("{resonining_modules}", reasoning_modules)  # spelling matches prompts.py
+                 logger.info("SELECT PROMPT :" + prompt)
+                 self.selected_modules = self.llm(prompt)
+                 print(self.selected_modules)
+
+             elif action == "ADAPT":
+                 prompt = adapt_prompt.replace("{Task}", self.task)
+                 prompt = prompt.replace("{selected_modules}", self.selected_modules)
+                 logger.info("ADAPT PROMPT :" + prompt)
+                 self.adapted_modules = self.llm(prompt)
+
+             elif action == "IMPLEMENT":
+                 prompt = implement_prompt.replace("{Task}", self.task)
+                 prompt = prompt.replace("{adapted_modules}", self.adapted_modules)
+                 logger.info("IMPLEMENT PROMPT:" + prompt)
+                 self.reasoning_structure = self.llm(prompt)
+
+
+ if __name__=="__main__":
+     result = SelfDiscover(task=task1)
+     result()
+     logger.info(f"SELECTED_MODULES : {result.selected_modules}")
+     logger.info(f"ADAPTED_MODULES : {result.adapted_modules}")
+     logger.info(f"REASONING_STRUCTURE : {result.reasoning_structure}")
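`reasoning_structure` is stored as the raw model reply, which may wrap the JSON plan in surrounding prose. One possible way to recover the plan as a dict is sketched below. The helper `extract_json` is hypothetical and not part of this commit, and it assumes the reply contains exactly one top-level JSON object.

```python
import json


def extract_json(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a model reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad JSON.
    return json.loads(text[start:end + 1])


reply = 'Here is the plan:\n{"Step 1": "Identify the address"}\nDone.'
plan = extract_json(reply)
print(plan["Step 1"])
```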
task_example.py ADDED
@@ -0,0 +1,42 @@
1
+ task1 = """
2
+ You will be provided with unstructured data and your task is to accurately extract property answers to a set of questions
3
+ from within the context.
4
+ The context is from a report which provides details of the subject property as key-value pairs
5
+
6
+ <context>
7
+
8
+ Distance from Subject: 0.20
9
+ Comp 2
10
+ Address: Kite street,207, CA
11
+ Owner: JISHNU S
12
+ Sale Price: $455,000
13
+ Living Area: 2100
14
+ APN: 04-05664-05660-05505
15
+ Lot Area: 5000
16
+ Total Rooms: 4
17
+ Bedrooms: 2
18
+ Sale Date: 14/10/2021
19
+ Year Built: 1823
20
+
21
+ </context>
22
+
23
+ <questions>
24
+
25
+ 0. what is the address of subject property
26
+ 1. which city is the subject property located
27
+ 2. which county is the subject property located
28
+ 3. what is the owner name of the subject property
29
+ 4. what is the loan number of the subject property
30
+ 5. what is the Total assessed value of subject property
31
+ 6. what is the assessed value of improvements to the subject property
32
+ 7. what is the assessed value of Land of the subject property
33
+ 8. what is the estimated value of subject property
34
+ 9. what is the date of the estimated value of the subject property
35
+ 10. what is the estimated value range of the subject property
36
+ 11. what is the processed date of the subject property
37
+ 12. What is the confidence score of the subject property
38
+ 13. what is the forecast standard deviation of the subject property
39
+ 14. which state is the subject property located in
40
+
41
+ </questions>
42
+ """