Update app.py
app.py
CHANGED
@@ -9,6 +9,7 @@ import sys
|
|
9 |
from audio_processing import AudioProcessor
|
10 |
import spaces
|
11 |
from chunkedTranscriber import ChunkedTranscriber
|
|
|
12 |
|
13 |
|
14 |
logging.basicConfig(
|
@@ -18,6 +19,7 @@ logging.basicConfig(
|
|
18 |
)
|
19 |
logger = logging.getLogger(__name__)
|
20 |
|
|
|
21 |
def load_qa_model():
|
22 |
"""Load question-answering model with long context support."""
|
23 |
try:
|
@@ -153,279 +155,7 @@ def answer_question(context, question):
|
|
153 |
|
154 |
messages = [
|
155 |
# {"role": "system", "content": "You are a helpful assistant who can answer questions based on the given context."},
|
156 |
-
{"role":"system", "content":
|
157 |
-
Analyze a translated transcript of a conversation that may contain multiple speakers and summarize the information in a structured intelligence document.
|
158 |
-
|
159 |
-
The input format will include word-level or sentence-level timestamps, each indicating the speaker ID, language, and translated text.
|
160 |
-
|
161 |
-
# Input Format Overview
|
162 |
-
|
163 |
-
Word-Level Timestamps Example:
|
164 |
-
```
|
165 |
-
[Start Time - End Time] - Speaker <ID> - Language: <Translated Language> - Translated Text: "<Word>"
|
166 |
-
```
|
167 |
-
Example:
|
168 |
-
```
|
169 |
-
0.01-0.02 - Speaker 1 - Language: English - Translated Text: "Proceed"
|
170 |
-
0.02-0.025 - Speaker 1 - Language: English - Translated Text: "with"
|
171 |
-
0.025-0.032 - Speaker 2 - Language: English - Translated Text: "caution"
|
172 |
-
```
|
173 |
-
|
174 |
-
Optional Sentence-Level Structure Example:
|
175 |
-
```
|
176 |
-
[Start Time - End Time] - Speaker <ID> - Language: <Translated Language> - Translated Text: "<Sentence>"
|
177 |
-
```
|
178 |
-
Example with Sentence Grouping:
|
179 |
-
```
|
180 |
-
0.01-0.05 - Speaker 1 - Language: English - Translated Text: "Proceed with caution."
|
181 |
-
0.06-0.12 - Speaker 2 - Language: English - Translated Text: "All systems are ready."
|
182 |
-
```
|
183 |
-
|
184 |
-
# Intelligence Summary Document Structure
|
185 |
-
|
186 |
-
Use the format below to create a structured summary for each conversation transcript received:
|
187 |
-
|
188 |
-
### 1. Top-Level Status & Assessment:
|
189 |
-
- **Threat Level Assessment**:
|
190 |
-
- Choose one:
|
191 |
-
- Completely Innocuous
|
192 |
-
- Likely Innocuous
|
193 |
-
- Unclear — Requires Investigation
|
194 |
-
- Likely Dangerous — Immediate Action
|
195 |
-
- Likely Dangerous — Delayed Action
|
196 |
-
- 100% Dangerous — Immediate Action
|
197 |
-
- 100% Dangerous — Delayed Action
|
198 |
-
- **Humanitarian Alert**: Identify any indications of distress, coercion, or need for assistance, such as signs of duress or requests for help.
|
199 |
-
|
200 |
-
### 2. Basic Metadata:
|
201 |
-
- **Number of Speakers**: Total and unique speakers detected.
|
202 |
-
- **Languages**: List of languages used, with indication of who spoke which language.
|
203 |
-
- **Location**: Actual or inferred locations of participants.
|
204 |
-
- **Communication Medium**: Identify the method of interaction (e.g., phone call, direct conversation).
|
205 |
-
|
206 |
-
### 3. Conversation Overview:
|
207 |
-
- **Summary**: Concise breakdown of the main points and context.
|
208 |
-
- **Alarming Keywords**: Identify any concerning words, including but not limited to keywords like "kill," "attack," "weapon," etc.
|
209 |
-
- **Suspicious or Cryptic Phrases**: Statements that appear coded or unclear in the context of the discussion.
|
210 |
-
|
211 |
-
### 4. In-Depth Analysis:
|
212 |
-
- **Network Connections**: Identify mentions of additional individuals or groups involved.
|
213 |
-
- **Intent & Emotional Tone Detection**: Analyze emotional cues (e.g., anger, fear, calmness, urgency). Identify signs of deception or tension.
|
214 |
-
- **Behavioral Patterns**: Highlight repeated themes, phrases, or signals of planning and coordination.
|
215 |
-
- **Code Words & Cryptic Language**: Detect terms that may indicate hidden or covert meaning.
|
216 |
-
- **Geolocation References**: Point out any inferences regarding regional language or place names.
|
217 |
-
- **Sentiment on Strategic Issues**: Identify any indication of radical, dissenting, or anti-national views that could imply unrest or extremism.
|
218 |
-
|
219 |
-
### 5. Resource Mentions & Operational Logistics:
|
220 |
-
- **Resource & Asset Mentions**: List any mention of tools, weapons, vehicles, or supply logistics.
|
221 |
-
- **Behavioral Deviations**: Identify shifts in tone, speech, or demeanor suggesting stress, coercion, urgency, or preparation.
|
222 |
-
|
223 |
-
### 6. Prioritization, Recommendations & Actionables:
|
224 |
-
- **High-Risk Alert Priority**: Identify whether the conversation should be flagged for further attention.
|
225 |
-
- **Recommended Actions**:
|
226 |
-
- **Surveillance**: Suggest surveillance if concerning patterns or keywords are detected.
|
227 |
-
- **Intervention**: Recommend intervention for urgent/high-risk cases.
|
228 |
-
- **Humanitarian Assistance**: Suggest immediate support for any signs of distress.
|
229 |
-
- **Follow-Up Analysis**: Identify statements that need deeper review for clarity or to understand potential hidden meanings.
|
230 |
-
|
231 |
-
# Steps
|
232 |
-
|
233 |
-
1. Analyze the input conversation for participant information and context.
|
234 |
-
2. Fill in each section of the Intelligence Summary Document structure.
|
235 |
-
3. Ensure all details, especially those related to potential risk factors or alerts, are captured and highlighted clearly.
|
236 |
-
|
237 |
-
# Output Format
|
238 |
-
|
239 |
-
Provide one structured Intelligence Summary Document for the conversation in either plain text format or structured JSON.
|
240 |
-
|
241 |
-
# JSON Format Example:
|
242 |
-
```json
|
243 |
-
{
|
244 |
-
"Top-Level Status & Assessment": {
|
245 |
-
"Threat Level Assessment": "Unclear - Requires Investigation",
|
246 |
-
"Humanitarian Alert": "No distress signals detected."
|
247 |
-
},
|
248 |
-
"Basic Metadata": {
|
249 |
-
"Number of Speakers": 2,
|
250 |
-
"Languages": {
|
251 |
-
"Speaker 1": "English",
|
252 |
-
"Speaker 2": "English"
|
253 |
-
},
|
254 |
-
"Location": "Unknown",
|
255 |
-
"Communication Medium": "Direct conversation"
|
256 |
-
},
|
257 |
-
"Conversation Overview": {
|
258 |
-
"Summary": "A cautious approach was suggested by Speaker 1, followed by an assurance from Speaker 2 that systems are ready.",
|
259 |
-
"Alarming Keywords": [],
|
260 |
-
"Suspicious or Cryptic Phrases": []
|
261 |
-
},
|
262 |
-
"In-Depth Analysis": {
|
263 |
-
"Network Connections": "None identified",
|
264 |
-
"Intent & Emotional Tone Detection": "Calm, precautionary tone",
|
265 |
-
"Behavioral Patterns": "Speaker 1 expressing concern, Speaker 2 providing assurance",
|
266 |
-
"Code Words & Cryptic Language": [],
|
267 |
-
"Geolocation References": [],
|
268 |
-
"Sentiment on Strategic Issues": "No radical or dissenting sentiment detected"
|
269 |
-
},
|
270 |
-
"Resource Mentions & Operational Logistics": {
|
271 |
-
"Resource & Asset Mentions": [],
|
272 |
-
"Behavioral Deviations": "None noted"
|
273 |
-
},
|
274 |
-
"Prioritization, Recommendations & Actionables": {
|
275 |
-
"High-Risk Alert Priority": "Low",
|
276 |
-
"Recommended Actions": {
|
277 |
-
"Surveillance": "No further surveillance needed.",
|
278 |
-
"Intervention": "Not required.",
|
279 |
-
"Humanitarian Assistance": "Not required.",
|
280 |
-
"Follow-Up Analysis": "No unusual phrases detected requiring review."
|
281 |
-
}
|
282 |
-
}
|
283 |
-
}
|
284 |
-
```
|
285 |
-
|
286 |
-
# Notes
|
287 |
-
|
288 |
-
- Ensure that you mark any ambiguous segments as requiring further investigation.
|
289 |
-
- Pay attention to emotional tone shifts or sudden changes in behavior.
|
290 |
-
- If any direct or implied threat is detected, prioritize appropriately using the provided classifications.
|
291 |
-
- Err on the side of caution. In case there is even a remote possibility that there might be something that required human attention, flag it.
|
292 |
-
Analyze a translated transcript of a conversation that may contain multiple speakers and summarize the information in a structured intelligence document.
|
293 |
-
|
294 |
-
The input format will include word-level or sentence-level timestamps, each indicating the speaker ID, language, and translated text.
|
295 |
-
|
296 |
-
# Input Format Overview
|
297 |
-
|
298 |
-
Word-Level Timestamps Example:
|
299 |
-
```
|
300 |
-
[Start Time - End Time] - Speaker <ID> - Language: <Translated Language> - Translated Text: "<Word>"
|
301 |
-
```
|
302 |
-
Example:
|
303 |
-
```
|
304 |
-
0.01-0.02 - Speaker 1 - Language: English - Translated Text: "Proceed"
|
305 |
-
0.02-0.025 - Speaker 1 - Language: English - Translated Text: "with"
|
306 |
-
0.025-0.032 - Speaker 2 - Language: English - Translated Text: "caution"
|
307 |
-
```
|
308 |
-
|
309 |
-
Optional Sentence-Level Structure Example:
|
310 |
-
```
|
311 |
-
[Start Time - End Time] - Speaker <ID> - Language: <Translated Language> - Translated Text: "<Sentence>"
|
312 |
-
```
|
313 |
-
Example with Sentence Grouping:
|
314 |
-
```
|
315 |
-
0.01-0.05 - Speaker 1 - Language: English - Translated Text: "Proceed with caution."
|
316 |
-
0.06-0.12 - Speaker 2 - Language: English - Translated Text: "All systems are ready."
|
317 |
-
```
|
318 |
-
|
319 |
-
# Intelligence Summary Document Structure
|
320 |
-
|
321 |
-
Use the format below to create a structured summary for each conversation transcript received:
|
322 |
-
|
323 |
-
### 1. Top-Level Status & Assessment:
|
324 |
-
- **Threat Level Assessment**:
|
325 |
-
- Choose one:
|
326 |
-
- Completely Innocuous
|
327 |
-
- Likely Innocuous
|
328 |
-
- Unclear — Requires Investigation
|
329 |
-
- Likely Dangerous — Immediate Action
|
330 |
-
- Likely Dangerous — Delayed Action
|
331 |
-
- 100% Dangerous — Immediate Action
|
332 |
-
- 100% Dangerous — Delayed Action
|
333 |
-
- **Humanitarian Alert**: Identify any indications of distress, coercion, or need for assistance, such as signs of duress or requests for help.
|
334 |
-
|
335 |
-
### 2. Basic Metadata:
|
336 |
-
- **Number of Speakers**: Total and unique speakers detected.
|
337 |
-
- **Languages**: List of languages used, with indication of who spoke which language.
|
338 |
-
- **Location**: Actual or inferred locations of participants.
|
339 |
-
- **Communication Medium**: Identify the method of interaction (e.g., phone call, direct conversation).
|
340 |
-
|
341 |
-
### 3. Conversation Overview:
|
342 |
-
- **Summary**: Concise breakdown of the main points and context.
|
343 |
-
- **Alarming Keywords**: Identify any concerning words, including but not limited to keywords like "kill," "attack," "weapon," etc.
|
344 |
-
- **Suspicious or Cryptic Phrases**: Statements that appear coded or unclear in the context of the discussion.
|
345 |
-
|
346 |
-
### 4. In-Depth Analysis:
|
347 |
-
- **Network Connections**: Identify mentions of additional individuals or groups involved.
|
348 |
-
- **Intent & Emotional Tone Detection**: Analyze emotional cues (e.g., anger, fear, calmness, urgency). Identify signs of deception or tension.
|
349 |
-
- **Behavioral Patterns**: Highlight repeated themes, phrases, or signals of planning and coordination.
|
350 |
-
- **Code Words & Cryptic Language**: Detect terms that may indicate hidden or covert meaning.
|
351 |
-
- **Geolocation References**: Point out any inferences regarding regional language or place names.
|
352 |
-
- **Sentiment on Strategic Issues**: Identify any indication of radical, dissenting, or anti-national views that could imply unrest or extremism.
|
353 |
-
|
354 |
-
### 5. Resource Mentions & Operational Logistics:
|
355 |
-
- **Resource & Asset Mentions**: List any mention of tools, weapons, vehicles, or supply logistics.
|
356 |
-
- **Behavioral Deviations**: Identify shifts in tone, speech, or demeanor suggesting stress, coercion, urgency, or preparation.
|
357 |
-
|
358 |
-
### 6. Prioritization, Recommendations & Actionables:
|
359 |
-
- **High-Risk Alert Priority**: Identify whether the conversation should be flagged for further attention.
|
360 |
-
- **Recommended Actions**:
|
361 |
-
- **Surveillance**: Suggest surveillance if concerning patterns or keywords are detected.
|
362 |
-
- **Intervention**: Recommend intervention for urgent/high-risk cases.
|
363 |
-
- **Humanitarian Assistance**: Suggest immediate support for any signs of distress.
|
364 |
-
- **Follow-Up Analysis**: Identify statements that need deeper review for clarity or to understand potential hidden meanings.
|
365 |
-
|
366 |
-
# Steps
|
367 |
-
|
368 |
-
1. Analyze the input conversation for participant information and context.
|
369 |
-
2. Fill in each section of the Intelligence Summary Document structure.
|
370 |
-
3. Ensure all details, especially those related to potential risk factors or alerts, are captured and highlighted clearly.
|
371 |
-
|
372 |
-
# Output Format
|
373 |
-
|
374 |
-
Provide one structured Intelligence Summary Document for the conversation in either plain text format or structured JSON.
|
375 |
-
|
376 |
-
# JSON Format Example:
|
377 |
-
```json
|
378 |
-
{
|
379 |
-
"Top-Level Status & Assessment": {
|
380 |
-
"Threat Level Assessment": "Unclear - Requires Investigation",
|
381 |
-
"Humanitarian Alert": "No distress signals detected."
|
382 |
-
},
|
383 |
-
"Basic Metadata": {
|
384 |
-
"Number of Speakers": 2,
|
385 |
-
"Languages": {
|
386 |
-
"Speaker 1": "English",
|
387 |
-
"Speaker 2": "English"
|
388 |
-
},
|
389 |
-
"Location": "Unknown",
|
390 |
-
"Communication Medium": "Direct conversation"
|
391 |
-
},
|
392 |
-
"Conversation Overview": {
|
393 |
-
"Summary": "A cautious approach was suggested by Speaker 1, followed by an assurance from Speaker 2 that systems are ready.",
|
394 |
-
"Alarming Keywords": [],
|
395 |
-
"Suspicious or Cryptic Phrases": []
|
396 |
-
},
|
397 |
-
"In-Depth Analysis": {
|
398 |
-
"Network Connections": "None identified",
|
399 |
-
"Intent & Emotional Tone Detection": "Calm, precautionary tone",
|
400 |
-
"Behavioral Patterns": "Speaker 1 expressing concern, Speaker 2 providing assurance",
|
401 |
-
"Code Words & Cryptic Language": [],
|
402 |
-
"Geolocation References": [],
|
403 |
-
"Sentiment on Strategic Issues": "No radical or dissenting sentiment detected"
|
404 |
-
},
|
405 |
-
"Resource Mentions & Operational Logistics": {
|
406 |
-
"Resource & Asset Mentions": [],
|
407 |
-
"Behavioral Deviations": "None noted"
|
408 |
-
},
|
409 |
-
"Prioritization, Recommendations & Actionables": {
|
410 |
-
"High-Risk Alert Priority": "Low",
|
411 |
-
"Recommended Actions": {
|
412 |
-
"Surveillance": "No further surveillance needed.",
|
413 |
-
"Intervention": "Not required.",
|
414 |
-
"Humanitarian Assistance": "Not required.",
|
415 |
-
"Follow-Up Analysis": "No unusual phrases detected requiring review."
|
416 |
-
}
|
417 |
-
}
|
418 |
-
}
|
419 |
-
```
|
420 |
-
|
421 |
-
# Notes
|
422 |
-
|
423 |
-
- Ensure that you mark any ambiguous segments as requiring further investigation.
|
424 |
-
- Pay attention to emotional tone shifts or sudden changes in behavior.
|
425 |
-
- If any direct or implied threat is detected, prioritize appropriately using the provided classifications.
|
426 |
-
- Err on the side of caution. In case there is even a remote possibility that there might be something that required human attention, flag it.
|
427 |
-
|
428 |
-
"""},
|
429 |
{"role": "user", "content": f"Context: {text}\n\nQuestion: {question}"}
|
430 |
]
|
431 |
|
|
|
9 |
from audio_processing import AudioProcessor
|
10 |
import spaces
|
11 |
from chunkedTranscriber import ChunkedTranscriber
|
12 |
+
from system_message import SYSTEM_MESSAGE
|
13 |
|
14 |
|
15 |
logging.basicConfig(
|
|
|
19 |
)
|
20 |
logger = logging.getLogger(__name__)
|
21 |
|
22 |
+
|
23 |
def load_qa_model():
|
24 |
"""Load question-answering model with long context support."""
|
25 |
try:
|
|
|
155 |
|
156 |
messages = [
|
157 |
# {"role": "system", "content": "You are a helpful assistant who can answer questions based on the given context."},
|
158 |
+
{"role":"system", "content": SYSTEM_MESSAGE},
|
159 |
{"role": "user", "content": f"Context: {text}\n\nQuestion: {question}"}
|
160 |
]
|
161 |
|
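The change above replaces the inline system prompt (which had been pasted twice into the same string) with a constant imported from a `system_message` module. The contents of that module are not part of this diff, so the following is only a minimal sketch of how such a refactor might look. The shortened prompt text and the `build_messages` helper are hypothetical and exist purely for illustration; only the name `SYSTEM_MESSAGE` and the message layout come from the diff.

```python
# Hypothetical stand-in for system_message.py; the real file is not shown in this diff.
# The prompt is shortened to its first sentence here. The actual constant presumably
# holds a single copy of the full prompt removed from app.py.
SYSTEM_MESSAGE = (
    "Analyze a translated transcript of a conversation that may contain "
    "multiple speakers and summarize the information in a structured "
    "intelligence document."
)


def build_messages(text: str, question: str) -> list[dict[str, str]]:
    """Assemble the chat messages in the same shape app.py uses after the change.

    `build_messages` is an illustrative helper, not a function from the repository.
    """
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": f"Context: {text}\n\nQuestion: {question}"},
    ]


if __name__ == "__main__":
    messages = build_messages(
        '0.01-0.05 - Speaker 1 - Language: English - Translated Text: "Proceed with caution."',
        "How many speakers are present?",
    )
    for message in messages:
        print(message["role"], "->", message["content"][:60])
```

Keeping the prompt in one module-level constant means it is defined once, which avoids the kind of accidental duplication visible in the removed lines and keeps future prompt edits out of `app.py`.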