martintomov commited on
Commit
befa457
·
1 Parent(s): f19be62

readme updates

Browse files
Files changed (1) hide show
  1. README.md +61 -4
README.md CHANGED
@@ -1,13 +1,70 @@
1
  ---
2
- title: Gpt4v Voiceover
3
- emoji: 🏆
4
  colorFrom: green
5
  colorTo: pink
6
  sdk: streamlit
7
  sdk_version: 1.32.2
8
  app_file: app.py
9
- pinned: false
10
  license: mit
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: GPT4V Voiceover
3
+ emoji: 🎥🔮
4
  colorFrom: green
5
  colorTo: pink
6
  sdk: streamlit
7
  sdk_version: 1.32.2
8
  app_file: app.py
9
+ pinned: true
10
  license: mit
11
  ---
12
 
13
+ # AI Voiceover with GPT4V
14
+ ### [GitHub Repo](https://github.com/martintmv-git/gpt4v-streamlit-voiceover)
15
+
16
+ This Streamlit application, along with a Jupyter notebook implementation, demonstrates the use of AI and machine learning to automate the process of generating voiceovers for videos. The solution involves processing video, generating narratives based on the video content, converting the narratives to audio, and then merging the audio back into the video for a complete voiceover experience.
17
+
18
+ ## Demo
19
+
20
+ ### Input Video
21
+ [Input Video](https://github.com/martintmv-git/gpt4v-streamlit-voiceover/assets/101264514/388d20c1-e61d-4f50-8641-4217886e2047)
22
+
23
+ ### Output Video
24
+ [Output Video](https://github.com/martintmv-git/gpt4v-streamlit-voiceover/assets/101264514/1aeb3caf-443d-4e94-abf1-4a9cf795fafb)
25
+
26
+ ### Jupyter Notebook
27
+ <img width="1680" alt="Screenshot 2024-01-30 at 13 52 29" src="https://github.com/martintmv-git/gpt4v-streamlit-voiceover/assets/101264514/a8f05ef6-79b1-40ad-9998-8d52b424c1c5">
28
+
29
+ ## Features
30
+
31
+ - **Video Processing**: Converts a video into frames using OpenCV.
32
+ - **Narrative Generation**: Utilizes OpenAI's GPT-4 Vision model to create stories or scripts based on the video frames.
33
+ - **Voiceover Generation**: Converts the generated text into a voiceover using ElevenLabs's text-to-speech API.
34
+ - **Audio and Video Merging**: Combines the generated voiceover with the original video, extending or trimming the video as needed to match the voiceover duration.
35
+
36
+ ## Workflow
37
+
38
+ 1. **Environment Setup**: Load necessary API keys and configurations.
39
+ 2. **Video to Frames**: Convert a video into individual frames suitable for AI processing.
40
+ 3. **AI-Generated Script**: Use OpenAI's GPT-4 model to create a script based on the video frames.
41
+ 4. **Text to Speech**: Convert the script to audio with OpenAI's or ElevenLabs's TTS service.
42
+ 5. **Video Finalization**: Merge the audio back into the video, adjusting the video duration to match the audio if necessary.
43
+
44
+ ## Jupyter Notebook Implementation
45
+
46
+ The Jupyter notebook `voiceover_jupyter-notebook.ipynb` includes the full implementation of the AI voiceover process:
47
+
48
+ - **Extracting Video Frames**: Load a video file and extract frames as base64-encoded images.
49
+ - **AI Script Generation**: Send the frames to OpenAI's GPT-4 model to generate a voiceover script.
50
+ - **Text-to-Speech Conversion**: Convert the script into a voiceover audio file using OpenAI's or ElevenLabs's TTS service.
51
+
52
+ The notebook provides a step-by-step guide, complete with code and markdown explanations, to illustrate the entire process of creating an AI-generated voiceover for video content.
53
+
54
+ ## Dependencies
55
+
56
+ - `python-dotenv`: For loading environment variables.
57
+ - `moviepy`: For video and audio processing.
58
+ - `opencv-python`: For handling video frames.
59
+ - `openai`: For accessing OpenAI's GPT-4 API.
60
+ - `requests`: For making HTTP requests to the TTS API.
61
+ - `streamlit`: For creating the web-based UI (for the Streamlit app).
62
+
63
+ ## Requirements
64
+
65
+ - An OpenAI API key and/or ElevenLabs API key are required.
66
+ - Python 3.x and the above-mentioned libraries.
67
+
68
+ ## Disclaimer
69
+
70
+ This project is for demonstration purposes and showcases the integration of AI models with video and audio processing in Python, using both a Streamlit app and a Jupyter Notebook.