Meta has just released NotebookLlama, an AI tool that automatically creates podcasts from PDF files. This tool is an open-source alternative to the Audio Overview feature in Google's NotebookLM. NotebookLlama leverages Meta's Llama family of models, including Llama 3.2 and Llama 3.1, together with open-source text-to-speech models such as Parler TTS. If you're looking to get started with it yourself, you're in the right place.
Our latest tutorial below explores how to deploy and use Notebook Llama on Hyperstack.
Let's walk through the step-by-step process to get started with Notebook Llama on Hyperstack:
Initiate Deployment
Select Hardware Configuration
Choose the Operating System
Select a Keypair
Network Configuration
Enable SSH Access
Configure Additional Settings
Please note: This will open port 8888 to the public internet, allowing anyone with the public IP address and port number to access the dashboard. If you don't want this, you can restrict the IP addresses that can access the VM on port 8888 (see instructions here).
Review and Deploy the Script
This tutorial sets up Notebook Llama for demo purposes only. For production environments, consider a production-grade deployment with secrets management, monitoring and so on.
Once the initialisation is complete, you might need to reboot your VM to allow GPU access for your Jupyter Notebook server:
ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address]
# Once connected through SSH
sudo reboot
Afterwards, you can access your Jupyter Notebook server. If you are having any issues, please check out our Troubleshooting tips below.
Option 1: Connect via public IP
If you used the default settings, use the following steps to connect to the Jupyter Notebook server.
Option 2: Connect via SSH tunnel
ssh -i [path_to_ssh_key] -L 8888:localhost:8888 [os_username]@[vm_ip_address]
# e.g: ssh -i /users/username/downloads/keypair_hyperstack -L 8888:localhost:8888 ubuntu@0.0.0.0
Afterwards, enter the password you created in the steps above.
If you see any SSL warnings, you can skip them for now. They are related to the self-signed certificates being used for the HTTPS connection. For more info on this and its potential risks, see this blog.
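With the server reachable, a quick sanity check in a new notebook cell confirms that the reboot gave the container GPU access (assuming PyTorch is available in the Jupyter image):

import torch

# Should print True and the name of the attached GPU once the reboot has taken effect
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))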
Notebook Llama enables you to transform any PDF into a podcast using open-source models in four steps, each implemented as its own notebook in the llama-recipes repository.
Below we describe the exact steps of running these notebooks, including a couple of fixes that are needed (at the time of writing).
If you want to get started faster, refer to the final notebooks below.
Once you open the Jupyter Notebook server, use the following steps to install Git and clone the llama-recipes repository:
# Install Git
apt-get update
apt-get install git -y
# Clone repository
cd home
git clone https://github.com/meta-llama/llama-recipes
## Core dependencies
PyPDF2>=3.0.0
# torch>=2.0.0
transformers>=4.46.0
accelerate>=0.27.0
rich>=13.0.0
ipywidgets>=8.0.0
tqdm>=4.66.0
# Optional but recommended
jupyter>=1.0.0
ipykernel>=6.0.0
# Warning handling
# warnings>=0.1.0
!pip install PyPDF2
!pip install rich ipywidgets
!pip install -r requirements.txt
import PyPDF2
from typing import Optional
import os
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer
from tqdm.notebook import tqdm
import warnings
warnings.filterwarnings('ignore')
os.environ["HF_TOKEN"] = "[replace-with-your-token]"
INPUT_FILE = "./resources/extracted_text.txt"
CHUNK_SIZE = 1000

# Read the extracted text and split it into word-bounded chunks
with open(INPUT_FILE, 'r', encoding='utf-8') as file:
    extracted_text = file.read()

chunks = create_word_bounded_chunks(extracted_text, CHUNK_SIZE)
num_chunks = len(chunks)
processed_text = []
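The create_word_bounded_chunks helper is defined in the Step 1 notebook. As a rough idea of what it does, a minimal sketch that splits text into roughly CHUNK_SIZE-character pieces without breaking words could look like this (illustrative only, the notebook's own version may differ):

def create_word_bounded_chunks(text, target_chunk_size):
    """Split text into chunks of roughly target_chunk_size characters, breaking on word boundaries."""
    words = text.split()
    chunks, current, current_len = [], [], 0
    for word in words:
        # Start a new chunk once adding the next word would exceed the target size
        if current and current_len + len(word) + 1 > target_chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks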
# Extract metadata first
print("Extracting metadata...")
metadata = get_pdf_metadata(pdf_path)
if metadata:
    print("\nPDF Metadata:")
    print(f"Number of pages: {metadata['num_pages']}")
    print("Document info:")
    for key, value in metadata['metadata'].items():
        print(f"{key}: {value}")

# Extract text
print("\nExtracting text...")
extracted_text = extract_text_from_pdf(pdf_path)

# Display first 500 characters of extracted text as preview
if extracted_text:
    print("\nPreview of extracted text (first 500 characters):")
    print("-" * 50)
    print(extracted_text[:500])
    print("-" * 50)
    print(f"\nTotal characters extracted: {len(extracted_text)}")

# Optional: Save the extracted text to a file
if extracted_text:
    output_file = './resources/extracted_text.txt'
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(extracted_text)
    print(f"\nExtracted text has been saved to {output_file}")
import os
os.environ["HF_TOKEN"] = "[your-hf-token]"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
# Read the cleaned text produced in Step 1
INPUT_PROMPT = read_file_to_string('./resources/clean_extracted_text.txt')
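The transcript-writing step then feeds INPUT_PROMPT to the model together with a podcast-writer system prompt. A condensed sketch of that generation step using the transformers pipeline API follows; the system prompt and generation parameters here are illustrative, not the notebook's exact values:

import torch
import transformers

# Illustrative system prompt; the notebook uses a much more detailed one
SYSTEM_PROMPT = "You are a world-class podcast writer. Turn the provided text into an engaging two-speaker transcript."

pipeline = transformers.pipeline(
    "text-generation",
    model=MODEL,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

outputs = pipeline(messages, max_new_tokens=4096)
transcript = outputs[0]["generated_text"][-1]["content"]
print(transcript[:500])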
import os
os.environ["HF_TOKEN"] = "[your-hf-token]"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
#!pip3 install optimum
#!pip install -U flash-attn --no-build-isolation
#!pip install transformers==4.43.3
!apt-get install -y ffmpeg
!pip install git+https://github.com/huggingface/parler-tts.git
!pip install pydub
device = "cuda"
processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark", torch_dtype=torch.float16).to(device)#.to_bettertransformer()
bark_processor = AutoProcessor.from_pretrained("suno/bark")
bark_model = BarkModel.from_pretrained("suno/bark", torch_dtype=torch.float16).to("cuda")
bark_sampling_rate = 24000
parler_model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to("cuda")
parler_tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
device="cuda"
import io
import numpy as np
from scipy.io import wavfile
from pydub import AudioSegment

def numpy_to_audio_segment(audio_arr, sampling_rate):
    """Convert numpy array to AudioSegment"""
    # Ensure audio array is normalized to the range [-1, 1]
    audio_arr = np.clip(audio_arr, -1, 1)
    # Convert to 16-bit PCM
    audio_int16 = (audio_arr * 32767).astype(np.int16)
    # Create WAV file in memory
    byte_io = io.BytesIO()
    wavfile.write(byte_io, sampling_rate, audio_int16)
    byte_io.seek(0)
    # Convert to AudioSegment
    return AudioSegment.from_wav(byte_io)
import ast

final_audio = None

for speaker, text in tqdm(ast.literal_eval(PODCAST_TEXT), desc="Generating podcast segments", unit="segment"):
    if speaker == "Speaker 1":
        audio_arr, rate = generate_speaker1_audio(text)
    else:  # Speaker 2
        audio_arr, rate = generate_speaker2_audio(text)

    # Convert to AudioSegment (pydub will handle sample rate conversion automatically)
    audio_segment = numpy_to_audio_segment(audio_arr, rate)

    # Add to final audio
    if final_audio is None:
        final_audio = audio_segment
    else:
        final_audio += audio_segment
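Once the loop completes, final_audio holds the stitched-together podcast. Assuming you want to keep it as a file, pydub can export it to MP3 (the output path here is just an example):

# Export the assembled podcast to an MP3 file (ffmpeg was installed above)
final_audio.export("./resources/podcast.mp3",
                   format="mp3",
                   bitrate="192k",
                   parameters=["-q:a", "0"])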
To play your generated podcast, follow the instructions below:
If you are having any issues, please follow the instructions below:
To resolve any Jupyter Notebook Server issues, run the commands below to debug your errors.
ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address]
# Once connected through SSH
# Check whether the cloud-init finished running
cat /var/log/cloud-init-output.log
# Expected output similar to:
# Cloud-init v. 24.2-0ubuntu1~22.04.1 finished at Thu, 31 Oct 2024 03:33:48 +0000. Datasource DataSourceOpenStackLocal [net,ver=2]. Up 52.62 seconds
# Check whether the Jupyter notebook server has started
docker ps
# Expected output similar to:
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# 5c77b8a4154c jupyter_image "jupyter notebook --…" 1 second ago Up Less than a second 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp inspiring_thompson
# Check for any errors during Jupyter notebook server setup
cat load_docker_error.log
In case you are running into any notebook issues or want to get started faster, refer to our final notebooks here:
When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:
To continue your work without repeating the setup process:
Meta has just released NotebookLlama, an AI tool that automatically creates podcasts from PDF files. This tool is an open-source alternative to the Audio Overview feature in Google's Notebook LM.
The main steps to setting up Notebook Llama on Hyperstack include accessing Hyperstack, deploying a new virtual machine, setting up the model and running Jupyter Notebooks.
For the smaller Llama 1B and 8B models, we recommend the NVIDIA RTX A6000 x1. For the larger Llama 70B model, we recommend the NVIDIA A100 x2.
Yes, you can hibernate your VM to avoid unnecessary costs, preserving your setup for future use without incurring compute charges.