Deploying and Using Notebook Llama on Hyperstack: A Quick Start Guide

Written by Sebastian Panman de Wit | Oct 31, 2024 12:29:28 PM

Meta has just released NotebookLlama, an AI tool that automatically creates podcasts from PDF files. This tool is an open-source alternative to the Audio Overview feature in Google's Notebook LM. NotebookLlama uses Meta's Llama family of models, including Llama 3.2 and Llama 3.1, together with the open-source Parler text-to-speech model. If you’re looking to get started with it, you’re in the right place.

Our latest tutorial below explores how to deploy and use Notebook Llama on Hyperstack. 

How to Get Started with Notebook Llama

Let's walk through the step-by-step process to get started with Notebook Llama on Hyperstack:

Step 1: Accessing Hyperstack

  1. Visit the Hyperstack website and log in to your account.
  2. If you don't already have an account, you'll need to create one and set up your billing information. Check our documentation to get started with Hyperstack.
  3. Once logged in, you'll enter the Hyperstack dashboard, which gives an overview of your resources and deployments.

Step 2: Deploying a New Virtual Machine

Initiate Deployment

  1. Navigate to the "Virtual Machines" section.
  2. Click "Deploy New Virtual Machine" to start the deployment process.

Select Hardware Configuration

  1. For the smaller Llama 1B and 8B models, we recommend the NVIDIA RTX A6000 x1 as an affordable GPU. For faster processing, or if you are using the larger Llama 70B model, we recommend the NVIDIA A100 x2.
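
As a rough way to reason about these recommendations, the memory needed just to hold a model's weights is approximately the parameter count multiplied by the bytes per parameter. The sketch below assumes 16-bit weights (2 bytes each) and ignores activations, the KV cache and framework overhead, so the numbers are lower bounds. `weights_gb` is a hypothetical helper written for this illustration, not part of Hyperstack or NotebookLlama.

```python
def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in GB (1 GB = 10^9 bytes).

    One billion parameters at N bytes each occupy N GB, so the
    estimate is simply parameters (in billions) times bytes per parameter.
    """
    return float(params_billion * bytes_per_param)


# Rough weight-only estimates for the model sizes mentioned above
for name, size in [("Llama 1B", 1), ("Llama 8B", 8), ("Llama 70B", 70)]:
    print(f"{name}: about {weights_gb(size):.0f} GB of weights at 16-bit precision")
```

Under these assumptions, the 8B model's weights (about 16 GB) fit on a single 48 GB RTX A6000, while the 70B model's weights (about 140 GB) need more than one GPU. That is consistent with the A100 x2 recommendation if those are 80 GB cards, which is an assumption here.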

Choose the Operating System

  1. Select the "Server 22.04 LTS R535 CUDA 12.2 with Docker" image.
  2. This image comes pre-installed with Ubuntu 22.04 LTS, NVIDIA drivers (R535) and CUDA 12.2, providing an optimised environment for AI workloads.

Select a Keypair

  1. Select one of the keypairs in your account. If you don't have a keypair yet, see our Getting Started tutorial for creating one.

Network Configuration

  1. Ensure you assign a Public IP to your virtual machine.
  2. This allows you to access your VM from the internet, which is crucial for remote management and API access.

Enable SSH Access

  1. Make sure to enable an SSH connection.
  2. You'll need this to connect and manage your VM securely.

Configure Additional Settings

  1. Look for the "Configure Additional Settings" section and click on it.
  2. Here, you'll find the 'Install Jupyter Notebook' toggle. Enable this and set a password which you will need later.

Please note: This will open port 8888 to the public internet, allowing anyone with the public IP address and port number to reach the Jupyter login page. If you don't want this, you can restrict the IP addresses that can access the VM on port 8888 (see instructions here).
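
To confirm whether port 8888 is actually reachable from a given machine (for example, after restricting access), a plain TCP connection test is enough. This is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical helper and the IP address in the comment is a documentation placeholder, not a real VM.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


# Example call, replacing the placeholder with your VM's public IP:
# is_port_open("203.0.113.10", 8888)
```

Run it once from an allowed IP address and once from a different network: the first call should return True and the second False if the restriction is working.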

Review and Deploy the Script

  1. Double-check all your settings.
  2. Click the "Deploy" button to launch your virtual machine.

DISCLAIMER

This tutorial sets up Notebook Llama as a one-off for demo purposes only. For production environments, consider production-grade deployments with secret management, monitoring, etc.

Step 3: Setting Up the Model

  1. The VM will configure itself, install the required libraries and install the Jupyter Notebook server. This will take about 5 to 10 minutes.
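
One way to tell when this initialisation has finished is to look for cloud-init's completion line in /var/log/cloud-init-output.log, the same log used in the Troubleshooting tips. The sketch below is an illustration only: `cloud_init_finished` is a hypothetical helper, and the pattern is based on the example log line shown in this tutorial, so other cloud-init versions may word it differently.

```python
import re

# Matches lines such as:
# "Cloud-init v. 24.2-0ubuntu1~22.04.1 finished at Thu, 31 Oct 2024 03:33:48 +0000. ..."
FINISHED_PATTERN = re.compile(r"^Cloud-init v\. \S+ finished at .+$", re.MULTILINE)


def cloud_init_finished(log_text: str) -> bool:
    """Return True if the log text contains cloud-init's 'finished at' line."""
    return FINISHED_PATTERN.search(log_text) is not None


# On the VM, after connecting through SSH:
# with open("/var/log/cloud-init-output.log", encoding="utf-8") as f:
#     print(cloud_init_finished(f.read()))
```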

Step 4: Accessing the Jupyter Notebook server

Once the initialisation is complete, you might need to reboot your VM to allow GPU access for your Jupyter Notebook server:

ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address]

# Once connected through SSH
sudo reboot

Afterwards, you can access your Jupyter Notebook server. If you are having any issues, please check out our Troubleshooting tips below.

Option 1: Connect via public IP

If you used the default settings, use the following steps to connect to the Jupyter Notebook server:

  1. Open a browser on your local machine
  2. Go to https://[public-ip]:8888/lab (make sure to add the s in https)

Option 2: Connect via SSH tunnel

  1. Open a terminal on your local machine.
  2. Use the command
    ssh -i [path_to_ssh_key] -L 8888:localhost:8888 [os_username]@[vm_ip_address]
    # e.g: ssh -i /users/username/downloads/keypair_hyperstack -L 8888:localhost:8888 ubuntu@0.0.0.0
    
  3. Replace [path_to_ssh_key], [os_username] and [vm_ip_address] with the details provided by Hyperstack.
  4. Open a browser on your local machine
  5. Go to https://localhost:8888/lab (make sure to add the s in https)
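
If you open the tunnel often, the command from step 2 can be assembled programmatically so the three placeholders are filled in consistently. This is a minimal sketch: `build_tunnel_command` is a hypothetical helper, and the key path, username and IP address below are placeholders rather than real values.

```python
import shlex
from typing import List


def build_tunnel_command(key_path: str, username: str, vm_ip: str, port: int = 8888) -> List[str]:
    """Build the ssh argument list that forwards a local port to the same port on the VM."""
    return [
        "ssh",
        "-i", key_path,                      # private key of the keypair chosen at deployment
        "-L", f"{port}:localhost:{port}",    # local port forwarding
        f"{username}@{vm_ip}",
    ]


command = build_tunnel_command("/path/to/keypair_hyperstack", "ubuntu", "203.0.113.10")
print(shlex.join(command))  # shell-ready form of the command
```

The list form can be passed directly to `subprocess.run(command)` if you prefer to start the tunnel from Python instead of pasting the printed command into a terminal.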

Afterwards, enter the password you created in the steps above.

If you see any SSL warnings, you can skip them for now. They are related to the self-signed certificates being used for the HTTPS connection. For more info on this and its potential risks, see this blog.

Run Notebook Llama

Introduction

Notebook Llama enables you to transform any PDF into a podcast using open-source models in four steps:

  1. PDF Pre-processing: Upload a PDF, extract the text using PyPDF2, and save it as a .txt file for processing.
  2. Transcript Writer: Use a Llama LLM to generate a podcast-ready transcript from the cleaned text.
  3. Transcript Rewriter: Refine the transcript with a Llama LLM for enhanced narrative quality and TTS readiness.
  4. Text-to-Speech (TTS) Workflow: Convert the polished transcript into audio with the suno/bark or parler-tts models, completing the podcast creation.

Below we describe the exact steps of running these notebooks, including a couple of fixes that are needed (at the time of writing). 

If you want to get started faster, refer to the final notebooks below.

Running Notebook 1 of 4 - PDF pre-processing

Once you open the Jupyter Notebook server, use the following steps to run the first notebook:

  1. Enter the password you set in 'Step 2: Deploying a New Virtual Machine'
  2. Click on the 'Terminal' Icon in the Launcher. Alternatively, click in the top-left corner on: File > New > Terminal 
  3. Run the following command in the terminal
    # Install Git
    apt-get update
    apt-get install git -y
    
    # Clone repository
    cd /home
    git clone https://github.com/meta-llama/llama-recipes
    
  4. Navigate to the folder /home/llama-recipes/recipes/quickstart/NotebookLlama in the sidebar menu on the left.
  5. Open requirements.txt by double-clicking on the file
  6. Comment out the third and last lines, so that your requirements.txt looks as follows. We do not need a new PyTorch installation because our Jupyter Notebook installation already comes with PyTorch installed.
    ## Core dependencies
    PyPDF2>=3.0.0
    # torch>=2.0.0
    transformers>=4.46.0
    accelerate>=0.27.0
    rich>=13.0.0
    ipywidgets>=8.0.0
    tqdm>=4.66.0
    
    # Optional but recommended
    jupyter>=1.0.0
    ipykernel>=6.0.0
    
    # Warning handling
    # warnings>=0.1.0
    
  7. Save the file with CTRL + S (or CMD + S on macOS). Alternatively, click on File > Save Text in the top-left corner.
  8. Go back to /home/llama-recipes/recipes/quickstart/NotebookLlama
  9. Open the first notebook: Step-1 PDF-Pre-Processing-Logic
  10. Add !pip install -r requirements.txt to the first cell. So your first cell should look like this:
    !pip install PyPDF2
    !pip install rich ipywidgets
    !pip install -r requirements.txt
    
  11. To use these models you need to request gated access:
    1. Request access here for the default small models: 
        1. https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
        2. https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
    2. Create a HuggingFace token to access the gated model, see more info here.
  12. You can also use a larger model (if you are using a more performant flavour like the 2xA100). Simply replace the 'DEFAULT_MODEL' variable in the Jupyter Notebook, and make sure to request gated access to that model. For example, use this code inside the Jupyter Notebooks to use the 70B model: DEFAULT_MODEL = "meta-llama/Llama-3.1-70B-Instruct"
  13. To set your HuggingFace token, add it as an environment variable with the code os.environ["HF_TOKEN"] = "[replace-with-your-token]". The third cell should look like this:
    import PyPDF2
    from typing import Optional
    import os
    import torch
    from accelerate import Accelerator
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    from tqdm.notebook import tqdm
    import warnings
    
    warnings.filterwarnings('ignore')
    
    os.environ["HF_TOKEN"] = "[replace-with-your-token]"
    
  14. Fix the input file path in cell 7. So your cell 7 should look like this:
    INPUT_FILE = "./resources/extracted_text.txt"
    CHUNK_SIZE = 1000
    chunks = create_word_bounded_chunks(extracted_text, CHUNK_SIZE)
    num_chunks = len(chunks)
    processed_text = []
    
  15. Fix the variables in cell 17. So your cell 17 should look like this:
    # Extract metadata first
    print("Extracting metadata...")
    metadata = get_pdf_metadata(pdf_path)
    if metadata:
        print("\nPDF Metadata:")
        print(f"Number of pages: {metadata['num_pages']}")
        print("Document info:")
        for key, value in metadata['metadata'].items():
            print(f"{key}: {value}")
    
    # Extract text
    print("\nExtracting text...")
    extracted_text = extract_text_from_pdf(pdf_path)
    
    # Display first 500 characters of extracted text as preview
    if extracted_text:
        print("\nPreview of extracted text (first 500 characters):")
        print("-" * 50)
        print(extracted_text[:500])
        print("-" * 50)
        print(f"\nTotal characters extracted: {len(extracted_text)}")
    
    # Optional: Save the extracted text to a file
    if extracted_text:
        output_file = './resources/extracted_text.txt'
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write(extracted_text)
        print(f"\nExtracted text has been saved to {output_file}")
    
  16. Run all commands inside the Notebook. Your last cell should look like the image below.
  17. Once your script is finished, close the notebook by clicking on File > Close and Shut Down Notebook. Confirm the closing of the notebook by clicking OK on the confirmation dialog.
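
Cell 7 splits the extracted text into chunks with create_word_bounded_chunks, which is defined earlier in the notebook. To illustrate the idea only, the sketch below shows one way to split text into chunks of at most a target number of characters without cutting words in half. It assumes CHUNK_SIZE counts characters. `split_into_word_chunks` is a hypothetical stand-in, and the notebook's own function may differ in detail.

```python
from typing import List


def split_into_word_chunks(text: str, target_size: int) -> List[str]:
    """Split text into chunks of roughly target_size characters, breaking only between words."""
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for word in text.split():
        word_len = len(word) + 1  # +1 accounts for the joining space
        if current and current_len + word_len > target_size:
            # The current chunk is full: store it and start a new one with this word
            chunks.append(" ".join(current))
            current = [word]
            current_len = word_len
        else:
            current.append(word)
            current_len += word_len

    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_word_chunks("one two three four five", 10))
```

A word longer than the target size still becomes its own chunk rather than being split, so chunk sizes are approximate for unusual inputs.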

Running Notebook 2 of 4 - Transcript writer

  1. Go to your 'Home' page in your Jupyter Notebook. This should be the previous tab in your browser.
  2. Navigate to the folder /home/llama-recipes/recipes/quickstart/NotebookLlama in the sidebar menu on the left.
  3. Open the second notebook: Step-2 Transcript-Writer
  4. Change the second cell by:
    1. Adding the HuggingFace token for gated access to the Llama models
    2. Changing the MODEL in the second cell if you are using the small Llama models: 
      import os
      os.environ["HF_TOKEN"] = "[your-hf-token]"
      
      MODEL = "meta-llama/Llama-3.1-8B-Instruct"
      
  5. Change the path used for the INPUT_PROMPT in cell 5:
    # Load the cleaned text file
    INPUT_PROMPT = read_file_to_string('./resources/clean_extracted_text.txt')
    
  6. Run all cells in the notebook
  7. The final cell should look like this:
  8. Once your script is finished, close the notebook by clicking on File > Close and Shut Down Notebook. Confirm the closing of the notebook by clicking OK on the confirmation dialog.
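
Because the path in cell 5 is relative, the notebook only finds the file when it runs from the NotebookLlama folder and the first notebook has already written its output. A quick check before loading the model can save a long wait. This is a minimal sketch: `load_input_text` is a hypothetical helper, not the notebook's read_file_to_string, and it assumes the file is UTF-8 encoded.

```python
from pathlib import Path


def load_input_text(path: str) -> str:
    """Read a UTF-8 text file, failing early with a clear message if it is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"{path} not found: check the working directory and that the previous notebook has run"
        )
    return file_path.read_text(encoding="utf-8")


# Example, using the path from cell 5:
# text = load_input_text("./resources/clean_extracted_text.txt")
# print(len(text), "characters loaded")
```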

Running Notebook 3 of 4 - Re-writer

  1. Go to your 'Home' page in your Jupyter Notebook. This should be the previous tab in your browser.
  2. Navigate to the folder /home/llama-recipes/recipes/quickstart/NotebookLlama in the sidebar menu on the left.
  3. Open the third notebook, then change its second cell by adding the HuggingFace token for gated access to the Llama models:
    import os 
    os.environ["HF_TOKEN"] = "[your-hf-token]"
    
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"
    
  4. Run all cells inside the notebook
  5. Your final cell should look something like this:
  6. Once your script is finished, close the notebook by clicking on File > Close and Shut Down Notebook. Confirm the closing of the notebook by clicking OK on the confirmation dialog.
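
Writing the token directly into a cell means it is saved with the notebook file. As an alternative, and as an assumption on our side rather than part of the original notebooks, the token can be requested at run time and stored only in the HF_TOKEN environment variable. `ensure_hf_token` is a hypothetical helper; the `env` and `prompt` parameters exist only to make it easy to test.

```python
import os
from getpass import getpass


def ensure_hf_token(env=os.environ, prompt=getpass) -> str:
    """Return the token from HF_TOKEN, asking for it (without echoing) if it is not set."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        token = prompt("HuggingFace token: ").strip()
        if not token:
            raise ValueError("No HuggingFace token provided")
        env["HF_TOKEN"] = token  # later cells read the token from this variable
    return token


# In a notebook cell, before loading the model:
# ensure_hf_token()
```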

Running Notebook 4 of 4 - Text-to-speech

  1. Go to your 'Home' page in your Jupyter Notebook. This should be the previous tab in your browser.
  2. Navigate to the folder /home/llama-recipes/recipes/quickstart/NotebookLlama in the sidebar menu on the left.
  3. Open the fourth notebook, then change its first cell by adding the ffmpeg, parler-tts and pydub installation commands:
    #!pip3 install optimum
    #!pip install -U flash-attn --no-build-isolation
    #!pip install transformers==4.43.3
    !apt-get install -y ffmpeg
    !pip install git+https://github.com/huggingface/parler-tts.git
    !pip install pydub
    
  4. Change the 17th cell by changing the device
    device = "cuda"
    
    processor = AutoProcessor.from_pretrained("suno/bark")
    
    model = BarkModel.from_pretrained("suno/bark", torch_dtype=torch.float16).to(device)#.to_bettertransformer()
    
  5. Change the 20th cell by changing the device
    bark_processor = AutoProcessor.from_pretrained("suno/bark")
    bark_model = BarkModel.from_pretrained("suno/bark", torch_dtype=torch.float16).to("cuda")
    bark_sampling_rate = 24000
    
  6. Change the 24th cell by changing the device
    parler_model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1").to("cuda")
    parler_tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
    
  7. Change the 27th cell by changing the device
    device="cuda"
    
  8. Fix the 30th cell by adding the missing import
    from scipy.io import wavfile
    from pydub import AudioSegment
    
    def numpy_to_audio_segment(audio_arr, sampling_rate):
        """Convert numpy array to AudioSegment"""
        # Ensure audio array is normalized to the range [-1, 1]
        audio_arr = np.clip(audio_arr, -1, 1)
    
        # Convert to 16-bit PCM
        audio_int16 = (audio_arr * 32767).astype(np.int16)
        
        # Create WAV file in memory
        byte_io = io.BytesIO()
        wavfile.write(byte_io, sampling_rate, audio_int16)
        byte_io.seek(0)
        
        # Convert to AudioSegment
        return AudioSegment.from_wav(byte_io)
    
  9. Fix the 33rd cell by adding the missing import
    import io 
    
    final_audio = None
    
    for speaker, text in tqdm(ast.literal_eval(PODCAST_TEXT), desc="Generating podcast segments", unit="segment"):
        if speaker == "Speaker 1":
            audio_arr, rate = generate_speaker1_audio(text)
        else:  # Speaker 2
            audio_arr, rate = generate_speaker2_audio(text)
        
        # Convert to AudioSegment (pydub will handle sample rate conversion automatically)
        audio_segment = numpy_to_audio_segment(audio_arr, rate)
        
        # Add to final audio
        if final_audio is None:
            final_audio = audio_segment
        else:
            final_audio += audio_segment
    
  10. Run all cells in the notebook
  11. By the end of this step, your final cell should resemble the example shown below:
  12. Once your script is finished, close the notebook by clicking on File > Close and Shut Down Notebook. Confirm the closing of the notebook by clicking OK on the confirmation dialog.
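
The numpy_to_audio_segment function in the 30th cell clips the samples to the range [-1, 1], scales them to 16-bit integers and writes a WAV file into an in-memory buffer. The same three steps can be illustrated with only the Python standard library. This sketch is meant to explain the conversion, not to replace the notebook's NumPy and pydub code, and `floats_to_wav_buffer` is a hypothetical name.

```python
import io
import struct
import wave
from typing import Sequence


def floats_to_wav_buffer(samples: Sequence[float], sampling_rate: int) -> io.BytesIO:
    """Convert float samples in [-1, 1] to a mono 16-bit PCM WAV held in memory."""
    # Step 1: clip to the valid range so that scaling cannot overflow 16 bits
    clipped = [max(-1.0, min(1.0, s)) for s in samples]

    # Step 2: scale to signed 16-bit integers (truncating towards zero)
    pcm = [int(s * 32767) for s in clipped]

    # Step 3: write a WAV file into an in-memory buffer
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    buffer.seek(0)                         # rewind so the buffer can be read from the start
    return buffer
```

The returned buffer plays the same role as byte_io in the notebook cell: a file-like object that an audio library can read as if it were a WAV file on disk.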

Play your AI-generated podcast

To play your generated podcast, follow the instructions below:

  1. Go to your 'Home' page in your Jupyter Notebook. This should be the previous tab in your browser.
  2. Navigate to the folder /home/llama-recipes/recipes/quickstart/NotebookLlama/resources in the sidebar menu on the left.
  3. Right-click on the _podcast.mp3 file and click Download
  4. Go to your Downloads folder and double-click the file to play your AI-generated podcast
  5. Here's a demo of the audio podcast we generated. The image in the video has been added separately by us.


Troubleshooting Tips

If you are having any issues, please follow the instructions below:

Jupyter notebook server

To resolve any Jupyter Notebook Server issues, run the commands below to debug your errors.

ssh -i [path_to_ssh_key] [os_username]@[vm_ip_address]

# Once connected through SSH

# Check whether the cloud-init finished running
cat /var/log/cloud-init-output.log
# Expected output similar to:
# Cloud-init v. 24.2-0ubuntu1~22.04.1 finished at Thu, 31 Oct 2024 03:33:48 +0000. Datasource DataSourceOpenStackLocal [net,ver=2].  Up 52.62 seconds

# Check whether the Jupyter notebook server has started
docker ps
# Expected output similar to:
# CONTAINER ID   IMAGE           COMMAND                  CREATED        STATUS                  PORTS                                       NAMES
# 5c77b8a4154c   jupyter_image   "jupyter notebook --…"   1 second ago   Up Less than a second   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp   inspiring_thompson

# Check any errors during Jupyter notebook server setup
# cat load_docker_error.log

Notebooks

If you are running into any notebook issues or want to get started faster, refer to our final notebooks here:

Step 5: Hibernating Your VM

When you're finished with your current workload, you can hibernate your VM to avoid incurring unnecessary costs:

  1. In the Hyperstack dashboard, locate your virtual machine.
  2. Look for a "Hibernate" option.
  3. Click to hibernate the VM, stopping billing for compute resources while preserving your setup.

To continue your work without repeating the setup process:

  1. Return to the Hyperstack dashboard and find your hibernated VM.
  2. Select the "Resume" or "Start" option.
  3. Wait a few moments for the VM to become active.
  4. Reconnect via SSH using the same credentials as before.

FAQs

What is Notebook Llama?

Meta has just released NotebookLlama, an AI tool that automatically creates podcasts from PDF files. This tool is an open-source alternative to the Audio Overview feature in Google's Notebook LM.

How to set up Notebook Llama on Hyperstack?

The main steps for setting up Notebook Llama on Hyperstack are accessing Hyperstack, deploying a new virtual machine, setting up the model and running the Jupyter Notebooks.

What GPU should I select for deploying Notebook Llama?

For the smaller Llama 1B and 8B models, we recommend the NVIDIA RTX A6000 x1. For the larger Llama 70B model, we recommend the NVIDIA A100 x2.

Can I pause my virtual machine after using Notebook Llama on Hyperstack?

Yes, you can hibernate your VM to avoid unnecessary costs, preserving your setup for future use without incurring compute charges.