
    Let's build a Text to Music Generation App using Generative AI


    Introduction

    In this guide, we will develop a text-to-music generation application using Meta's AudioCraft library, specifically leveraging the MusicGen model. The application lets end users enter a text prompt and generates corresponding music from it. In this step-by-step tutorial, we will use Streamlit for a user-friendly interface while implementing functions for model loading, music generation, audio saving, and file downloading. Let's dive in!

    Prerequisites

    Before we get started, make sure you have the necessary libraries installed. Clone the AudioCraft GitHub repository and install the requirements.

    git clone https://github.com/facebookresearch/audiocraft.git
    cd audiocraft
    pip install -e .
    

    Note: Check the dependencies carefully before installing; AudioCraft targets recent Python versions (3.9 or higher) together with a matching PyTorch build.
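
    The app itself also uses Streamlit, which is not pulled in by AudioCraft's requirements. If it is not already in your environment, install it with pip (an extra step we assume for a standard pip setup):

    pip install streamlit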

    Setting Up the Project

    1. Open your VS Code and create a new file named app.py.

    2. Import Necessary Libraries:

      import streamlit as st
      import os
      import torch
      import torchaudio
      import numpy as np
      import base64
      from audiocraft.models import MusicGen
      
    3. Load the MusicGen Model:

      Create a function to load the pre-trained MusicGen model.

      @st.cache_resource
      def load_model():
          # Download and cache the pre-trained small MusicGen checkpoint
          model = MusicGen.get_pretrained("facebook/musicgen-small")
          return model
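
      If you have a GPU available, generation is considerably faster on CUDA. The sketch below is an optional variant of load_model that selects the device explicitly; the device argument reflects our reading of the AudioCraft API, so treat it as an assumption and omit it if in doubt.

      @st.cache_resource
      def load_model():
          # Prefer the GPU when one is available; fall back to CPU otherwise
          device = "cuda" if torch.cuda.is_available() else "cpu"
          model = MusicGen.get_pretrained("facebook/musicgen-small", device=device)
          return model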
      

    Creating the Streamlit Interface

    1. Set Up the Streamlit App:

      Define the layout and page configuration for the application.

      st.set_page_config(page_title="Music Gen", page_icon="🎵")
      st.title("Text to Music Generation")
      
      with st.expander("See Explanation"):
          st.write("""
          This app uses Meta's AudioCraft library to generate music from your
          natural language description.
          """)
      
    2. Get User Input:

      Add a text area for user prompts and a slider to select the audio duration.

      description = st.text_area("Enter your description:")
      duration = st.slider("Select time duration (seconds)", 2, 20, 5)
      

    Implement Music Generation Functionality

    1. Generate Music from Text:

      Create functions to generate music based on user input.

      def generate_music_tensors(description, duration):
          model = load_model()
          # Configure sampling and the clip length before generating
          model.set_generation_params(
              use_sampling=True,
              top_k=50,
              duration=duration
          )
          output = model.generate([description])
          # output has shape [batch, channels, samples]; return the first clip
          return output[0]

      def save_audio(samples):
          sample_rate = 32000  # MusicGen produces audio at 32 kHz
          save_path = "audio_output/"
          os.makedirs(save_path, exist_ok=True)
          audio_path = f"{save_path}audio.wav"
          # Move the tensor to the CPU before writing it to disk
          torchaudio.save(audio_path, samples.detach().cpu(), sample_rate)
          return audio_path
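
      If you would rather not hard-code the sample rate, AudioCraft also ships an audio_write helper that uses the model's own sample rate and applies loudness normalization. This is a sketch of that alternative; the exact keyword arguments are our assumption from the AudioCraft codebase, and save_audio_with_helper is a hypothetical name.

      from audiocraft.data.audio import audio_write

      def save_audio_with_helper(samples, model):
          os.makedirs("audio_output", exist_ok=True)
          # audio_write adds the .wav extension and applies loudness normalization
          audio_write("audio_output/audio", samples.detach().cpu(),
                      model.sample_rate, strategy="loudness")
          return "audio_output/audio.wav"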
      
    2. File Downloading:

      Implement a helper function to allow users to download the generated audio file.

      def get_binary_file_downloader_html(bin_file, file_label):
          with open(bin_file, "rb") as f:
              data = f.read()
          b64 = base64.b64encode(data).decode()
          # Embed the audio as a base64 data URI inside a download link
          href = f'<a href="data:application/octet-stream;base64,{b64}" download="{file_label}">Download your audio</a>'
          return href
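
      Streamlit also provides a built-in st.download_button widget, which avoids embedding raw HTML. A minimal sketch of that alternative (show_download_button is a hypothetical helper name):

      def show_download_button(bin_file, file_label):
          # Hand the generated file to Streamlit's native download widget
          with open(bin_file, "rb") as f:
              st.download_button("Download your audio", f,
                                 file_name=file_label, mime="audio/wav")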
      

    Integrate Everything

    Tie the functions together in the main application logic: when the user has entered a description, generate the music, save it to disk, and offer the file for download.

    if description and duration:
        music_tensor = generate_music_tensors(description, duration)
        audio_file_path = save_audio(music_tensor)
        download_link = get_binary_file_downloader_html(audio_file_path, "Generated_Audio.wav")
        st.markdown(download_link, unsafe_allow_html=True)
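
    If you also want to play the clip directly in the browser, Streamlit's st.audio accepts a file path. For example, you could add this line right after the download link:

    st.audio(audio_file_path)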
    

    Running the Application

    Run your Streamlit application using the command:

    streamlit run app.py
    

    Conclusion

    After implementing the above code, your application will be capable of generating music based on text prompts. With this functionality, you can experiment with various musical genres, styles, and prompts, giving rise to unique audio outputs.

    Keywords

    • Music Generation
    • Generative AI
    • AudioCraft
    • MusicGen Model
    • Streamlit
    • Text Prompt
    • Audio Output

    FAQ

    Q: What is the MusicGen model?
    A: MusicGen is an AI model developed by Meta that generates music from natural language descriptions.

    Q: How do I run the application?
    A: After creating the app.py file and adding the necessary code, run the command streamlit run app.py in your terminal.

    Q: Can I customize the duration of the generated audio?
    A: Yes, you can use the slider in the Streamlit app to select the audio duration between 2 and 20 seconds.

    Q: What kind of music can I generate?
    A: You can input any description or genre, and the MusicGen model will generate music based on your input.

    Q: Is the generated music free to use?
    A: The generated music can generally be used, but it's advisable to check the copyright guidelines associated with the MusicGen model and Meta’s policies.
