
    Let's build a Text to Music Generation App using Generative AI


    Introduction

    In this guide, we will develop a text-to-music generation application using Meta's AudioCraft library, specifically leveraging the MusicGen model. The application lets end users enter a text prompt and generates corresponding music from it. In this step-by-step tutorial, we will use Streamlit for a user-friendly interface while implementing functions for model loading, music generation, audio saving, and file downloading. Let's dive in!

    Prerequisites

    Before we get started, make sure you have the necessary libraries installed. Clone the AudioCraft GitHub repository and install the requirements.

    git clone https://github.com/facebookresearch/audiocraft.git
    cd audiocraft
    pip install -e .
    

    Note: Check the dependencies carefully before installing; AudioCraft targets recent Python versions (3.9 or higher) together with a matching PyTorch build.
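
    The app itself also uses Streamlit, which is not pulled in by AudioCraft's requirements. If it is not already in your environment, install it with pip (an extra step we assume for a standard pip setup):

    pip install streamlit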

    Setting Up the Project

    1. Open your VS Code and create a new file named app.py.

    2. Import Necessary Libraries:

      import streamlit as st
      import os
      import torch
      import torchaudio
      import numpy as np
      import base64
      from audiocraft.models import MusicGen
      
    3. Load the MusicGen Model:

      Create a function to load the pre-trained MusicGen model.

      @st.cache_resource
      def load_model():
          # Download and cache the pre-trained small MusicGen checkpoint
          model = MusicGen.get_pretrained("facebook/musicgen-small")
          return model
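
      If you have a GPU available, generation is considerably faster on CUDA. The sketch below is an optional variant of load_model that selects the device explicitly; the device argument reflects our reading of the AudioCraft API, so treat it as an assumption and omit it if in doubt.

      @st.cache_resource
      def load_model():
          # Prefer the GPU when one is available; fall back to CPU otherwise
          device = "cuda" if torch.cuda.is_available() else "cpu"
          model = MusicGen.get_pretrained("facebook/musicgen-small", device=device)
          return model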
      

    Creating the Streamlit Interface

    1. Set Up the Streamlit App:

      Define the layout and page configuration for the application.

      st.set_page_config(page_title="Music Gen", page_icon="🎵")
      st.title("Text to Music Generation")
      
      with st.expander("See Explanation"):
          st.write("""
          This app uses Meta's AudioCraft library to generate music from your
          natural language description.
          """)
      
    2. Get User Input:

      Add a text area for user prompts and a slider to select the audio duration.

      description = st.text_area("Enter your description:")
      duration = st.slider("Select time duration (seconds)", 2, 20, 5)
      

    Implement Music Generation Functionality

    1. Generate Music from Text:

      Create functions to generate music based on user input.

      def generate_music_tensors(description, duration):
          model = load_model()
          # Configure sampling and the clip length before generating
          model.set_generation_params(
              use_sampling=True,
              top_k=50,
              duration=duration
          )
          output = model.generate([description])
          # output has shape [batch, channels, samples]; return the first clip
          return output[0]

      def save_audio(samples):
          sample_rate = 32000  # MusicGen produces audio at 32 kHz
          save_path = "audio_output/"
          os.makedirs(save_path, exist_ok=True)
          audio_path = f"{save_path}audio.wav"
          # Move the tensor to the CPU before writing it to disk
          torchaudio.save(audio_path, samples.detach().cpu(), sample_rate)
          return audio_path
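
      If you would rather not hard-code the sample rate, AudioCraft also ships an audio_write helper that uses the model's own sample rate and applies loudness normalization. This is a sketch of that alternative; the exact keyword arguments are our assumption from the AudioCraft codebase, and save_audio_with_helper is a hypothetical name.

      from audiocraft.data.audio import audio_write

      def save_audio_with_helper(samples, model):
          os.makedirs("audio_output", exist_ok=True)
          # audio_write adds the .wav extension and applies loudness normalization
          audio_write("audio_output/audio", samples.detach().cpu(),
                      model.sample_rate, strategy="loudness")
          return "audio_output/audio.wav"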
      
    2. File Downloading:

      Implement a helper function to allow users to download the generated audio file.

      def get_binary_file_downloader_html(bin_file, file_label):
          with open(bin_file, "rb") as f:
              data = f.read()
          b64 = base64.b64encode(data).decode()
          # Embed the audio as a base64 data URI inside a download link
          href = f'<a href="data:application/octet-stream;base64,{b64}" download="{file_label}">Download your audio</a>'
          return href
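
      Streamlit also provides a built-in st.download_button widget, which avoids embedding raw HTML. A minimal sketch of that alternative (show_download_button is a hypothetical helper name):

      def show_download_button(bin_file, file_label):
          # Hand the generated file to Streamlit's native download widget
          with open(bin_file, "rb") as f:
              st.download_button("Download your audio", f,
                                 file_name=file_label, mime="audio/wav")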
      

    Integrate Everything

    Tie the functions together in the main application logic: when the user has entered a description, generate the music, save it to disk, and offer the file for download.

    if description and duration:
        music_tensor = generate_music_tensors(description, duration)
        audio_file_path = save_audio(music_tensor)
        download_link = get_binary_file_downloader_html(audio_file_path, "Generated_Audio.wav")
        st.markdown(download_link, unsafe_allow_html=True)
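
    If you also want to play the clip directly in the browser, Streamlit's st.audio accepts a file path. For example, you could add this line right after the download link:

    st.audio(audio_file_path)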
    

    Running the Application

    Run your Streamlit application using the command:

    streamlit run app.py
    

    Conclusion

    After implementing the above code, your application will be capable of generating music based on text prompts. With this functionality, you can experiment with various musical genres, styles, and prompts, giving rise to unique audio outputs.

    Keywords

    • Music Generation
    • Generative AI
    • AudioCraft
    • MusicGen Model
    • Streamlit
    • Text Prompt
    • Audio Output

    FAQ

    Q: What is the MusicGen model?
    A: MusicGen is an AI model developed by Meta that generates music from natural language descriptions.

    Q: How do I run the application?
    A: After creating the app.py file and adding the necessary code, run the command streamlit run app.py in your terminal.

    Q: Can I customize the duration of the generated audio?
    A: Yes, you can use the slider in the Streamlit app to select the audio duration between 2 and 20 seconds.

    Q: What kind of music can I generate?
    A: You can input any description or genre, and the MusicGen model will generate music based on your input.

    Q: Is the generated music free to use?
    A: The generated music can generally be used, but it's advisable to check the copyright guidelines associated with the MusicGen model and Meta’s policies.
