Hugging Face + Langchain in 5 mins | Access 200k+ FREE AI models for your AI apps

If you're diving into the world of AI apps, knowing how to use Hugging Face can be a game-changer. Hugging Face, valued at over $ 2 billion, is one of the top AI companies with some products integrated by tech giants like Google, Amazon, Microsoft, and Meta. It boasts 16,000 followers on GitHub and houses more than 200,000 different AI models including image-to-text, text-to-speech, and text-to-image models.

Introduction to Hugging Face

Hugging Face is your go-to platform for discovering and sharing AI models. It consists of three major components:

Models - Here, you can find a variety of AI models.
Datasets - This is where you find datasets to train your own models.
Spaces - A place designed for showcasing and sharing AI apps.

Models

To use a model, navigate to the Models section. For example, to use an image-to-text model, select the category and preview/test the model directly on their hosted version. This shortcut saves you from downloading, hosting, and running models locally to see if they suit your needs.

Datasets and Spaces

Datasets are primarily for training models. Space allows users to showcase their AI applications, explore others' work, and see the code and models used in these applications.

Implementing AI Apps with Hugging Face and Langchain

Let's build an AI app that converts an uploaded image into an audio story. This app involves three key components:

Image-to-Text model: To understand the scenario from the image.
Large Language Model: To generate a short story from the scenario text.
Text-to-Speech model: To convert the generated story into audio.

Step-by-Step Implementation

Getting Started with Hugging Face:
- Create a Hugging Face account.
- Generate an access token from settings.
- Set up the environment and import necessary libraries in Visual Studio.
Implement Image-to-Text Model:
- Use the BLIP model.
- Create a pipeline to load the AI model.
- Test the model by uploading an image and printing the generated text.
Generate Story with Langchain:
- Use GPT-3.5-turbo for generating stories.
- Create a prompt template.
- Use Langchain to convert the output scenario text into a story.
Convert Text-to-Speech:
- Use Hugging Face's inference API for text-to-speech conversion.
- Store the resulting audio file locally and play it.
Create a User Interface:
- Use Streamlit to build a front-end where users can upload images, see the scenario description, read the generated story, and listen to the audio.

Example Code

## Introduction
import streamlit as st
import requests
from transformers import pipeline
import openai

## Introduction
import os
HUGGINGFACE_API_TOKEN = os.getenv("HUGGINGFACE_API_TOKEN")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

## Introduction
def image_to_text(image_path):
    image_to_text_pipe = pipeline(task="image-to-text", model="your-model-name")
    result = image_to_text_pipe(image_path)[0]
    return result['generated_text']

def generate_story(scenario_text):
    openai.api_key = OPENAI_API_KEY
    prompt = f"Generate a short story based on the scenario: (scenario_text)"
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=100
    )
    return response.choices[0].text.strip()

def text_to_speech(story_text):
    url = "https://api-inference.huggingface.co/models/your-model-name"
    headers = ("Authorization": f"Bearer {HUGGINGFACE_API_TOKEN)"}
    payload = ("inputs": story_text)
    response = requests.post(url, headers=headers, json=payload)
    with open("audio_output.flac", "wb") as audio_file:
        audio_file.write(response.content)
    return "audio_output.flac"

## Introduction
st.title("Turn Image into Audio Story")
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded_file:
    st.image(uploaded_file, caption="Uploaded Image")
    scenario_text = image_to_text(uploaded_file)
    story_text = generate_story(scenario_text)
    audio_file = text_to_speech(story_text)
    
    st.text(f"Scenario: (scenario_text)")
    st.text(f"Story: (story_text)")
    st.audio(audio_file, format="audio/flac")

Conclusion

Hugging Face simplifies the integration of AI models into your applications by providing easy-to-use platforms and APIs. Whether you prefer to use their hosted models or download them locally, Hugging Face offers a robust toolkit for AI developers.

Keywords

AI Models
Hugging Face
Langchain
Streamlit
Image-to-Text
Text-to-Speech
GPT-3.5
Inference API
Datasets
Spaces

FAQ

Q1: What is Hugging Face? A: Hugging Face is an AI platform that offers over 200,000 different types of AI models covering tasks like image-to-text, text-to-speech, and more.

Q2: How can I access Hugging Face models? A: You can access models through Hugging Face’s hosted version, directly via their website, or by using APIs and downloading the models locally.

Q3: What libraries are needed to build an AI App using Hugging Face and Langchain? A: Essential libraries include Transformers, Streamlit, requests, and openai.

Q4: What is Langchain? A: Langchain is a library that enables the integration of various language models for tasks such as generating text, stories, or other language-related tasks.

Q5: How can I create a user interface for my AI app? A: Use Streamlit to create an interactive UI where users can upload files and see results like generated text or audio.