EASIEST Way to Fine-Tune an LLM and Use It With Ollama
Introduction
Fine-tuning a large language model (LLM) locally on your machine can be an exciting yet complex task. In this guide, we will walk through the process of creating a small, fast LLM that generates SQL queries from table data. We will focus on the Synthetic Text to SQL dataset, which comprises over 105,000 records. Let's break it down step by step, so you have all the information you need to get started.
Finding the Right Dataset
Choosing the right dataset is crucial as it can drastically influence the performance of your fine-tuned LLM. When you provide relevant data, smaller models can often outperform larger counterparts. We will be utilizing the Synthetic Text to SQL dataset for this project.
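As a quick check, you can pull the dataset from the Hugging Face Hub with the datasets library and inspect a record. The dataset id gretelai/synthetic_text_to_sql is assumed here, so verify it against the dataset card you intend to use:
from datasets import load_dataset

# Download the Synthetic Text to SQL training split (~105k rows) and peek at one row
dataset = load_dataset("gretelai/synthetic_text_to_sql", split="train")
print(dataset)
print(dataset[0])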
Hardware Requirements
For this tutorial, I will be using an NVIDIA RTX 4090 GPU on Ubuntu. If you lack a dedicated GPU, you can still run this project using Google Colab, which allows you to execute the training code in the cloud.
Setting Up Your Environment
We will be using Unsloth, an efficient library that simplifies fine-tuning open-source models with reduced memory usage. Our model of choice will be Llama 3, a high-performance language model suited to a wide range of applications.
Before starting, ensure you have the following installed on your machine:
- Anaconda
- CUDA libraries (preferably version 12.1)
- Python (version 3.10)
- Jupyter Notebook
To install the dependencies required by Unsloth, create a new Anaconda environment and install PyTorch, the CUDA libraries, and the latest version of Unsloth. You might also want to install Jupyter if it isn't already available.
conda create -n myenv python=3.10
conda activate myenv
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install unsloth
pip install jupyter
Now, you can launch Jupyter Notebook, and we are set up for the next steps!
Importing Required Packages
In Jupyter Notebook, verify that all the installed requirements are available. If you're using Google Colab, run the same pip install commands from the setup section in a notebook cell first. Next, import Unsloth's fast language model class and configure it:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized Llama 3 8B
    max_seq_length=2048,
    load_in_4bit=True,
)
This setup loads the weights in 4-bit precision, allowing the model to fit in far less GPU memory.
Training Your Model
Once the model is loaded, we will add PEFT (parameter-efficient fine-tuning) adapters, which update only about 1-10% of the model's parameters rather than retraining the entire model.
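Here is a minimal sketch of that step with Unsloth's get_peft_model, which attaches LoRA adapters to the loaded model. The rank and target modules below are the typical defaults from Unsloth's example notebooks, not tuned values:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: higher captures more detail but uses more memory
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True,  # trades a little compute for memory headroom
)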
Next, we will format our dataset to match the Alpaca prompt template used for fine-tuning. Given our dataset's format, we will customize the code to extract the SQL command from each record, along with the question, table context, and explanation.
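Here is a rough sketch of that formatting step, reusing the dataset and tokenizer loaded earlier. The column names (sql_prompt, sql_context, sql, sql_explanation) are assumed from the dataset card, so double-check them against your copy:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

def format_prompts(examples):
    # Pair each natural-language question and its table schema with the SQL answer
    texts = []
    for question, context, sql, explanation in zip(
            examples["sql_prompt"], examples["sql_context"],
            examples["sql"], examples["sql_explanation"]):
        response = sql + "\n\n" + explanation
        # Appending eos_token teaches the model where a finished answer ends
        texts.append(alpaca_prompt.format(question, context, response) + tokenizer.eos_token)
    return {"text": texts}

dataset = dataset.map(format_prompts, batched=True)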
Here's an example of what your code could look like for setting up supervised fine-tuning with the SFTTrainer from Hugging Face's TRL library:
from trl import SFTTrainer

trainer = SFTTrainer(model=model, tokenizer=tokenizer, train_dataset=dataset,
                     dataset_text_field="text", ...)
trainer.train()
Parameters such as max_steps, seed, and warmup_steps must be configured to manage your training process effectively.
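As an illustration, those knobs live in transformers.TrainingArguments, which fills the trainer's args parameter; the values below are placeholders for a short test run, not recommendations:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,    # ramp the learning rate up gradually at the start
    max_steps=60,      # cap the run so a first experiment finishes quickly
    learning_rate=2e-4,
    seed=3407,         # fix the random seed for reproducibility
    logging_steps=1,
)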
Saving Your Model
Once training has completed, we need to convert the model into the GGUF format that Ollama uses for local execution. Luckily, Unsloth provides a one-liner for this conversion; here it exports with the common q4_k_m quantization:
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
After this, create an Ollama Modelfile that points at the exported GGUF file and sets the system prompt. The exact FROM path depends on where Unsloth wrote the GGUF, so adjust it accordingly. Example content:
FROM ./model/unsloth.Q4_K_M.gguf
SYSTEM "You are an SQL generator that takes a user's query and provides helpful SQL."
Running Your Model Locally
To execute your model, make sure Ollama is running, register the model from your Modelfile, and then run it from the terminal (sql-generator here is just an example name):
ollama create sql-generator -f ./Modelfile
ollama run sql-generator
With that, you have successfully fine-tuned your LLM locally and can now use it through Ollama's OpenAI-compatible API and integrate it into your applications.
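For example, here is a minimal sketch of calling the model through Ollama's OpenAI-compatible endpoint with the openai Python client; the model name sql-generator matches whatever you passed to ollama create:
from openai import OpenAI

# Ollama serves an OpenAI-compatible API on port 11434; a key is required but ignored
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="sql-generator",
    messages=[{"role": "user", "content": "Show all orders placed in 2023."}],
)
print(response.choices[0].message.content)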
Conclusion
Congratulations! You’ve now created, fine-tuned, and deployed your LLM locally with minimal resources. If you would like to learn more about Ollama or have any other questions, feel free to explore additional resources or leave comments!
Keywords
- Fine-tuning
- LLM
- Dataset
- SQL generation
- NVIDIA 4090
- Unsloth
- Llama 3
- Anaconda
- Jupyter Notebook
- Alpaca prompts
FAQ
1. What is Unsloth?
Unsloth is a tool designed to efficiently fine-tune open-source models while reducing memory usage and resource requirements.
2. Do I need a GPU to fine-tune a model?
While a dedicated GPU optimizes the training process, you can also use platforms like Google Colab, which provides cloud-based access to GPU resources.
3. How do I format my dataset for training?
The dataset needs to be formatted in a way that the LLM can process efficiently using Alpaca prompts. Make sure to define the input, expected SQL output, and additional context required for training.
4. What is the significance of using PEFT?
Parameter-efficient fine-tuning (PEFT) allows you to update only a small percentage of the model's parameters, resulting in considerable time and resource savings compared to complete retraining.
5. How can I run my model locally?
Ensure Ollama is running, create the model from your Modelfile with ollama create, and then start it with ollama run.