When I create LLM applications, I start with frontier models and no coding. It’s impressive how much you can achieve with pure prompt engineering on GPT-4 or Claude 3. But once you get the LLM to do what you want, you need to optimize your application for scale, speed, and costs.
This is where you need techniques like retrieval-augmented generation (RAG) and LLM fine-tuning. However, these techniques often require coding and configuration steps that are difficult to understand.
MonsterGPT, a new tool by MonsterAPI, helps you fine-tune an LLM of your choice directly from the ChatGPT interface. Instead of wrestling with the complicated configuration of a fine-tuning job, you just answer a few simple questions in chat.
What you need for MonsterGPT
MonsterGPT is an AI assistant that uses the MonsterAPI cloud behind the scenes. MonsterAPI is a platform for using, hosting, and fine-tuning generative models. To use MonsterGPT, you must have a MonsterAPI account. (You can sign up for a free account and get 2,500 credits.)
MonsterGPT itself is hosted on OpenAI’s GPT marketplace, so you also need a ChatGPT Plus subscription.
How to use MonsterGPT
Once you open MonsterGPT, you can just tell it which model you want to fine-tune. MonsterGPT supports most current open models, including Mistral, Mixtral, Llama 2 and 3, OpenELM, and Gemma (see full list here).
You must also specify the dataset that you want to fine-tune the model on. MonsterAPI supports all Hugging Face datasets. If you’re not sure which dataset to use, you can ask the assistant for guidance; it will find Hugging Face datasets for you and provide their details.
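If you want to vet a candidate dataset yourself before confirming, the Hugging Face datasets library makes it easy to peek at a few records. Here is a minimal sketch in Python; the dataset name is just an example:

```python
from datasets import load_dataset

# Load an example instruction-tuning dataset from the Hugging Face Hub.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

print(dataset.column_names)  # ['instruction', 'context', 'response', 'category']
print(dataset[0])            # inspect one record to check the prompt/response fields
```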
Once you confirm the details, MonsterGPT launches a fine-tuning job on MonsterAPI. You can track progress on your MonsterAPI account or ask MonsterGPT for updates.
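If you later want to script such a job yourself instead of going through the chat, it would look roughly like the sketch below. The endpoint path and payload fields here are illustrative placeholders, not MonsterAPI’s actual API schema, so consult the MonsterAPI documentation for the real names:

```python
# A hypothetical sketch of launching a fine-tuning job over a REST API.
# The URL and field names are placeholders; check MonsterAPI's docs.
import os
import requests

api_key = os.environ["MONSTER_API_KEY"]  # assumes your key is in the environment

payload = {
    "base_model": "mistralai/Mistral-7B-v0.1",     # model to fine-tune
    "dataset": "databricks/databricks-dolly-15k",  # Hugging Face dataset ID
    "epochs": 1,
}

resp = requests.post(
    "https://api.monsterapi.ai/v1/finetune",  # placeholder endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically returns a job ID you can poll for progress
```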
How to use your fine-tuned model
Once your model is fine-tuned, you can serve it on MonsterAPI, or download the weights and run it on your own server. Alternatively, you can ask MonsterGPT to launch it for you. You can then use the model through API access.
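What API access looks like depends on how the model is deployed. As an illustration, the sketch below assumes the deployment exposes an OpenAI-compatible chat endpoint, a common pattern for hosted models; the base URL and model name are placeholders:

```python
# Calling a hosted fine-tuned model through an OpenAI-compatible endpoint.
# The base_url and model name are hypothetical placeholders for your deployment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.example.com/v1",  # placeholder URL
    api_key=os.environ["MONSTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="my-finetuned-mistral",  # placeholder deployment name
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(completion.choices[0].message.content)
```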
You can find more details on MonsterGPT here. See a demo of it in action below:
A few tips on LLM fine-tuning
When creating LLM applications, you often find yourself building multi-step workflows. For example, you first ask the model to classify the user’s prompt into one of several categories, then route the request to a different prompt based on the category. You might use other advanced techniques, such as extracting entity information from the prompt or connecting different steps of the workflow to external tools.
It is worth reviewing each of these steps and considering whether you can replace a frontier model like GPT-4 with a smaller model fine-tuned for that task. This can cut your costs, since you replace in-context learning with a fine-tuned model, and it might also speed up your pipeline (a minimal sketch of this pattern follows).
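Here is a minimal sketch of that classify-then-route pattern. All names are illustrative, and call_llm is a hypothetical stand-in for your actual model calls; in production, classify_intent would be a natural place for a small fine-tuned model, since the task is narrow and well defined:

```python
# Classify-then-route: a cheap classifier picks a category, and each
# category gets its own prompt and, potentially, its own model.

ROUTES = {
    "billing": "You are a billing assistant. Answer questions about invoices.",
    "technical": "You are a support engineer. Diagnose the user's issue.",
    "other": "You are a general-purpose assistant.",
}

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for a real model call (e.g. an API request)."""
    return f"[{system_prompt}] -> answer to: {user_prompt}"

def classify_intent(user_prompt: str) -> str:
    """Step 1: in production, this could be a small fine-tuned classifier."""
    return "billing" if "invoice" in user_prompt.lower() else "other"

def handle(user_prompt: str) -> str:
    """Step 2: route the request to the category-specific prompt/model."""
    category = classify_intent(user_prompt)
    return call_llm(ROUTES[category], user_prompt)

print(handle("Where can I find my latest invoice?"))
```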
To make your fine-tuning process easier, start collecting data early in the development of your LLM application. For each step of your pipeline, create a dataset of prompts and responses (taking into account the data sensitivity and privacy requirements of your application). When you’re ready to scale the application, you can use that dataset to fine-tune a model.
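A simple way to do this is to append every prompt/response pair your pipeline produces to a JSONL file, a format most fine-tuning tools can ingest. A minimal sketch (scrub or exclude sensitive fields before logging anything):

```python
import json
from datetime import datetime, timezone

def log_example(step: str, prompt: str, response: str,
                path: str = "finetune_data.jsonl") -> None:
    """Append one prompt/response pair to a JSONL dataset file."""
    record = {
        "step": step,  # which pipeline step produced this pair
        "prompt": prompt,
        "response": response,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Example: record the classifier step's input and output.
log_example("classifier", "Where can I find my latest invoice?", "billing")
```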