    ChatGPT vs Llama - Which LLM writes better OpenAPI?

    Introduction

    Large language models (LLMs) can generate many kinds of content, including code. GitHub Copilot and ChatGPT are notable examples, but how adept are these models at designing an API? In this article, we pit two LLMs, ChatGPT 4 and Llama 3.1, against each other to evaluate how well they generate OpenAPI documents for a fictional link-shortening application called "Mini Links."

    The Task

    The prompt provided to both models was straightforward: create an API using the OpenAPI specification for Mini Links. The application should support creating, updating, listing, and deleting links, while also allowing for user management. The requirements included adherence to security standards, use of JSON schema, and provision of examples. Outputs were to be formatted as JSON.
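
    To make the task concrete, here is a minimal sketch of the kind of document the prompt asks for, built as a Python dict and printed as JSON. It is not the output of either model. The paths, the `Link` schema, the operation IDs, and the bearer-token security scheme are all assumptions chosen for illustration, and user management is left out to keep the example short.

```python
import json


def build_mini_links_spec() -> dict:
    """Return a small, illustrative OpenAPI 3.0 document as a plain dict."""
    # Hypothetical schema for a shortened link, with inline examples.
    link_schema = {
        "type": "object",
        "required": ["id", "url"],
        "properties": {
            "id": {"type": "string", "example": "abc123"},
            "url": {"type": "string", "format": "uri",
                    "example": "https://example.com/some/long/path"},
        },
    }
    link_body = {"application/json": {"schema": {"$ref": "#/components/schemas/Link"}}}
    link_id_param = {"name": "linkId", "in": "path", "required": True,
                     "schema": {"type": "string"}}
    return {
        "openapi": "3.0.3",
        "info": {"title": "Mini Links", "version": "1.0.0"},
        # Apply the (assumed) bearer scheme to every operation by default.
        "security": [{"bearerAuth": []}],
        "paths": {
            "/links": {
                "get": {
                    "operationId": "listLinks",
                    "responses": {"200": {
                        "description": "All links",
                        "content": {"application/json": {"schema": {
                            "type": "array",
                            "items": {"$ref": "#/components/schemas/Link"}}}},
                    }},
                },
                "post": {
                    "operationId": "createLink",
                    "requestBody": {"required": True, "content": link_body},
                    "responses": {"201": {"description": "Link created",
                                          "content": link_body}},
                },
            },
            "/links/{linkId}": {
                "parameters": [link_id_param],
                "put": {
                    "operationId": "updateLink",
                    "requestBody": {"required": True, "content": link_body},
                    "responses": {"200": {"description": "Link updated",
                                          "content": link_body}},
                },
                "delete": {
                    "operationId": "deleteLink",
                    "responses": {"204": {"description": "Link deleted"}},
                },
            },
        },
        "components": {
            "schemas": {"Link": link_schema},
            "securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_mini_links_spec(), indent=2))
```

    Even this small skeleton touches the four areas the prompt names: link operations, a security scheme, JSON Schema definitions, and examples.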

    Round 1: ChatGPT 4

    ChatGPT 4 was the first contender. It responded quickly, and the resulting OpenAPI document was saved for evaluation. Scored with the Rate My Open API tool, the document received an overall score of 55.

    Breakdown of ChatGPT's Score:

    • Documentation: 56
    • Completeness: 53
    • SDK Generation: 78
    • Security: 40

    Although the score was not high, the OpenAPI document was functional and contained no severe errors. The scoring feedback was then passed back to ChatGPT, which was asked to improve its output based on the critiques. After applying the suggested changes, ChatGPT's updated API document scored 96, a significant improvement on the key elements, although the revision also introduced some new errors.
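
    The generate, score, and revise cycle described above can be sketched as a small loop. This is an assumption about how one might automate the process, not how the comparison was actually run. `generate` and `rate` are hypothetical callables standing in for a model call and a scoring tool, and the stubs below simply replay the scores reported in this article.

```python
from typing import Callable, List, Tuple


def refine(prompt: str,
           generate: Callable[[str], str],
           rate: Callable[[str], Tuple[int, List[str]]],
           rounds: int = 1) -> Tuple[str, int]:
    """Generate a document, then feed scoring issues back for up to `rounds` revisions."""
    document = generate(prompt)
    score, issues = rate(document)
    for _ in range(rounds):
        if not issues:
            break
        feedback = prompt + "\n\nFix these issues:\n" + "\n".join(f"- {i}" for i in issues)
        candidate = generate(feedback)
        candidate_score, candidate_issues = rate(candidate)
        # A revision can introduce new errors, so keep it only if it does not score worse.
        if candidate_score >= score:
            document, score, issues = candidate, candidate_score, candidate_issues
    return document, score


# Stub model and stub scorer, for illustration only.
def stub_generate(prompt: str) -> str:
    return "revised-spec" if "Fix these issues" in prompt else "first-spec"


def stub_rate(document: str) -> Tuple[int, List[str]]:
    if document == "first-spec":
        return 55, ["no security scheme defined", "missing response examples"]
    return 96, []


result = refine("Design an OpenAPI document for Mini Links.", stub_generate, stub_rate)
print(result)
```

    Keeping the best-scoring version is a design choice here: it guards against a revision that fixes the reported issues but regresses elsewhere.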

    Round 2: Llama 3.1

    The spotlight then turned to Llama 3.1. The same prompt was used, and the response came back remarkably fast. The generated JSON was saved and submitted to Rate My Open API for evaluation. Unfortunately, Llama received a modest overall score of 47.

    Breakdown of Llama's Score:

    • Documentation: Low 50s
    • Completeness: Low
    • SDK Generation: 62
    • Security: 30

    The scores indicated that Llama's initial document had more substantial deficiencies than ChatGPT's. As with ChatGPT, Llama was given the feedback and asked to incorporate the suggested improvements. Its revised document, however, received an overall score of only 49, a gain of just two points over the original.
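
    The gap between the two models is clearest when the before and after scores are put side by side. The short sketch below computes the absolute and relative gains from the overall scores reported in this article. The helper name `improvement` is only illustrative.

```python
from typing import Tuple


def improvement(before: int, after: int) -> Tuple[int, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    gain = after - before
    return gain, round(100 * gain / before, 1)


# Overall scores (initial, after feedback) as reported above.
scores = {"ChatGPT 4": (55, 96), "Llama 3.1": (47, 49)}

for model, (before, after) in scores.items():
    gain, pct = improvement(before, after)
    print(f"{model}: {before} -> {after} (+{gain} points, +{pct}%)")
# ChatGPT 4: 55 -> 96 (+41 points, +74.5%)
# Llama 3.1: 47 -> 49 (+2 points, +4.3%)
```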

    Conclusion

    In this head-to-head comparison, ChatGPT 4 outperformed Llama 3.1, showing a stronger ability to understand and apply the OpenAPI specification. ChatGPT not only improved significantly after receiving feedback, but also produced a more complete and more secure API design from the outset.

    While Llama shows promise, it will need further refinement to compete at this level. This competition serves as a compelling demonstration of the capabilities of LLMs in API design, and future contests will further explore these abilities across different models. Stick around for more evaluations of LLM performance in API design and other coding tasks, and consider trying out Rate My Open API to evaluate your own API documents.


    Keywords

    • OpenAPI
    • ChatGPT
    • Llama
    • API design
    • Rate My Open API
    • JSON schema
    • AI models
    • Performance evaluation

    FAQ

    Q: What was the task for the LLMs?
    A: The task was to design an OpenAPI document for a link-shortening application called Mini Links, which involved managing links and users.

    Q: How did ChatGPT perform?
    A: ChatGPT initially scored 55 but improved to 96 after applying feedback to its OpenAPI document.

    Q: What was Llama's initial score?
    A: Llama scored 47 on its first attempt and improved to only 49 after receiving feedback.

    Q: Which model outperformed the other?
    A: ChatGPT 4 outperformed Llama 3.1 in both initial scoring and improvement capability.

    Q: Where can I evaluate my own OpenAPI documents?
    A: You can evaluate your OpenAPI documents at Rate My Open API.
