📢 Excited to release Gorilla🦍 Gorilla picks from 1000s of APIs to complete user tasks, surpassing even GPT-4! LLMs need to interact with the world through APIs, and Gorilla teaches LLMs APIs. Presenting Gorilla-Spotlight demo🤩
Webpage:
📢 Introducing Gorilla OpenFunctions! 🔥 We've listened to your calls for an open-source function calling model, and are thrilled to present Gorilla OpenFunctions 🦍 And yes, we've made parallel functions a reality in open-source! 😎
Curious about typical scenarios where GPT-4
📊Delighted to welcome Command-R-Plus, Llama-3, and Gemini-Pro-1.5 into the Berkeley Function Calling Leaderboard. Check out how they stack up across different categories, P95 latency, and costs at
Congratulations to
@cohere
,
@AIatMeta
, and
📢Excited to release the live Berkeley Function-Calling Leaderboard! 🔥 Also debuting openfunctions-v2 🤩 the latest open-source SoTA function-calling model on-par with GPT-4🆕Native support for Javascript, Java, REST! 🫡
Leaderboard:
Blog:
📢Excited to release GoEx⚡️a runtime for LLM-generated actions like code, API calls, and more. Featuring "post-facto validation" for assessing LLM actions after execution 🔍 Key to our approach is "undo" 🔄 and "damage confinement" abstractions to manage unintended actions &
🌀Check out RAFT: Retrieval-Aware Fine Tuning! A simple technique to prepare data for fine-tuning LLMs for in-domain RAG, i.e., question-answering on your set of documents 📄 Exciting collaboration with
@berkeley_ai
🤝
@Azure
🤝
@AIatMeta
MSFT-Meta blog:
📢 Excited to release RAFT: Retrieval-Aware Fine-Tuning for domain-specific RAG, a three-way collaboration between
@berkeley_ai
@Azure
and
@AIatMeta
! 🤝
Drawing parallels between LLMs and students in open-book (RAG) 📔 and closed-book exams (SFT) 🧠, we present a better recipe
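The open-book/closed-book recipe above can be sketched as a data-prep step: mix the oracle document with distractor documents, and occasionally drop the oracle so the model also learns to answer from memory. Field names, the distractor count, and the oracle-keep probability here are illustrative assumptions, not the paper's exact format:

```python
import random

def make_raft_example(question, oracle_doc, all_docs, answer,
                      num_distractors=3, p_oracle=0.8, rng=random):
    """Build one RAFT-style training record: the question plus a small
    set of documents, where the oracle document is sometimes withheld so
    the model also learns the closed-book (SFT) behavior."""
    distractors = rng.sample([d for d in all_docs if d != oracle_doc],
                             num_distractors)
    docs = distractors + ([oracle_doc] if rng.random() < p_oracle else [])
    rng.shuffle(docs)
    context = "\n\n".join(docs)
    return {
        "prompt": f"Documents:\n{context}\n\nQuestion: {question}",
        "completion": answer,  # ideally a chain-of-thought citing the oracle
    }
```

Passing an explicit `random.Random(seed)` as `rng` makes the mixing reproducible across dataset builds.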
🦍Introducing the all-new gorilla-cli, now available as a pip package!✍️ With a vast collection of ~1500 🆕APIs, including 👀 Kubernetes, AWS, GCP, Azure, GitHub, Conda, Curl, Sed, and more🤩 simply state your goal, and let Gorilla CLI generate the commands for execution.
Train BERT on Smartphones 🤩 Announcing POET 📢 Find out how we train memory-hungry SOTA models on smartphones!
#ICML2022
Thu 21 Jul 3:45 pm EDT at Room 327 🧵👇
Paper:
Web:
🦍Big news! Gorilla is now Apache 2.0 licensed🤩We are delighted to welcome 2 new models into the Gorilla family ⛳️Use commercially with zero obligations! Our colab has been used for 6000+ invocations in the last week 🚀Check it out:
🙇♂️ Humbled to have
@AndrewYNg
talk about Gorilla 🦍 This is the most succinct write-up on tool use and function calling. When we started Gorilla back in January 2023, this was precisely our hypothesis 🙂
Tool use, in which an LLM is given functions it can request to call for gathering information, taking action, or manipulating data, is a key design pattern of AI agentic workflows. You may be familiar with LLM-based systems that can perform a web search or execute code. Some of
+ We are building Gorilla to be an LLM API appstore - you can add your APIs to Gorilla!
+ Github:
+ Join our Discord to stay in the loop!
+ Gorilla-Spotlight sign-up:
+ Fun collaboration with
@tianjun_zhang
,
@xinw_ai
and
@mejoeyg
Berkeley Function Calling Leaderboard: Introducing Consistent 8 X V100 with pay-as-you-go pricing for measuring costs and latency.
In depth: We fix inconsistency in the cost and latency calculation for open-source models, which are now all calculated when serving the model with
We study the inherent challenges in relying on LLMs: their unpredictability, the trust mechanisms essential to their decision-making, and the hurdles in recognizing and resolving failures. Our system, GoEx, presents abstractions and policies to overcome these for RESTful
API invocations are extremely brittle, requiring LLMs to generate accurate input arguments. Gorilla improves accuracy while reducing hallucination, and generalizes to 1600+ APIs (and counting). 📢 Given recent debates, we also find Fine-tuning >> Prompting.
How to do better RAG? 🤔 Find out in this webinar with
@jerryjliu0
on the shortcomings of today's RAG 👀 and how a few simple tricks to create a fine-tuning dataset can vastly improve performance for in-domain RAG! And thanks to
@ravithejads
RAFT is already part of
🎥 ICYMI: Check out the recording of our latest LlamaIndex Webinar on retrieval-augmented fine-tuning (RAFT)!
In the webinar,
@tianjun_zhang
and
@shishirpatil_
, the lead co-authors of RAFT, explain how combining an “open-book exam” approach (RAG) with a “closed-book exam”
🤩Check out our latest release MemGPT 🦙📚 Inspired by how an OS manages pages, we explore - can the LLM manage its own context length, and page out / page in from archival memory?
Introducing MemGPT 📚🦙 a method for extending LLM context windows. Inspired by OS mem management, it provides an infinite virtualized context for fixed-context LLMs. Enables perpetual chatbots & large doc QA. 🧵1/n
Paper:
GitHub:
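The OS-paging analogy above can be made concrete with a toy sketch: when the in-context message queue exceeds a token budget, the oldest messages are paged out to an archival store and can be paged back in by search. The class, the word-count tokenizer, and the keyword recall are illustrative assumptions, not MemGPT's actual implementation:

```python
class PagedContext:
    """Toy MemGPT-style context: a fixed token budget for in-context
    messages, with overflow paged out to archival memory."""

    def __init__(self, budget_tokens=50):
        self.budget = budget_tokens
        self.context = []   # messages currently "in context"
        self.archive = []   # messages paged out to archival memory

    @staticmethod
    def _tokens(msg):
        return len(msg.split())  # crude stand-in for a real tokenizer

    def add(self, msg):
        self.context.append(msg)
        # Page out the oldest messages until we fit the budget again.
        while sum(self._tokens(m) for m in self.context) > self.budget:
            self.archive.append(self.context.pop(0))

    def recall(self, keyword):
        # Page in: naive keyword search over archival memory.
        return [m for m in self.archive if keyword in m]
```

A real system would summarize evicted messages and use embedding search for recall, but the page-out/page-in cycle is the core idea.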
Excited to see
@cohere
's new Command R+ model's focus on tool use capabilities!
Function Calling / Tool use has become a deciding factor in choosing models as we move beyond simple chatbots into integrating LLMs in workflows and agents. Check out our live Berkeley Function
R+ is quite performant. It's very competitive with models in its price range, and sometimes even those in a category above.
Tool Use is a place we've seen considerable improvement in R+ over R.
The last time
@profjoeyg
interviewed me, it was for PhD admissions to Berkeley... Thankfully the stakes were lower this time 😜
Check out the latest interview where
@tianjun_zhang
and I dive into the details as we discuss
#GorillaLLM
and LLM APIs 🦍🦍🦍
Gorilla's a super cool LLM application: You can autogenerate API calls for things like Kubernetes or transformers. 🛠️
@profjoeyg
interviewed
@shishirpatil_
&
@tianjun_zhang
about how the model works:
Excited to welcome Snowflake-Arctic on the Berkeley Function Calling Leaderboard ❄️
How does Snowflake-arctic-instruct, an apache-2.0 licensed, 480B parameter MoE model perform on invoking functions (aka tools)? Attached is a quick comparison with gpt-4-0125-preview (yellow).
Our unique Abstract Syntax Tree (AST) evaluation presents the first systematic evaluation of SOTA LLMs such as GPT-4 and Claude-v1. We benchmark them for Accuracy and Hallucination. Our experiments demonstrate what most people "felt": GPT-4 hallucinates more than GPT-3.5 👀
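The point of AST-based evaluation is to compare the *structure* of a generated call (function name, argument values) rather than the raw string, so equivalent forms like reordered keyword arguments still match. A minimal illustrative version using Python's `ast` module (a simplification, not the leaderboard's actual evaluator):

```python
import ast

def ast_match(generated: str, expected_fn: str, expected_args: dict) -> bool:
    """Parse a generated function-call string and check the function name
    and every expected keyword argument value against the reference."""
    try:
        node = ast.parse(generated, mode="eval").body
    except SyntaxError:
        return False
    if not isinstance(node, ast.Call):
        return False
    if ast.unparse(node.func) != expected_fn:
        return False
    try:
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    except ValueError:
        return False  # non-literal argument, e.g. an unresolved variable
    return all(kwargs.get(k) == v for k, v in expected_args.items())
```

Because the comparison is on parsed values, `f(a=1, b=2)` and `f(b=2, a=1)` are judged the same, which string matching would get wrong.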
We have continued developing Gorilla: your go-to open-source API marketplace for LLMs 🦍Our colab has seen over 3k+ users in 3 days 🚀 A big thank you for your enthusiastic response 🫡 Gorilla is open source and will always be community driven.
⏲ Check out Gorilla in 60 seconds:
📢It has been a thrilling 5 days since Gorilla release 🚀 Updates:
+ We released HF, TF and Torch Hub Gorilla zero-shot models
+ Opened up APIZoo for community contributions
+ 2k GitHub stars 🤩 and 780+ welcoming discord community 🎙️
Now, to answer the most pressing question
Check out
@AIatMeta
's post on RAFT for better in-domain RAG. Fun-fact, you can access it through:
Meta and Microsoft: Divided by shareholders, united by Berkeley 😉
Gorilla's retriever-aware training enables it to react to test-time changes in APIs. Gorilla can respond to model upgrades and changes in the model registry at test time.
More information in our paper:
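Retriever-aware training works by appending retrieved API documentation to the instruction at both train and test time, so the model answers from the (possibly updated) docs instead of stale training-time knowledge. A sketch of what such prompt assembly might look like; the template and delimiters are illustrative assumptions, not Gorilla's exact format:

```python
def retriever_aware_prompt(instruction: str, retrieved_docs: list) -> str:
    """Append retrieved API docs to the user instruction. If the API
    changed since training, the fresh doc text wins over memorization."""
    docs = "\n".join(f"<<<api_doc_{i}>>>: {d}"
                     for i, d in enumerate(retrieved_docs))
    return (f"{instruction}\n"
            f"Use this API documentation for reference:\n{docs}")
```

At test time, swapping in the updated documentation for a renamed model or changed signature is what lets the same weights track registry changes.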
🦍Week 3 🚀
📢 Introducing Gorilla-7b-hf-delta-v1, a major update turning user queries into insightful code suggestions!
👋Thrilled to rank
#2
on Hacker News again! A hearty welcome to those who found us there.
🤝We're easing API discovery by transforming simple user queries into
📢How can Gorilla generate meaningful code snippets? This has been one of the first asks from the community 🤝 Today we have a major release: gorilla-7b-hf-delta-v1, a new model that gives code suggestions based on the query 🚀 Give it a try and let us know what you think! Check
Here are six amazing insights from
@jobergum
and
@shishirpatil_
on how to effectively use finetuning to optimize your LLM apps over your data:
1️⃣ Fine-tuning is actually remarkably effective at internalizing knowledge: retrieval algorithms in comparison can be inaccurate.
Models are increasingly diverging in their areas of focus! This is exciting as we see a fragmentation in the use of function calling capabilities!
🤔The ability of a model to refuse to present an answer remains challenging!!
Honored to share the stage with luminaries at the
@SimonsInstitute
LLM workshop last week. In my talk, I delved into how Gorilla 🦍 adapts to imprecise retrievers and introduced a novel methodology to measure hallucination in LLMs.
Watch the session here:
We're thrilled to announce that our hosted Gorilla models have successfully processed over👏50,000 👏 user 👏 requests 🙇♂️ We're incredibly grateful for your overwhelming response, and this has given us immense confidence to ship the initial stable releases on GitHub:
🦍Gorilla
Efficient information retrievers are the key to equipping language models with the most up-to-date knowledge. 🚀
📢Releasing Gorilla+Retrievers to help keep APIs updated! We provide support for BM25 and GPT based retrievers in our GitHub:
🦍 Gorilla v0.0.1:
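Of the two supported retrievers, BM25 is simple enough to sketch end to end: score each API document against the query with term frequency, inverse document frequency, and length normalization. This is a from-scratch illustration with naive whitespace tokenization, not the repo's actual retriever code:

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return docs sorted best-first by BM25 score against the query."""
    tok_docs = [d.lower().split() for d in docs]
    N = len(tok_docs)
    avgdl = sum(len(d) for d in tok_docs) / N
    df = Counter()                      # document frequency per term
    for d in tok_docs:
        for term in set(d):
            df[term] += 1

    def score(tokens):
        tf = Counter(tokens)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
        return s

    return sorted(docs, key=lambda d: score(d.lower().split()), reverse=True)
```

The top-ranked API doc is then fed into the retriever-aware prompt; the GPT-based retriever swaps this lexical scoring for embedding similarity.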
Check out the
@weaviate_io
paper summary on
#GorillaLLM
🦍
@CShorten30
does a phenomenal job diving deep into the core research and presents lots of useful advice for those working on retrievers, LLMs, or APIs 🤩
I am SUPER excited to present a paper summary video of Gorilla: Large Language Models Connected to Massive APIs! 🦍🛠️
API instruction following, Self-Instruct, Retrieval-Aware Training, and more! One of the most exciting papers I've seen recently! 🤯🚀
Moving data between cloud object stores is a growing phenomenon. Our work presents the first solution for optimizing data movement across AWS (S3 buckets), Azure (Blobs), and Google Cloud (Storage).
Releasing Skyplane, a new open-source tool to move huge datasets between clouds.
Skyplane is:
1. 🔥 Blazing fast (110x faster)
2. 🤑 Cheap (4x cheaper)
3. 🌐 Universal (AWS, Azure and GCP)
Read more:
1/
I'm always amazed by the energy
@CShorten30
injects into his podcast! Had the privilege to be a part of it 🤩 Don't miss the latest episode of Weaviate 🤝 Gorilla podcast!
Beyond excited to publish our newest Weaviate Podcast with
@shishirpatil_
and
@tianjun_zhang
, co-authors of the Gorilla LLMs!! 🦍🛠️🎉
Loved this discussion on all things Gorilla, from the APIZoo to Self-Instruct, Retrieval-Aware Training, and more! 📚
Gorilla: Large Language Model Connected with Massive APIs
Releases Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls.
proj:
abs:
@AlphaSignalAI
Thanks for featuring our work! Gorilla 🦍 is an open source project from UC Berkeley with some exciting new releases planned! Stay tuned! 😉
In today's updates to the Berkeley Function Calling leaderboard:
📊Enhanced Leaderboard with Additional Models and Summary Table:
@MistralAI
-large-2402,
@GoogleAI
Gemini 1.0 Pro, and Gemma now included.
🤖 Gradio for Interactive Exploration! Includes function calling demos, and
Frontier level Tool Calling now live on
@GroqInc
powered by Llama 3 🫡
Outperforms GPT-4 Turbo 2024-04-09 and Claude 3 Opus (FC version) in multiple subcategories
At 300 tokens/s 🚀
I've personally been working on this feature, and man, the new Llama is good!
Delighted to see Gorilla's newest infant 🦍 Weaviate Gorilla 😊 Check out
@CShorten30
's blog and video on how you can use LLMs to invoke GraphQL APIs 🛠️🚀
Blog:
Youtube:
Fine-tune smaller models or switch to fine-tuning 3.5 from
#OpenAI
? Exciting experiment by
@morgymcg
on the
#GorillaLLM
dataset. Watch along and track the progress in real time 🤩
Put together a quick colab to fine-tune
@OpenAI
ChatGPT-3.5 on the huggingface api code from the gorilla dataset
Idea being to see if something like this can help improve ChatGPT-3.5's use of tools and mimic GPT-4's `functions` capability
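The data-prep step in an experiment like this boils down to converting (instruction, API call) pairs into the chat-format JSONL that the fine-tuning endpoint expects. The system prompt wording here is an illustrative assumption; the `messages` schema is the standard chat fine-tuning format:

```python
import json

def to_chat_finetune_line(instruction: str, api_call: str) -> str:
    """Convert one (instruction, api_call) pair into a JSONL line:
    a system prompt, the user's request, and the target API call as
    the assistant completion."""
    record = {
        "messages": [
            {"role": "system",
             "content": "You translate user requests into Hugging Face API calls."},
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": api_call},
        ]
    }
    return json.dumps(record)
```

Writing one such line per training example produces the upload-ready JSONL file.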
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Presents a runtime for LLMs with an intuitive undo and damage confinement abstractions, enabling the safer deployment of LLM agents in practice
repo:
abs:
Using LLMs to call functions is 🚀🚀🚀 When we released OpenFunctions-v1 the community loved the models 🙏 but we quickly realized evaluation needed attention 👀 It took a while, but we are delighted to share the Berkeley Function-Calling Leaderboard.
+ Abstract Syntax Tree (AST)
Preview of our initial Weaviate Gorilla release, GPT-4 is amazing at generating synthetic Weaviate database schemas! 🦍🦍🦍
Love the way this slide turned out haha, all sorts of applications for Weaviate! Just ask an LLM!
🤗🍱🐕💊🍿📱🏀🎶🎮👕⛈️🌍🎻🚗
I am excited to announce that two of the LLMs from my group (Gorilla and Vicuna) are on AI Business’s top 12 models. Congrats
@shishirpatil_
,
@tianjun_zhang
,
@xinw_ai
, and the
@lmsysorg
team. We are looking forward to working with
@Meta
on Llama-v2 versions.
Super happy that Weaviate Gorilla was featured in
@1littlecoder
's AI Updates series!!
Thank you so much and massive kudos on the series - really impressive diversity! One of the best AI / ML news shows out! 🔥🙏
🌐 Pioneering a future where LLMs empower microservices & apps, evolving from mere data retrievers 🧵to autonomous decision-makers within our digital world 🧙 Wondering about the safety and correctness of these interactions🤔? Our latest vision paper explores these questions,
While one can further improve throughput with optimizations, and of course costs vary with contracts and the vintage of GPUs, that is beside the point. With cost and latency, our goal is to identify the magnitude of costs and latency, and understand the
Finetuning LLMs to call APIs
Presents Gorilla, a finetuned LLaMA-based model that surpasses GPT-4 on writing API calls. This capability can help identify the right API, boosting the ability of LLMs to interact with external tools to complete specific tasks.
Huge potential for
A few (maybe obvious) challenges using LLMs within software applications I've seen as companies roll out their use:
- versioning : While it's possible to tie a program to a specific model version, there is no structured way to handle new model versions (e.g. deprecation, sub
🎁In OpenFunctions-v2, we natively train the model to support parallel functions (generate multiple functions at a time) and multiple functions (select one or more functions). Java/REST/Python APIs are also supported for the first time, with extended data types. Looking forward
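To make "parallel functions" concrete: the model emits several calls in a single turn, and the client parses them into structured (name, kwargs) pairs before dispatching. A rough parsing sketch, assuming a Python-literal list-of-calls output syntax (an illustration, not OpenFunctions' exact wire format):

```python
import ast

def parse_parallel_calls(output: str):
    """Parse a model response like "[f(x=1), g(y=2)]" (or a single bare
    call) into a list of (function_name, kwargs) tuples."""
    tree = ast.parse(output, mode="eval").body
    nodes = tree.elts if isinstance(tree, (ast.List, ast.Tuple)) else [tree]
    calls = []
    for node in nodes:
        if not isinstance(node, ast.Call):
            raise ValueError("expected a function call")
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((ast.unparse(node.func), kwargs))
    return calls
```

"Multiple functions" (choosing among candidate tools) then reduces to checking which names appear in the parsed list.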
Introducing GRID: the General Robot Intelligence Development platform, designed for prototyping smart and safe robots rapidly using foundation models, LLMs, and simulation.
Paper:
Try now:
GitHub:
🧵👇(1/N)
How do you find the "right" loss for test-time adaptation to distribution shifts? Turns out we can use convex conjugates!
New paper📢 with
@Eric_jie_thu
, Aditi Raghunathan,
@zicokolter
.
Test-Time Adaptation via Conjugate Pseudo-labels
Had a great chat with
@chsrbrts
on
#GorillaLLM
How often do you think hyperscalers change their APIs? Spoiler: More than once a day!!! Check out the blog to learn more 🦍🦍🦍
We interview
@shishirpatil_
on Episode
#5
of Neural Notes about Gorilla, an LLM that generates API calls.🦍
Why does this matter? LLMs are well-suited for code generation because the task is forgiving (there are multiple valid ways to write a function), but they struggle with API code
@jiayq
Agree on (1) and (2). Although for (3), from the set of our users, we actually see a healthy number of scenarios where you are choosing between a handful of functions. Especially in enterprise use-cases, it's either choose A/B/C or default D (usually support, doc q-a, etc)..
@intrstllrninja
Thanks for your kind words
@intrstllrninja
Eval Code:
Eval Dataset:
Let us know if you run into any issues, or as with any open-source feel free to raise a PR!
Our framework, Private Optimal Energy Training (POET) takes a memory and a run-time budget as input. POET schedules nodes of the training graph to satisfy the constraints by exploiting integrated rematerialization and paging. POET's MILP formulation is provably energy-optimal!
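The trade-off POET's MILP navigates can be illustrated with a brute-force toy: for each node in the training graph, either keep its activation in memory, rematerialize it (recompute, spending time and energy), or page it out (I/O, spending time and energy), then pick the lowest-energy plan that fits both budgets. The per-node cost fields are made up for illustration; POET itself solves this at scale with an MILP, not enumeration:

```python
from itertools import product

def poet_toy_schedule(nodes, mem_budget, time_budget):
    """Enumerate KEEP/REMAT/PAGE choices per node and return
    (energy, plan) for the cheapest-energy plan meeting both budgets,
    or None if no plan fits."""
    KEEP, REMAT, PAGE = range(3)
    best = None
    for plan in product((KEEP, REMAT, PAGE), repeat=len(nodes)):
        mem = time = energy = 0.0
        for choice, n in zip(plan, nodes):
            if choice == KEEP:
                mem += n["mem"]
            elif choice == REMAT:
                time += n["recompute_time"]
                energy += n["recompute_energy"]
            else:  # PAGE
                time += n["page_time"]
                energy += n["page_energy"]
        if mem <= mem_budget and time <= time_budget:
            if best is None or energy < best[0]:
                best = (energy, plan)
    return best
```

The enumeration is exponential in the node count, which is exactly why the real system formulates the same feasibility-plus-objective structure as an integer linear program.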
@Teknium1
@ivanfioravanti
@Teknium1
it's added with PR
This plot might be useful for understanding where different models shine!
@ivanfioravanti
depending on your use-case, your mileage may vary!
Of course, let us know if you have any comments, suggestions, find bugs, or
How are LoRAs and longer contexts for LLMs related? Check out
@xiuyu_l
and
@sijun_tan
's latest work on training LoRA adapters to support in-domain long-context 🗞️
Handling long context in LLMs is expensive, but can we cut the cost by learning them offline for a specific set/genre of documents?
Introducing LLoCO, our new technique that learns documents offline through context compression and in-domain finetuning using LoRA, which achieves
@AustinNWharton
@marktenenholtz
Hey
@AustinNWharton
thanks for sharing your feedback! Gorilla CLI is fully open source, and commands are executed only with the user's explicit approval! Let us know if there is anything we can do that would make you feel more comfortable giving it a try!
@dotey
Hey
@dotey
👋 We do have some metrics in the paper, but would love it if you gave it a try and have any feedback for us on how we can improve it! 🙂
Which are the best ML discord servers to join? So many 😱 How to keep up with all of them? Which ones am I missing?
LAION
Eleuther
Skunkworks OS AI (MoE project)
Hugging Face
CarperAI
Open Assistant
Harmonai (Dance Diffusion)
Replicate
Suno (bark, TTS)
Stable Foundation
Invoke
Fine-tuning models on the edge satisfies privacy constraints and enables offline operation. However, limited memory on edge makes training memory-hungry deep learning models infeasible.
@EugeneVinitsky
Yeah, and you want to make sure you power them sufficiently. If you don't give them enough juice, they'll be severely under-clocked even in headless mode.