Toolformer: Language Models Can Teach Themselves to Use Tools
We show that LMs can teach themselves to use external tools via simple APIs. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
Existing approaches either rely on large amounts of human annotations (Komeili et al., 2022; Thoppilan et al., 2022) or limit tool use to task-specific settings only (e.g., Gao et al., 2022; Parisi et al., 2022).
Limitations include an inability to access up-to-date information on recent events, difficulties in understanding low-resource languages, a lack of mathematical skills to perform precise calculations, and an unawareness of the progression of time.
The use of tools should be learned in a self-supervised way without requiring large amounts of human annotations. The LM should not lose any of its generality and should be able to decide for itself when and how to use which tool.
Our aim is to equip a language model M with the ability to use different tools by means of API calls. We require that inputs and outputs for each API can be represented as text sequences. This allows seamless insertion of API calls into any given text, using special tokens to mark the start and end of each such call.
Our approach builds on using large LMs with in-context learning (Brown et al., 2020) to generate entire datasets from scratch.
Given just a handful of human-written examples of how an API can be used, we let an LM annotate a huge language modeling dataset with potential API calls. We then use a self-supervised loss to determine which of these API calls actually help the model predict future tokens. Finally, we finetune the LM itself on the API calls that it considers useful.
As a next step, we execute all API calls generated by M to obtain the corresponding results. How this is done depends entirely on the API itself; for example, it can involve calling another neural network, executing a Python script, or using a retrieval system to perform search over a large corpus.
For each API, we write a prompt P(x) that encourages the LM to annotate an example x = x1, …, xn with API calls.
Model Finetuning: After sampling and filtering calls for all APIs, we finally merge the remaining API calls and interleave them with the original inputs.
One such limitation is Toolformer's inability to use tools in a chain (i.e., using the output of one tool as an input for another). This is because API calls for each tool are generated independently; as a consequence, there are no examples of chained tool use in the finetuning dataset.
We found models trained with Toolformer to often be sensitive to the exact wording of their input when deciding whether or not to call an API; this is perhaps unsurprising, given that LMs are known to be very sensitive to the prompts they are provided with in both zero- and few-shot settings.