
Highlights

  • Oscar is a project aiming to improve open-source software development by creating automated help, or “agents,” for open-source maintenance. We believe there are many opportunities to reduce the amount of toil involved with maintaining open-source projects both large and small. (View Highlight)
  • The ability of large language models (LLMs) to do semantic analysis of natural language (such as issue reports or maintainer instructions) and to convert between natural language instructions and program code creates new opportunities for agents to interact more smoothly with people. LLMs will likely end up being only a small (but critical!) part of the picture; the bulk of an agent’s actions will be executing standard, deterministic code. (View Highlight)
  • Oscar differs from many development-focused uses of LLMs by not trying to augment or displace the code writing process at all. After all, writing code is the fun part of writing software. Instead, the idea is to focus on the not-fun parts, like processing incoming issues, matching questions to existing documentation, and so on. (View Highlight)
  • Oscar is very much an experiment. We don’t know yet where it will go or what we will learn. Even so, our first prototype, the @gabyhelp bot, has already had many successful interactions in the Go issue tracker. (View Highlight)
  • For now, Oscar is being developed under the auspices of the Go project. At some point in the future it may (or may not) be spun out into a separate project. (View Highlight)
  • The concrete goals for the Oscar project are:
    • Reduce maintainer effort to resolve issues [note that resolve does not always mean fix]
    • Reduce maintainer effort to resolve change lists (CLs) or pull requests (PRs) [note that resolve does not always mean submit/merge]
    • Reduce maintainer effort to resolve forum questions
    • Enable more people to become productive maintainers (View Highlight)
  • It is a non-goal to automate away coding. Instead we are focused on automating away maintainer toil. (View Highlight)
  • Maintainer toil is not unique to the Go project, so we are aiming to build an architecture that any software project can reuse and extend, building their own agents customized to their project’s needs. Hence Oscar: open-source contributor agent architecture. Exactly what that will mean is still something we are exploring. (View Highlight)
  • So far, we have identified three capabilities that will be an important part of Oscar:
    1. Indexing and surfacing related project context during contributor interactions.
    2. Using natural language to control deterministic tools.
    3. Analyzing issue reports and CLs/PRs, to help improve them in real time during or shortly after submission, and to label and route them appropriately. (View Highlight)
  • It should make sense that LLMs have something to offer here, because open-source maintenance is fundamentally about interacting with people using natural language, and natural language is what LLMs are best at. So it's not surprising that all of these have an LLM-related component. On the other hand, all of these are also backed by significant amounts of deterministic code. Our approach is to use LLMs for what they’re good at—semantic analysis of natural language and translation from natural language into programs—and rely on deterministic code to do the rest. (View Highlight)
  • Software projects are complex beasts. Only at the very beginning can a maintainer expect to keep all the important details and context in their head, and even when that's possible, context held in one person's head does not help when a new contributor arrives with a bug report, a feature request, or a question. To address this, maintainers write design documentation, API references, FAQs, manual pages, blog posts, and so on. Now, instead of providing context directly, a maintainer can provide links to written context that already exists. Serving as a project search engine is still not the best use of the maintainer’s time. Once a project grows even to modest size, any single maintainer cannot keep track of all the context that might be relevant, making it even harder to serve as a project search engine. (View Highlight)
  • Combined with a vector database to retrieve vectors similar to an input vector, LLM embeddings provide a very effective way to index all of an open-source project’s context, including documentation, issue reports, CLs/PRs, and forum discussions. When a new issue report arrives, an agent can use the LLM-based project context index to identify highly related context, such as similar previous issues or relevant project documentation. (View Highlight)
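
To make the retrieval step concrete, here is a minimal Go sketch of embedding-based lookup of related context. The `Embedder` interface, the `Doc` type, and the brute-force cosine search are illustrative assumptions for this example; a real agent would call an LLM embedding API and query a vector database rather than scanning a slice.

```go
// Package index sketches embedding-based retrieval of related project context.
package index

import (
	"math"
	"sort"
)

// Embedder converts text into a fixed-length vector, for example by calling
// an LLM embedding endpoint. (Hypothetical interface for this sketch.)
type Embedder interface {
	Embed(text string) ([]float64, error)
}

// Doc is one piece of indexed project context: an issue, a doc page,
// a CL/PR, or a forum post.
type Doc struct {
	ID     string
	Vector []float64
}

// cosine returns the cosine similarity between two vectors of equal length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Related returns the k indexed documents most similar to a new report.
func Related(emb Embedder, index []Doc, report string, k int) ([]Doc, error) {
	v, err := emb.Embed(report)
	if err != nil {
		return nil, err
	}
	sorted := append([]Doc(nil), index...) // avoid mutating the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		return cosine(sorted[i].Vector, v) > cosine(sorted[j].Vector, v)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k], nil
}
```

The top few results of such a query are what the agent can post as "related issues and documentation" on a new report.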
  • The agent surfaces related context to contributors. It is common for new issue reports to duplicate existing issue reports: a new bug might be reported multiple times in a short time window, or a non-bug might be reported every few months. When an agent replies with a link to a duplicate report, the contributor can close their new report and then watch that earlier issue. When an agent replies with a link to a report that looks like a duplicate but is not, the contributor can provide added context to distinguish their report from the earlier one. (View Highlight)
  • The agent surfaces related context even to project maintainers. Once a project reaches even modest size, no one person can remember all the context, not even a highly dedicated project maintainer. When an agent replies with a link to a related report, that eliminates the time the maintainer must spend to find it. If the maintainer has forgotten the related report entirely, or never saw it in the first place (perhaps it was handled by someone else), the reply is even more helpful, because it can point the maintainer in the right direction and save them the effort of repeating the analysis done in the earlier issue. (View Highlight)
  • The agent interacts with bug reporters immediately. In all of the previous examples, the fact that the agent replied only a minute or two after the report was filed meant that the reporter was still available and engaged enough to respond in a meaningful way: adding details to clarify the suggestion, closing the report as a duplicate, raising bug priority based on past reports, or identifying a fix. In contrast, if hours or days (or more) go by after the initial report, the original reporter may no longer be available, interested, or able to provide context or additional details. Immediately after the bug report is the best time to engage the reporter and refine the report. Maintainers cannot be expected to be engaged in this work all the time, but an agent can. (View Highlight)
  • Finally, note that surfacing project context is extensible, so that projects can incorporate their context no matter what form it takes. Our prototype’s context sources are tailored to the Go project, reading issues from GitHub, documentation from go.dev, and (soon) code reviews from Gerrit, but the architecture makes it easy to add additional sources. (View Highlight)
  • The second important agent capability is using natural language to control deterministic tooling. As open-source projects grow, the number of helpful tools increases, and it can be difficult to keep track of all of them and remember how to use each one. For example, our prototype includes a general facility for editing GitHub issue comments to add or fix links. We envision also adding facilities for adding labels to an issue or assigning or CC'ing people when it matches certain criteria. If a maintainer does not know this functionality exists, it might be difficult to find. And even if they know it exists, perhaps they aren’t familiar with the specific API and don’t want to take the time to learn it. (View Highlight)
  • On the other hand, LLMs are very good at translating between intentions written in natural language and executable forms of those intentions such as program code or tool invocations. We have done preliminary experiments with Gemini selecting from and invoking available tools to satisfy natural language requests made by a maintainer. We don’t have anything running for real yet, but it looks like a promising approach. (View Highlight)
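
The following Go sketch illustrates the shape of that split: the LLM proposes a tool name and arguments from a maintainer's natural-language request, and deterministic code validates and runs the call. The `Tool` registry and the `add_label` tool are hypothetical, not Oscar's actual API; in practice a person would review the proposed call before it runs.

```go
// Sketch of natural language controlling deterministic tools.
package main

import "fmt"

// Tool is one deterministic action the agent can perform.
type Tool struct {
	Name string
	Desc string // natural-language description given to the LLM
	Run  func(args map[string]string) error
}

// tools is the registry the LLM selects from.
var tools = map[string]Tool{
	"add_label": {
		Name: "add_label",
		Desc: "Add a label to a GitHub issue.",
		Run: func(args map[string]string) error {
			// A real implementation would call the GitHub API here.
			fmt.Printf("labeling issue %s with %q\n", args["issue"], args["label"])
			return nil
		},
	},
}

// invoke validates and runs the tool call the model proposed.
func invoke(name string, args map[string]string) error {
	t, ok := tools[name]
	if !ok {
		return fmt.Errorf("unknown tool %q", name)
	}
	return t.Run(args)
}

func main() {
	// Suppose the LLM translated "mark issue 12345 as a performance issue"
	// into this call; deterministic code does the actual work.
	if err := invoke("add_label", map[string]string{"issue": "12345", "label": "performance"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because the tool itself is ordinary deterministic code, the only step that needs checking is the translation from request to tool call.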
  • A different approach would be to rely more heavily on LLMs, letting them edit code, issues, and so on entirely based on natural language prompts with no deterministic tools. This “magic wand” approach demands more of LLMs than they are capable of today. We believe it will be far more effective to use LLMs to convert from natural language to deterministic tool use once and then apply those deterministic tools automatically. Our approach also limits the amount of “LLM supervision” needed: a person can check that the tool invocation is correct and then rely on the tool to operate deterministically. (View Highlight)
  • The third important agent capability is analyzing issue reports and CLs/PRs (change lists / pull requests). Posting about related issues is a limited form of analysis, but we plan to add other kinds of semantic analysis, such as determining that an issue is primarily about performance and should have a “performance” label added. (View Highlight)
  • We also plan to explore whether it is possible to analyze reports well enough to identify whether more information is needed to make the report useful. For example, if a report does not include a link to a reproduction program on the Go playground, the agent could ask for one. And if there is such a link, the agent could make sure to inline the code into the report to make it self-contained. The agent could potentially also run a sandboxed execution tool to identify which Go releases contain the bug and even use git bisect to identify the commit that introduced the bug. (View Highlight)
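
As one illustration of what such a check might look like, here is a small Go sketch that tests whether an issue body contains a Go playground share link (links of the form go.dev/play/p/<id>). The triage flow and the example link ID are assumptions for this sketch, not Oscar's actual code.

```go
// Sketch of one report-analysis check: is a playground reproduction linked?
package main

import (
	"fmt"
	"regexp"
)

// playgroundLink matches share links like https://go.dev/play/p/<id>.
var playgroundLink = regexp.MustCompile(`https://go\.dev/play/p/[A-Za-z0-9_-]+`)

// needsRepro reports whether the agent should ask the reporter for a
// reproduction program on the Go playground.
func needsRepro(body string) bool {
	return !playgroundLink.MatchString(body)
}

func main() {
	report := "cmd/compile: crash on generic type; see https://go.dev/play/p/EXAMPLE-ID"
	if needsRepro(report) {
		fmt.Println("reply asking for a playground reproduction")
	} else {
		fmt.Println("fetch the program and inline it to make the report self-contained")
	}
}
```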
  • As discussed earlier, all of these analyses and resulting interactions work much better when they happen immediately after the report is filed, when the reporter is still available and engaged. Automated agents can be on duty 24/7. (View Highlight)