
Metadata

  • Author: The Batch @ DeepLearning.AI
  • Full Title: OpenAI Licenses News Archives, Generative Coding From Plan to Pull Request, Recognizing Landmines, Streamlined Inference

Highlights

  • For instance, about 12 months ago, the Center for AI Safety’s “Statement on AI Risk” warned that AI could cause human extinction and stoked fears of AI taking over. This alarmed leaders in Washington. But many people in AI pointed out that this dystopian science-fiction scenario has little basis in reality. About six months later, when I testified at the U.S. Senate’s AI Insight Forum, legislators no longer worried much about an AI takeover. Then the opponents of open source shifted gears. Their leading argument became the risk of AI helping to create bioweapons. Soon afterward, OpenAI and RAND showed that current AI does not significantly increase the ability of malefactors to build bioweapons. This fear of AI-enabled bioweapons has diminished. To be sure, the possibility that bad actors could use bioweapons — with or without AI — remains a topic of great international concern. (View Highlight)
  • The latest argument for blocking open source AI has shifted to national security. AI is useful for both economic competition and warfare, and open source opponents say the U.S. should make sure its adversaries don’t have access to the latest foundation models. While I don’t want authoritarian governments to use AI, particularly to wage unjust wars, the LLM cat is out of the bag, and authoritarian countries will fill the vacuum if democratic nations limit access. When, some day, a child somewhere asks an AI system questions about democracy, the role of a free press, or the function of an independent judiciary in preserving the rule of law, I would like the AI to reflect democratic values rather than favor authoritarian leaders’ goals over, say, human rights. (View Highlight)
  • What’s new: GitHub unveiled a preview of Copilot Workspace, a generative development environment that’s designed to encompass entire projects. (View Highlight)
  • How it works: Copilot Workspace is based on GPT-4 Turbo and integrated with GitHub code repositories and libraries. Where GitHub Copilot previously generated code snippets and provided suggestions for editing code segments, Copilot Workspace integrates these tasks within a larger plan.
    • Users begin by providing a known bug, feature request, or codebase and then prompting the system. For instance, a user can provide code for a simple Pong-style video game and request a feature, such as an automated opponent to play against.
    • Given the request, the system determines the current state of the codebase, then proposes goals the code will meet once the new feature has been implemented. For example, the system might propose, “the computer controls the left paddle automatically, allowing for a single-player game against the computer” and “the game mechanics and logic for the computer’s movement have been added to index.jsx.”
    • The goals function as a new prompt, spurring the system to plan intermediate steps to reach them. For instance, the revised plan might include, “add computer player logic for paddle 1 that blocks the ball 95% of the time” and “remove logic for player control of paddle 1.”
    • Users can edit all of this before telling the system to carry out the plan. Afterward, the resulting code can be edited, previewed, shared, and subjected to new tests.
    • Once the code has passed the tests, users can upload it directly to GitHub as a pull request or fork in the code repository or library. (View Highlight)
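As an illustration only — not Copilot Workspace’s actual output, which in the article’s Pong example would land in index.jsx — a plan step like “add computer player logic for paddle 1 that blocks the ball 95% of the time” might translate into something like the sketch below. The names `update_computer_paddle`, `paddle1_y`, `ball_y`, and `PADDLE_SPEED` are hypothetical.

```python
import random

PADDLE_SPEED = 5  # hypothetical per-frame paddle movement, in pixels

def update_computer_paddle(paddle1_y: float, ball_y: float) -> float:
    """Move the computer-controlled paddle toward the ball, deliberately
    failing to react about 5% of the time so the human can still score."""
    if random.random() < 0.05:
        return paddle1_y          # miss roughly 5% of the time
    if ball_y > paddle1_y:
        return paddle1_y + PADDLE_SPEED
    if ball_y < paddle1_y:
        return paddle1_y - PADDLE_SPEED
    return paddle1_y
```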
  • Yes, but: Initial users noted that Copilot Workspace is best at solving straightforward, well-defined problems and struggles with more complex ones. Choices can be difficult to unwind later on, and the system is slower than simpler AI coding assistants. (View Highlight)
  • OpenAI has been making deals with publishers to gain access to high-quality training data. It added the Financial Times to the list. What’s new: OpenAI licensed the archive of business news owned by the Financial Times (FT) for an undisclosed sum. The agreement lets OpenAI train its models on the publisher’s articles and deliver information gleaned from them. This is OpenAI’s fifth such agreement with major news publishers in the past year. (View Highlight)
  • Archives of news articles may be handy if OpenAI proceeds with a rumored search service reported in February by The Information. Licensing is a way to get such material that is unambiguously legal. Although AI researchers commonly scrape data from the web and use it to train models without obtaining licenses for copyrighted works, whether a license is required to train AI models on works under copyright in the U.S. has yet to be determined. Copyright owners have lately challenged this practice in court. In December 2023, The New York Times sued OpenAI and Microsoft, claiming that OpenAI infringed its copyrights by training models on its articles. In April 2024, eight U.S. newspapers owned by Alden Global Capital, a hedge fund, filed a lawsuit against the same defendants on similar grounds. Licensing material from publishers gives OpenAI access to their works while offering them incentives to negotiate rather than sue. (View Highlight)
  • Stanford, University of California San Diego, ETH Zürich, Adobe, Meta, and Carnegie Mellon proposed Deja Vu, an algorithm that accelerates inference in large language models (LLMs) by using small vanilla neural networks to predict which parts of the model to use. Key insight: Transformer-based neural networks can save a lot of time at inference by activating only a fraction of (i) attention heads and (ii) neurons in fully connected layers. But it’s necessary to activate the right neurons, because different parts of the network learn about different patterns of inputs. By using the input to decide which parts of the network to activate, the network can maintain accuracy using only the parts relevant to the current input. (View Highlight)
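To make the key insight concrete, here is a minimal sketch — not the authors’ code — of how a fully connected layer can be computed using only the neurons a predictor has selected for the current input. The function name, tensor shapes, and the use of PyTorch are assumptions for illustration.

```python
import torch

def sparse_mlp_forward(x, W1, b1, W2, b2, active_idx):
    """Compute a transformer MLP block using only the neurons predicted
    to produce large outputs for this input (contextual-sparsity sketch).

    x:          (d_model,) activation for one token
    W1, b1:     (d_ff, d_model), (d_ff,) first linear layer
    W2, b2:     (d_model, d_ff), (d_model,) second linear layer
    active_idx: indices of the fully-connected-layer neurons to keep
    """
    # Slicing the weights means both matrix multiplies scale with the
    # number of active neurons rather than the full hidden width d_ff.
    h = torch.relu(W1[active_idx] @ x + b1[active_idx])
    return W2[:, active_idx] @ h
```

The same idea applies to attention: only the heads the predictor selects for a given input are computed, and the rest are skipped.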
  • The authors used pretrained OPT models of various sizes (175, 66, and 30 billion parameters). They built a dataset by feeding examples from OpenBookQA and WikiText to the OPTs and recording the outputs of all attention heads and fully-connected-layer neurons. By activating various portions of these networks, they learned that, for a given input, they could discard most of an OPT’s lowest-output attention heads and fully-connected-layer neurons without degrading its performance.
    • The authors used their dataset to train a sparsity predictor for each of an OPT’s fully connected layers. This small vanilla neural network classified which neurons in a fully connected layer to activate (because they produced large outputs), given the output of the previous fully connected layer.
    • Using the same dataset, they trained, for each attention layer, a small vanilla neural network to classify which attention heads to activate (because they produced large outputs), given the output of the previous attention layer.
    • At inference, an OPT and its predictor networks ran in parallel. While the OPT computed an attention layer, a predictor network predicted the neurons to activate in the following fully connected layer. Similarly, while the OPT computed each fully connected layer, a predictor network predicted the heads to activate in the following attention layer. (View Highlight)
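Below is a minimal sketch, under stated assumptions, of what such a sparsity predictor might look like and how it could be trained as a classifier on the recorded activations. Layer sizes, the top-k selection rule, and the training details are illustrative, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class SparsityPredictor(nn.Module):
    """Small vanilla network that scores which units (neurons or attention
    heads) in an upcoming layer are likely to produce large outputs, given
    the output of the previous layer. Sizes here are illustrative."""

    def __init__(self, d_model: int, num_units: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_units),  # one logit per neuron or head
        )

    def forward(self, prev_layer_output):
        return self.net(prev_layer_output)

# Training sketch: labels are 1 for units whose recorded output was large
# on a given example, 0 otherwise (derived from the OpenBookQA / WikiText traces).
predictor = SparsityPredictor(d_model=512, num_units=2048)
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

prev_out = torch.randn(32, 512)                 # stand-in for recorded layer outputs
labels = (torch.rand(32, 2048) > 0.8).float()   # stand-in for "large output" labels

optimizer.zero_grad()
loss = loss_fn(predictor(prev_out), labels)
loss.backward()
optimizer.step()

# At inference, keep only the top-scoring units for the upcoming layer;
# the main model can compute the current layer while this prediction runs.
active_idx = predictor(prev_out[0]).topk(k=256).indices
```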