
Metadata

Highlights

  • Multilingual AI models often suffer uneven performance across languages, especially in multimodal tasks. A pair of lean models counters this trend with consistent understanding of text and images across major languages. What’s new: A team at Cohere led by Saurabh Dash released Aya Vision, a family of multilingual vision-language models with downloadable weights in 8-billion- and 32-billion-parameter sizes.
    • Input/output: Text and images in (up to 2,197 image tokens, up to 16,000 tokens total), text out (up to 4,000 tokens).
    • Availability: Free via WhatsApp or Cohere Playground. Weights are available to download but licensed only for noncommercial uses.
    • Features: Multilingual input and output in 23 languages.
    • Undisclosed: Knowledge cutoff, training datasets, adapter architecture. (View Highlight)
  • Each model comprises a pretrained large language model (Aya Expanse for the 32B model, C4AI Command R7B for the 8B version), a pretrained vision encoder (SigLIP 2), and a vision-language adapter (“connector”) of unspecified architecture. (View Highlight)
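Cohere hasn’t disclosed the connector’s design, but a minimal PyTorch-style sketch, assuming a simple MLP that projects frozen SigLIP 2 image features into the language model’s embedding space, illustrates the role it plays (the class name, dimensions, and the MLP itself are assumptions, not Cohere’s implementation):

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Hypothetical connector: projects frozen vision-encoder features into
    the language model's token-embedding space. Aya Vision's actual connector
    architecture is undisclosed; a two-layer MLP is assumed here."""

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_image_tokens, vision_dim) from SigLIP 2
        # returns "image tokens": (batch, num_image_tokens, llm_dim)
        return self.proj(image_features)
```

In designs like this, the projected image tokens are interleaved with text-token embeddings, and the combined sequence is fed to the language model.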
  • To establish basic vision-language understanding, the team froze the vision encoder and language model and trained the vision-language connector.
  • They fine-tuned the vision-language connector and language model on multimodal tasks. To build the fine-tuning dataset, they generated synthetic annotations for various English-language datasets and translated a large amount of data into a variety of languages. They rephrased the translations to add fluency and variety, particularly for languages with little real-world data, by matching generated pairs with the original synthetic samples.
  • They merged the language model with the fine-tuned vision-language model using an undisclosed method that preserved text capabilities while adding vision understanding.
  • After proving this method at 8 billion parameters, they scaled up the recipe to 32 billion parameters. (View Highlight)
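A minimal sketch of the staged training, assuming standard PyTorch parameter freezing; the placeholder modules below stand in for the real components and are not Cohere’s code:

```python
import torch.nn as nn

# Placeholders for the three pretrained components (sizes are arbitrary).
vision_encoder = nn.Linear(16, 16)   # stands in for SigLIP 2
language_model = nn.Linear(16, 16)   # stands in for Aya Expanse / Command R7B
connector = nn.Linear(16, 16)        # stands in for the vision-language connector

def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Toggle gradient updates for every parameter in a module."""
    for param in module.parameters():
        param.requires_grad = trainable

# Stage 1: vision-language alignment.
# Freeze the vision encoder and language model; train only the connector.
set_trainable(vision_encoder, False)
set_trainable(language_model, False)
set_trainable(connector, True)
# ... train the connector on image-text data ...

# Stage 2: multimodal fine-tuning.
# Unfreeze the language model and fine-tune it with the connector on the
# multilingual multimodal set (synthetic annotations plus translated,
# rephrased samples). The vision encoder stays frozen throughout.
set_trainable(language_model, True)
# ... fine-tune, then merge with the original text-only language model
#     (merging method undisclosed) ...
```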
  • Multilingual vision-language models often perform less well in low-resource languages, and the gap widens when they process media other than text. Aya Vision’s recipe for augmenting synthetic data with successively refined translations may contribute to more universally capable models. Aya Vision is available on the global messaging platform WhatsApp, where it can be used to translate text and images in all 23 of its current languages. (View Highlight)
  • Multilingual vision models could soon help non-native speakers decipher Turkish road signs, Finnish legal contracts, and Korean receipts. We look forward to a world in which understanding any scene or document is as effortless in Swahili as it is in English. (View Highlight)
  • Google introduced AI co-scientist, a general multi-agent system designed to generate in-depth research proposals within constraints specified by the user. The team generated and evaluated proposals for repurposing drugs, identifying drug targets, and explaining antimicrobial resistance in real-world laboratories. It’s available to research organizations on a limited basis. (View Highlight)
  • AI co-scientist accepts a text description of a research goal, including relevant constraints or ideas. In response, it generates research proposals and reviews, ranks, and improves them using seven agents based on Google’s Gemini 2.0 family of large language models. The completed proposals include sections that explain background, unmet needs, a proposed solution, goals, hypotheses, reasoning, study steps, and relevant articles. (View Highlight)
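The sections listed above suggest a simple record per proposal. A speculative sketch (field names are inferred from the description, not taken from Google’s system):

```python
from dataclasses import dataclass, field

@dataclass
class ResearchProposal:
    """Speculative container for a completed AI co-scientist proposal,
    one field per section described above."""
    background: str
    unmet_needs: str
    proposed_solution: str
    goals: list[str]
    hypotheses: list[str]
    reasoning: str
    study_steps: list[str]
    relevant_articles: list[str] = field(default_factory=list)
```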
  • The agents take feedback and outputs from other agents and perform their prompted tasks simultaneously.
  • The supervisor agent periodically determines how often to run the other six agents, how important their output is, and whether the system is finished. To accomplish this, it computes statistics such as the number of proposals generated so far, how many have been reviewed, and so on. (View Highlight)
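As a rough illustration of that scheduling logic, the sketch below picks the next agent from a handful of pipeline statistics. The agent names come from the article, but the statistics, weights, and stopping rule are made-up heuristics, not Google’s scheduler:

```python
import random

def supervisor_step(stats: dict) -> str | None:
    """Choose which agent to run next from simple pipeline statistics,
    or return None when the system is judged finished. Illustrative only."""
    done = (stats["proposals_generated"] >= stats["target_proposals"]
            and stats["proposals_reviewed"] >= stats["proposals_generated"])
    if done:
        return None

    # Weight each agent by how much work is waiting for it (made-up heuristic).
    weights = {
        "generation": max(0, stats["target_proposals"] - stats["proposals_generated"]),
        "reflection": max(0, stats["proposals_generated"] - stats["proposals_reviewed"]),
        "ranking": stats["proposals_reviewed"],
        "proximity": stats["proposals_generated"],
        "evolution": stats["tournaments_run"],
        "meta_review": stats["reviews_since_meta_review"],
    }
    names, w = zip(*weights.items())
    if sum(w) == 0:
        return "generation"
    return random.choices(names, weights=w, k=1)[0]

stats = {"target_proposals": 20, "proposals_generated": 5, "proposals_reviewed": 2,
         "tournaments_run": 1, "reviews_since_meta_review": 3}
print(supervisor_step(stats))  # most often "generation" at this early stage
```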
  • The generation agent generates a list of proposals. It searches the web for relevant research articles, identifies testable assumptions, and debates with itself to improve ambiguous statements and adhere to constraints. (View Highlight)
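A hedged sketch of that loop, where `llm` and `search` are hypothetical callables (a prompted Gemini model and a web-search tool) and the prompt wording is illustrative:

```python
def generate_proposals(goal: str, llm, search, n: int = 3) -> list[str]:
    """Sketch of the generation agent: gather literature, draft a proposal,
    then self-debate to sharpen ambiguous statements and respect constraints."""
    articles = search(goal)
    proposals = []
    for _ in range(n):
        draft = llm(f"Research goal and constraints:\n{goal}\n\n"
                    f"Relevant literature:\n{articles}\n\n"
                    "Draft a research proposal with explicit, testable assumptions.")
        # Self-debate: critique the draft, then revise it to resolve the critique.
        critique = llm(f"List ambiguous statements or constraint violations in:\n{draft}")
        revised = llm(f"Revise the proposal to address these issues:\n{critique}\n\n{draft}")
        proposals.append(revised)
    return proposals
```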
  • The reflection agent filters the generated proposals according to correctness, quality, safety, and novelty. First, it reviews a proposal without web search and discards obviously bad proposals. Then it reviews each proposal against literature it finds online. It breaks down and checks the proposal’s assumptions, checks whether the proposal might explain some observations in previous work, and simulates the proposed experiment (via text generation, similar to how a person performs a thought experiment). (View Highlight)
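A minimal sketch of that two-pass review, again assuming hypothetical `llm` and `search` callables and illustrative prompts and pass/fail logic:

```python
def reflection_review(proposal: str, llm, search) -> dict | None:
    """Sketch of the reflection agent: quick screen first, then a
    literature-grounded review and a simulated (text-only) experiment."""
    # Pass 1: review without web search; discard obviously bad proposals.
    quick = llm("Review this proposal for correctness, quality, safety, and "
                "novelty. Answer KEEP or DISCARD, then explain.\n\n" + proposal)
    if quick.strip().upper().startswith("DISCARD"):
        return None

    # Pass 2: review against literature found online.
    articles = search(proposal)
    full = llm("Check each assumption against these articles, note whether the "
               "proposal could explain observations in prior work, and run a "
               "step-by-step thought experiment of the proposed study.\n\n"
               f"Proposal:\n{proposal}\n\nArticles:\n{articles}")
    return {"quick_review": quick, "full_review": full}
```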
  • The proximity agents compute similarity between proposals to avoid redundancy.
  • The ranking agent determines the best proposals according to a tournament. It examines one pair of proposals at a time (including reviews from the reflection agent) and debates itself to pick the better one. To save computation, it prioritizes comparing similar proposals, new proposals, and highest-ranking proposals. (View Highlight)
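The article describes pairwise debates but not the rating scheme, so the sketch below assumes an Elo-style update over randomly chosen pairs; `judge(a, b)` stands in for the LLM debate that declares one proposal better:

```python
import random

def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo update after one pairwise comparison."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    return r_a + k * (score_a - expected_a), r_b + k * (expected_a - score_a)

def run_tournament(proposals: list[str], judge, rounds: int = 50) -> dict[str, float]:
    """Rank proposals by repeated pairwise comparisons (illustrative)."""
    ratings = {p: 1000.0 for p in proposals}
    for _ in range(rounds):
        a, b = random.sample(proposals, 2)  # the real system prioritizes similar,
                                            # new, and top-ranked pairs instead
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b))
    return ratings
```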
  • The evolution agent generates new proposals by improving existing ones. It does this in several different ways, including simplifying current ideas, combining top-ranking ideas, and generating proposals that are very different from current ones.
  • The meta-review agent identifies common patterns in the reflection agent’s reviews and the ranking agent’s debates. Its feedback goes to the reflection and generation agents, which use it to address common factors in future reviews and avoid generating similar proposals, respectively. (View Highlight)
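Those improvement strategies map naturally onto a small set of rewriting prompts. A sketch assuming a hypothetical `llm` callable and made-up prompt wording:

```python
import random

EVOLUTION_STRATEGIES = {
    "simplify": "Rewrite this proposal so its core idea is simpler and cheaper to test:\n\n{a}",
    "combine": ("Combine the strongest elements of these two top-ranked proposals "
                "into one new proposal:\n\n{a}\n\n---\n\n{b}"),
    "diverge": ("Write a proposal for the same research goal that takes a very "
                "different approach from this one:\n\n{a}"),
}

def evolve(top_proposals: list[str], llm) -> str:
    """Sketch of the evolution agent: derive one new proposal from existing ones."""
    strategy = random.choice(list(EVOLUTION_STRATEGIES))
    a, b = random.choice(top_proposals), random.choice(top_proposals)
    return llm(EVOLUTION_STRATEGIES[strategy].format(a=a, b=b))
```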
  • Google researchers generated proposals for experiments that would repurpose drugs to treat acute myeloid leukemia. They shared the 30 highest-ranked proposals with human experts, who chose five for lab tests. Of the five drugs tested, three killed acute myeloid leukemia cells. (View Highlight)
  • Experts selected three of the 15 top-ranked generated proposals for repurposing existing drugs to treat liver fibrosis. Two significantly inhibited liver fibrosis without being toxic to general cells. (One of the drugs had previously been approved by the United States Food and Drug Administration for a different illness, which may lead to a new treatment for liver fibrosis.) (View Highlight)
  • A few AI systems have begun to produce original scientific work. For instance, a model generated research proposals that human judges deemed more novel than proposals written by flesh-and-blood scientists, and an agentic workflow produced research papers that met standards for acceptance by top conferences. (View Highlight)
  • While previous work used agentic workflows to propose research ideas on a general topic, this work generates proposals for specific ideas according to a researcher’s constraints (for example, a researcher could specify that a novel medical treatment for a specific disease only consider drugs already approved for human trials for other uses) and further instructions. AI co-scientist can take feedback at any point, allowing humans to collaborate with the machine: People provide ideas, feedback, and guidance for the model, and the model researches and proposes ideas in return. (View Highlight)
  • The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.
    • What’s new: AI-generated works qualify for copyright if a human being contributed enough creative input, according to the second part of what will be a three-part report on artificial intelligence and copyright law.
    • How it works: The report states that “the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements.” In other words, humans and AI can collaborate on creative works, but copyright protection applies only if a human shapes the AI-generated material beyond simply supplying a prompt. (View Highlight)
  • The report rejects the argument that protecting AI-generated works requires a new legal framework. Instead, it argues that copyright law already establishes clear standards of authorship and originality. (View Highlight)
  • Human authors or artists retain copyright over creative contributions in the form of selection, coordination, and modification of generated outputs. Selection refers to curating AI-generated elements. Coordination involves organizing multiple generated outputs into a cohesive work. Modification is altering generated material in a way that makes it original. They retain copyright even if AI processes their creative work. They lose it only if the generated output is genuinely transformative. (View Highlight)
  • The Copyright Office deliberately avoided prescribing rigid criteria for the types or degrees of human input that are sufficient for copyright. Such determinations require nuanced evaluation case by case. This flexible approach accommodates the diverse ways creative people use AI as well as unforeseen creative possibilities of emerging technology. (View Highlight)