This week, I want to spotlight an exceptional article by Jon Ander from BBVA. It distills lessons learned from building GenAI applications and is essential reading for anyone advocating for GenAI within their company. My reading also covers AI and open-source advancements: the challenges and potential of open releases like Gemma 3 and OLMo 2 32B, the market’s unease over AI bubble comparisons, and the potential of agentic search. It closes with a philosophical reflection on why hard work is necessary for genuine fulfillment.

Online Marketplaces

  • Is Agentic Search the Portal Slaying Paradigm Shift We’ve Been Expecting for 25 Years?: Edmund Keith’s article discusses “agentic search,” a new AI-driven approach that challenges traditional real estate portals by allowing users to interact directly with agent websites. This shift, exemplified by OpenAI’s Operator, lets users execute complex tasks like booking property viewings, potentially bypassing portals. While agentic search represents a significant threat to established real estate models, its adoption depends on consumer trust and the complexity of tasks AI can handle. Partnerships with AI developers may offer protection, as seen with Booking.com, which collaborates with OpenAI to maintain its competitive edge. Despite AI’s growing influence, the completeness and convenience of information on portals remain vital in maintaining user engagement.

AI

  • BBVA’s GenAI Journey: When Everything You Know About AI Projects Gets Turned Up to Eleven: BBVA’s work with Generative AI (GenAI) has turned what the bank already knew about building AI products “up to eleven,” in the manner of Spinal Tap’s amplifier. GenAI simplifies some things, such as removing the need for training data, but it introduces new complexities: knowledge base management, system integration, and novel evaluation frameworks. Recognizing the transformative potential of language technology in banking, BBVA established dedicated teams to harness GenAI, leading to over 35 projects and 800+ prototypes. Along the way they have had to keep system knowledge bases up to date, manage project risks, and address latency issues. A rapidly shifting technology landscape also prompts constant reassessment of their unique value propositions. The main lesson is that successful GenAI implementation still requires adherence to foundational AI development principles, along with strategic decision-making and adaptation to a constantly evolving technological environment.
  • How to Use Multimodal Embeddings to Create Semantic Search Engines for Multimedia: Adrian Araya’s article shows how to use multimodal embeddings to build semantic search engines for multimedia, so that machines can compare different types of data by meaning rather than by exact match. Embeddings transform data into dense vector representations, which makes it possible to run semantic searches across modalities such as text, image, and audio. The article walks through the development of a multimodal search engine that processes videos to identify the segments related to a user query, using ImageBind for embeddings and ChromaDB for data storage and management. The engine’s architecture and features, such as caching and adjustable parameters, are designed to find segments based on semantic similarity across data types.
  • Accenture Invests in and Collaborates With AI-Powered Agentic Prediction Engine Aaru: Accenture Ventures has announced its investment in Aaru, an AI-driven company specializing in agentic prediction engines that simulate consumer behavior to enhance customer experiences and market responsiveness. Aaru’s technology leverages multi-agent AI systems and diverse data sources to offer precise behavioral predictions faster than traditional methods. Accenture Song plans to incorporate Aaru’s flagship model, Lumen, into its AI products across various business functions. This collaboration aims to bolster Aaru’s growth and expand its AI capabilities amid rising demand. Aaru joins Accenture Ventures’ Project Spotlight, underscoring Accenture’s ongoing investment in disruptive AI enterprise technologies.
  • Spain to Impose Massive Fines for Not Labelling AI-generated Content: Spain’s government has approved a bill to impose massive fines on companies using AI-generated content without proper labeling, aiming to address concerns over “deepfakes.” Aligning with the EU’s AI Act, the bill mandates transparency for high-risk AI systems. It classifies improper labeling as a “serious offence” with penalties up to 35 million euros or 7% of global turnover. The bill also bans subliminal techniques to manipulate vulnerable groups and creates a new AI supervisory agency, AESIA, to ensure compliance, except in areas like data privacy and finance, managed by specific watchdogs.
  • Gemma 3, OLMo 2 32B, and the Growing Potential of Open-Source AI: Nathan Lambert’s article discusses the advancements and challenges in open-source AI, focusing on the recent releases of the Gemma 3 and OLMo 2 32B models. Despite the hype around open-source AI, truly open releases, which are often spearheaded by non-profits and academia, face significant challenges, including legal risks and resource constraints. The OLMo 2 32B model surpasses previous iterations and competes with leading closed models, marking a major step towards accessible, high-performance AI. Meanwhile, Google’s Gemma models highlight innovations in input capabilities and model distillation. The open-source AI ecosystem is increasingly closing the gap with closed models, driven by demand for transparent, customizable, and privacy-conscious solutions. This progress suggests a potential turning point for open-source AI in transparency, performance, and application deployment.
  • Why AI Spending Reminds Jim Chanos of the Fracking Bubble: Paul Krugman discusses the parallels between AI spending and the fracking bubble, drawing on insights from investor Jim Chanos. He highlights concerns about overvaluation and rising risk in today’s market, reminiscent of past financial bubbles. Chanos believes that the AI boom, unlike previous tech booms, demands enormous capital investment, which makes returns uncertain in the same way that fracking’s initial promise gave way to profitability problems. With tech companies like Microsoft reconsidering AI investments, the analogy underscores potential market volatility rooted in political and economic factors.
  • Welcome Gemma 3: Google’s All New Multimodal, Multilingual, Long Context Open LLM: Google has launched Gemma 3, its latest multimodal, multilingual open-weight language model, available in sizes ranging from 1B to 27B parameters. Gemma 3 supports over 140 languages, processes both text and images, and features a context window of up to 128k tokens. The model uses SigLIP for image encoding and offers improved attention mechanisms for different inputs. It ranks among the top ten in LMSys with an Elo score of 1339, and it is supported by mlx-vlm, a library for running vision language models on Apple silicon.
  • Mistral OCR: Mistral OCR is an Optical Character Recognition API developed by mistral.ai, which the company presents as setting a new standard in document understanding. The API extracts and interprets complex document elements such as text, images, tables, and equations. It handles multilingual documents and processes up to 2000 pages per minute. Mistral OCR is aimed at multimodal document analysis and offers a self-hosting option for privacy-sensitive organizations. Available as mistral-ocr-latest, it is accessible through various platforms, and Mistral reports better accuracy than its competitors in its benchmarks.
  • OpenAI Agents SDK: Simon Willison’s blog discusses the OpenAI Agents SDK, OpenAI’s latest Python library for building “agents,” which replaces the previous Swarm project. In this library, agents are classes that configure a language model with a system prompt and access to specific tools, and “handoffs” let one agent transfer execution to another. The library also provides “guardrails” to filter user input, addressing AI security concerns. This represents a shift towards solving AI security issues with AI technology itself.
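
The core idea behind the multimodal search engine item above, ranking stored segments by how close their embedding vectors are to the query’s embedding, can be sketched in a few lines. This is a minimal illustration, not the article’s implementation: the three-dimensional vectors are hand-made stand-ins for real ImageBind embeddings, a plain Python list stands in for ChromaDB, and the names `cosine_similarity`, `search`, `SEGMENTS`, and `QUERY` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k (score, label) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy "index": (segment label, embedding). In a real system every vector
# would come from the same multimodal embedding model, whatever the modality.
SEGMENTS = [
    ("video 00:00-00:10 (dog running on a beach)", [0.9, 0.1, 0.0]),
    ("audio 00:10-00:20 (piano music)", [0.0, 0.2, 0.9]),
    ("video 00:20-00:30 (puppy playing fetch)", [0.8, 0.3, 0.1]),
]

# Made-up embedding for a text query such as "a dog playing".
QUERY = [1.0, 0.2, 0.0]

results = search(QUERY, SEGMENTS)
for score, label in results:
    print(f"{score:.3f}  {label}")
```

Because text, image, and audio all land in the same vector space, one similarity function is enough to compare a text query against video or audio segments. A vector database takes over the storage and nearest-neighbor lookup once the index grows beyond a toy list.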

Philosophy

  • The Looking Glass: Our Souls Need Proof of Work: Julie Zhuo argues that hard work is essential for happiness and well-being because personal effort is what produces fulfillment and pride. Modern conveniences promise comfort, but they often dull our ability to find joy and create a cycle of craving and dissatisfaction, which dopamine’s role in reward seeking makes worse. Zhuo holds that embracing challenges, rather than avoiding discomfort, fosters resilience and authentic satisfaction, and she urges us to work hard for self-improvement and meaningful achievements.