
Metadata

Highlights

  • Databricks’ decision to open source Unity Catalog and donate it to LF AI & Data is great news for lakehouse users. Providing a universal interface across data and AI, Unity Catalog ensures interoperability between many popular formats and compute engines. However, this announcement goes beyond simply providing new tools for the lakehouse; it’s also a big step toward the viability of open architecture. (View Highlight)
  • Flexibility, free from vendor lock-in: Many lakehouse data governance solutions tie users to specific vendors or platforms, limiting flexibility and control over their data. With Unity Catalog, you can own your data and metadata, giving you the freedom to choose the optimal solution for yourself without being confined to a single vendor. This open approach ensures that organizations can always stay ahead of their competition without being hindered by proprietary systems. (View Highlight)
  • Interoperability between formats and engines: Unity Catalog also delivers seamless interoperability between various data formats and compute engines. Whether it’s Delta Lake, Apache Iceberg, or Apache Hudi, Unity Catalog ensures that data can be easily read and managed across different systems. This capability is crucial for modern data architectures, especially with the varied nature of AI applications, where diverse data must be integrated and analyzed collectively with multiple engines on top. This interoperability ensures a consistent user experience and simplifies integrations between different systems, which saves labor hours while enabling you to build faster. (View Highlight)
  • While the benefits sound promising, what does this look like in practice? This section will use StarRocks, an open-source query engine that supports Delta through Delta Kernel Java, to demonstrate how we can easily interoperate between different table formats using Delta UniForm and Unity Catalog. (View Highlight)
  • Finally, we can create a REST external catalog to read the Iceberg tables. With Delta UniForm, Iceberg metadata is generated automatically alongside the Delta metadata whenever data is written (Apache Hudi metadata is supported as well), so any Iceberg-compatible client can directly access and read the table as Iceberg. (View Highlight)
  • While the release of Unity Catalog has many implications, its impact on the compute engine landscape is worth calling out specifically: (View Highlight)
  • More competition and innovation: Open-sourcing Unity Catalog dramatically reduces user friction when moving between solutions. This, in turn, will increase competition and create opportunities for new challengers. Ultimately, you can expect to see more options in this space, which will translate to more choices for you and, in turn, lower costs. (View Highlight)
  • Greater specialization of compute engines: Unity Catalog enables interoperability between lakehouse formats and compute engines, leading to more compute engines coexisting on a single source of truth. Specialized compute engines that excel in specific tasks like batch processing or low-latency queries will more easily find their niche, allowing organizations to adopt technologies that are more finely tuned to their operational needs. This will be a big step up for your efficiency and performance.
  • A stronger open source foundation: The significance of open-sourcing Unity Catalog ties directly into the broader narrative against vendor lock-in, offering users freedom in their choice of technologies. Strong open-source query engines ensure the ecosystem remains vibrant and complete, providing a comprehensive suite of tools that meet your needs without forcing you into a single vendor’s solution. (View Highlight)
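
The REST external catalog step mentioned in the highlights above could look roughly like the following. This is a minimal sketch, not the article's own code: it only builds the StarRocks `CREATE EXTERNAL CATALOG` statement as a string. The property keys follow the StarRocks Iceberg catalog documentation as best understood here, while the catalog name `uc_iceberg`, the warehouse name `unity`, and the local Unity Catalog endpoint URL are assumptions chosen for illustration.

```python
def build_rest_catalog_ddl(catalog_name: str, rest_uri: str, warehouse: str) -> str:
    """Build a StarRocks DDL statement for an Iceberg REST external catalog."""
    # Property keys are taken from the StarRocks Iceberg catalog docs
    # (assumption: verify them against the StarRocks version you run).
    properties = {
        "type": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.catalog.uri": rest_uri,
        "iceberg.catalog.warehouse": warehouse,
    }
    props_sql = ",\n    ".join(f'"{key}" = "{value}"' for key, value in properties.items())
    return f"CREATE EXTERNAL CATALOG {catalog_name}\nPROPERTIES (\n    {props_sql}\n);"


# Hypothetical values: a Unity Catalog server on localhost exposing an
# Iceberg REST endpoint, and a Unity Catalog catalog named "unity".
ddl = build_rest_catalog_ddl(
    catalog_name="uc_iceberg",
    rest_uri="http://localhost:8080/api/2.1/unity-catalog/iceberg",
    warehouse="unity",
)
print(ddl)
```

The generated statement would be submitted to StarRocks through any MySQL-protocol client; after that, tables written with Delta UniForm should be queryable through the new catalog as Iceberg tables.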