Metadata

Highlights

  • Data pipelines are commoditized and analytics engineers don’t provide enough value. (View Highlight)
  • I posted a Bluesky thread this weekend arguing that analytics engineers and data engineers should be folded back into a single role. I decided to make the argument after coming across (View Highlight)
  • In short, my answer is that analytics—not as an industry or as a technology ecosystem, but as a discipline—might not work. The average company may never be able to make better decisions by hiring a team of average analysts. We can make dashboards and be operational accountants. But the fun, exploratory, “valuable” work may always be an indulgent, empty dessert, and never the entrée we want it to be. — Benn Stancil, Disband the analytics team (View Highlight)
  • I’ve long held that creating the “analytics engineer” role was a mistake. dbtLabs says, “Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions.” I don’t believe that this set of activities provides enough value to justify a full headcount; it’s too limited in scope and too far removed from revenue generation. (View Highlight)
  • Extracting, transforming, and loading (ETL’ing) data used to be handled by one team: the data warehouse team. But several trends have encouraged a schism in warehouse teams. Some—data engineers—now work on data pipelines (extract and load) while others work on data marts (“clean data sets”, as dbtLabs calls them). (View Highlight)
  • In short, data engineers do the E and L, and analytics engineers do the T. Many trends contributed to this bifurcation.
      • We switched from ETL to ELT when we adopted data lake architectures. Dumping garbage into an object store made it easy for data engineers to ignore transformations and gave analytics engineers something to do.
      • Similarly, adopting data integration with Kafka and Kafka Connect greatly expanded the number (and importance) of data pipelines in an organization, which gave the data engineers something to do as well.
      • Shift-left became a data philosophy that encouraged everyone to be their own analyst, which left analysts squeezed.
      • ZIRP ended, which made CFOs take a hard look at the cost of analyst and data teams, which further squeezed analysts.
      • dbtLabs, Motherduck, and other MDS vendors were all too willing to create a new role to sell their products, which dovetailed nicely with analysts’ desire to be engineers and get paid more.
      • LLMs are replacing analysts in some cases. Screech all you want, but it’s happening. There are dozens of data chatbots now (Cimba.ai﹩, DataChat, Julius.ai), and LLMs write pretty good SQL. (View Highlight)
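
To make the E/L versus T split concrete, here is a minimal sketch of an ELT flow owned end to end by one piece of code. It is an illustration only, not anything from the post: the `RAW_ORDERS` records, the table names `raw_orders` and `customer_revenue`, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical source records; in practice these would come from an API,
# a message queue, or an operational database.
RAW_ORDERS = [
    {"id": 1, "customer": " Alice ", "amount_cents": 1250},
    {"id": 2, "customer": "bob", "amount_cents": 400},
    {"id": 3, "customer": "ALICE", "amount_cents": 750},
]


def extract():
    """E: pull records from the (hypothetical) source system."""
    return list(RAW_ORDERS)


def load(conn, rows):
    """L: land the records untouched in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_orders VALUES (:id, :customer, :amount_cents)",
        rows,
    )


def transform(conn):
    """T: build a clean data set from the raw table using SQL."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT lower(trim(customer)) AS customer,
               sum(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY lower(trim(customer))
        """
    )


def run_pipeline(conn):
    """Run E, L, and T in order and return the clean rows."""
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
result = run_pipeline(conn)
print(result)
```

In the split described above, `extract` and `load` would belong to data engineers and `transform` to analytics engineers; the sketch only shows how small the seam between them can be when the raw data lands in a SQL-queryable store.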
  • We’re in a new world now, though. ZIRP is gone, most of the connectors that data engineers were working on have now been built, there are many vendors you can pay to run your data pipeline, and chatbots can answer data questions. It’s time to merge data engineers and analytics engineers back into a single data team that’s responsible for E, T, and L. (View Highlight)
  • I’m happy to see companies and projects showing up to ease this transition. The most notable one is dltHub, which adopts SDLC best practices for data pipelines much as dbt did for transformations. Tools such as this should make it easier for analytics engineers to take ownership of data pipelines. I’ve also seen several tools like tabsdata﹩ that merge ETL back into a single tool for analytics and data engineers, rather than having both dltHub and dbt. I expect to see a collapse of data engineering and analytics engineering back to a single team in the next few years. (View Highlight)