New data architectures and AI

I have been thinking about how we manage data for a long time. With AI now capturing people’s imaginations, it has become imperative for organisations to rethink their old model of storing data in silos. Data is no longer merely a thing to be stored; it must be able to flow, serving several purposes from a single stream:

  1. Traditional business intelligence, analytics, and reporting
  2. The new kids on the block – AI and ML
  3. Services driven by low-code application frameworks (Microsoft Power Apps, Appian, ServiceNow, among others)
  4. Traditional data integration

Yet many of us have not really thought beyond our traditional operational data stores and our old-fashioned data warehouses. If we want to be ready to leverage the power of AI, that has to change.

Advocates such as Zhamak Dehghani have long argued for a shift to a more decentralised way of thinking about data. As she says:

“Data mesh addresses these dimensions, founded in four principles: domain-oriented decentralized data ownership and architecture, data as a product, self-serve data infrastructure as a platform, and federated computational governance. Each principle drives a new logical view of the technical architecture and organizational structure.”

https://martinfowler.com/articles/data-mesh-principles.html#TheGreatDivideOfData

There is a great introduction to data mesh from Confluent, the folks behind Kafka: https://developer.confluent.io/courses/data-mesh/data-as-a-product/
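To make “data as a product” slightly more concrete, here is a minimal sketch of the kind of metadata a domain team might publish alongside its dataset. This is purely illustrative (the names, fields, and SLA below are all made up); data mesh prescribes the principle, not any particular code:

```python
# An illustrative sketch of "data as a product": a domain team publishes
# its dataset with an explicit owner, a published schema, and a
# service-level promise. Every name here is made up for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    name: str           # discoverable, addressable identifier
    domain: str         # the owning domain team, not a central IT team
    owner: str          # a named, accountable product owner
    schema: dict        # the versioned output contract consumers rely on
    freshness_sla: str  # the promise consumers can hold the team to
    version: str = "1.0.0"

# The sales domain owns and publishes its own data product.
completed_orders = DataProduct(
    name="sales.orders.completed",
    domain="sales",
    owner="sales-data-team@example.com",
    schema={
        "order_id": "string",
        "amount": "decimal(10,2)",
        "completed_at": "timestamp",
    },
    freshness_sla="updated within 15 minutes of the source event",
)
```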

In reality, though, I think Bill Inmon’s (the father of the data warehouse) latest idea, the data lakehouse, is the best transitional architecture to support ongoing operations. It is worth checking out this video of him talking about the idea here.

The reason I say “transitional” is that, with AI moving so quickly, I do not believe anyone can know what our data structures will look like in two years. Even so, the data lakehouse feels like a sensible step towards the emerging future (which is probably going to be some kind of data fabric). One of the key benefits of the lakehouse architecture is that it serves all four use cases listed above off the same set of pipelines, which is a win I am willing to take.
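To illustrate that benefit, here is a minimal sketch of one pipeline on a Spark plus Delta Lake stack, which is just one common way of building a lakehouse. All paths, table names, and columns below are hypothetical:

```python
# A minimal lakehouse sketch: one pipeline, many consumers.
# Assumes Spark with Delta Lake (pip install pyspark delta-spark);
# all paths and column names are hypothetical.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Land raw events once, in an open table format (the "bronze" layer).
raw = spark.read.json("/lake/raw/orders")
raw.write.format("delta").mode("append").save("/lake/bronze/orders")

# Clean and conform once (the "silver" layer). This single table then
# serves BI, AI/ML, low-code services, and integration alike.
silver = (
    spark.read.format("delta").load("/lake/bronze/orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
)
silver.write.format("delta").mode("overwrite").save("/lake/silver/orders")

# Use case 1, BI and reporting: query the same table with plain SQL.
silver.createOrReplaceTempView("orders")
daily_revenue = spark.sql(
    "SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY order_date"
)

# Use case 2, AI/ML: hand the very same table to a feature pipeline.
features = silver.select("customer_id", "amount", "order_date")
```

The raw data lands once, gets cleaned once, and every consumer, whether a dashboard, a feature pipeline, a low-code app, or an integration job, reads from the same governed tables.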