So, I’ve been kind of reluctant to dig into it until I started to read about Azure Event Hubs. The first thing that came up when starting to type it on Google was “Azure Event Hub vs Kafka”! So, being a person that wants to be up-to-date (well…) with Azure, I’ve now decided that I need to learn more about this.
I downloaded a book with the title Designing Event-Driven Systems from the prominent company Confluent. After reading it, I was really impressed! It confirms so many problems we have by building our systems as separate islands trying to communicate to each other. There were definitely a lot of new ideas of how to solve these problems!
What about the warnings?
Event streaming, as well as other event technologies or patterns as CQRS and Event sourcing, can add a lot of extra overhead if not used for an explicit purpose. You should consider the business you’re in, how they make money, weighing pros and cons. Bringing in Kafka to an organisation is a huge step, and it is important that you get return of investment.
But with that said, how could we know why we shouldn’t use it, if we don’t know what it its. So let’s get started!
What is Kafka?
Event streaming is the digital equivalent of the human body’s central nervous system. It is the technological foundation for the ‘always-on’ world where businesses are increasingly software-defined and automated, and where the user of software is more software.
Kafka is an event streaming platform, optimised for stream processing and high throughput. If you want to read the hard facts you can continue at Apache. If you want to know why it turns your view of software development upside down, preferably continue with this post!
Before I read the book, I thought of Kafka as an advanced service bus (ESB), but there is a main difference. Organisations using ESB tend to have a centralised approach with central teams that decide about schemas, transformation and message flows. This slows down the setup of new integrations and changes of existing. This also causes systems and services to evolve at a slower pace. With Kafka the services themselves are encouraged to provide their data to others, who are free to “act, adapt and change” according to their business needs.
Centralize an immutable stream of facts. Decentralize the freedom to act, adapt, and change.
Core mantra of event-driven services
Kafka has two APIs for stream processing, Kafka Streams and KSQL. With these you can filter, join streams and run arbitrary functions on the data. There is also an API for connecting to other systems, for example databases optimised for specific purposes like search. This API can connect to legacy systems that you are moving away from.
Kafka is saving events in streams which can also be seen as logs. Keeping data as logs means that doing updates is amazingly fast, just appending a new event to the end of the log. Logs also makes it possible for Kafka to scale linearly. It is typically installed on at least three machines but can be scaled up to hundreds. When reading and writing to a log, your data is written on all machines, which makes it easy to just add one. I is nearly impossible to hit a scalability wall with Kafka, the book is describing this in more detail.
A business consists of events
A lot of things (everything?) that happens in a business can be seen as events; hotels prices are changed, bookings cancelled, refunds performed. Even when a misspelled name is corrected, it is an event. In traditional systems there are few traces of these events, since only the latest state is stored in the database. Data that could be used for historical analysis is gone, or is hard to find. The events are not visible in the system, and throughout the flow between the systems and services.
Two things that are affected by this is:
- It’s harder to analyse historical data to make business decisions, because a lot of data is not there anymore.
- It’s hard to find out what is causing a bug and exactly where it happens.
“Work with the system, not against it”
Keynote at #kafkasummit
Instead, we should embrace the fact that things that happens in a business can be stored as events. These events flow in streams, and are projected to other events. (The present state is often cached to improve performance.) When data is handled this way, there is a transparency of what has happened in the system and the flow can be followed throughout different services. By analysing the data in the streams, valuable information can be collected which can serve as foundation for important business decisions.
The outside data is considered as a single source of truth
There is a distinction between the data on the inside and the data on the outside of the service or system. The data on the outside is much harder to change since a lot of other services are depending on it. But the fact that many are dependent of it makes it also much more important than the encapsulated data inside the service. Kafka is storing the entire event log outside the service, fully available to other services and systems! This makes the outside data a first-class citizen and the single source of truth.
Make data on the outside a first-class citizen
The database is turned inside out
Many of us recognise the problem of synchronising changes over several teams. E.g. one team wants to try out a brilliant idea that the customers would love. To do that they need data from a system that is maintained by another team. The other team is busy for months with changes requested by other parts of the business, and can’t help out. This slows the whole organisation down and causes endless priority discussions and a fight for resources.
When the source of truth is placed outside systems and services, it is available for anyone. It is stored as a stream of events. This makes it possible for anybody to travel in time, choosing exactly the data that would be the best for their purpose. This is referred to as “turning the database inside out”! To improve performance, data is also cached in Kafka tables, refreshed asynchronously. These tables can be defined for different purposes and are queried using the KSQL language.
This decouples the data in an organisation and keeps it as a shared single source of truth.
Less data needs to be stored inside
When the data is stored in shared logs outside the system or service, there is less need to store data inside. Data can be kept in memory, or read from the log or views whenever needed. That would reduce the storage required when data is stored more than once. It also lowers the risk that the data in different systems or services starts to differ.
Is Kafka the answer to my naive thought about streaming data through a system in my article Functional Domain Modelling? I can be. Kafka works very well with functional programming. But streaming data throughout a system can be done in many ways. Some of them certainly more lightweight and less complicated than Kafka. There is a lot more to learn in that area!
Increased interest from developers
To develop the best possible services to their customers, many companies need to handle a lot of data, e.g. collected in apps or in IOTs. Huge amounts of data must be processed with exceptional performance. More and more developers realise that they must find new ways to tackle these requirements. I recently joined the (on-line) Kafka summit together with 25000 other developers. Most of the participants described themselves as beginners which indicates an increased interest in this technology. Of course, there is a huge step to take from building traditional systems with REST APIs and request/reply approach, to doing it with event streaming. It’s a completely different way of thinking.
Kafka is also a very complex platform and it requires specialised persons only to manage the hosting. This indicates a large-scale use to make it profitable. It seems like they are working on that, though. Both by improving the product itself, and by making it easier to host in the cloud e.g. by using Azure Event Hubs. Techniques for event streaming and event sourcing in general tends to drive complexity. Especially as it is less intuitive and you have to find different solutions than you normally do.
Not mature enough for all of us…yet!
I’m quite impressed with Kafka and find it very interesting. This is mainly because I’ve been working at large companies with hundreds of systems. It is quite obvious that the data flows in the organisation, rather than is stored in separate databases.
One question that I ask myself is: Would we benefit from using Kafka even in applications that does not have enormous amounts of data in combination with extreme performance demands? I think we might…eventually. These use cases might be handled by other tools than Kafka, but a guess from my side is that a transformation to a more common usage of event streaming will probably be performed during the next ten years.
I’ve used event sourcing and CQRS in several projects. From quite pragmatic solutions within a single microservice, to fully implemented throughout the whole system. There were lessons learned and we can discuss whether it was the right way to go or not. Nevertheless, it gave me a new dimension to software development that I can’t move away from. The technologies are not mature enough yet, and we as developers are still very much stuck in the existing paradigm. But I think that event streaming is something that we’ll have to learn and get used to, in most applications. Exactly in what forms, we don’t know yet.
But, for sure…
To boil a whole book down to one tiny article is of course impossible. There are so much more that I want to squeeze in, so many more smart conclusions and so many quotes. But one thing I’m pretty sure of:
What about the connection to Azure Event Hubs? Please read more in the links below!
Apaches description of Kafka: Apache
What Azure Event Hubs are: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
How Azure Event Hubs relates to Kafka: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Kafka .Net Client: https://docs.confluent.io/current/clients/dotnet.html
Pat Helland: Data on the Inside and Data on the Outside