Last week several of my colleagues and I were able to head up to New York City to attend the Kafka Summit. It was a blast being in the Big Apple! I wanted to share some observations about the summit and some of the sessions.
- The first observation is that there is a lot of buzz around event/stream processing and the Kafka streaming platform. The hotel in the center of Manhattan was top notch, and as one entered the summit the Kafka logo was projected on the walls and even emblazoned on the cookies. Several companies gave presentations on what they are doing with the platform, including LinkedIn, ING, Airbnb, Uber, Yelp, Target, and the New York Times.
- To get things kicked off, Jay Kreps gave the opening keynote (available here) and explored the question: what is a streaming platform? He noted the three things that are required:
- the ability to publish and subscribe to streams of data
- the ability to store and replicate streams of data
- the ability to process streams of data
- The Kafka streaming platform, comprising Apache Kafka along with the Kafka Producer and Consumer APIs, Kafka Streams, and Kafka Connect, provides all of these capabilities.
- Jay Kreps also explored the three lenses through which one is tempted to view Kafka – messaging, big data, and/or ETL. But really the Kafka streaming platform encompasses all of these.
- One of the best quotes at the summit came during the keynote by Ferd Scheepers, chief information architect at ING. He compared the current state of streaming platforms to going through puberty.
- During his presentation Scheepers also explained that ING is reorienting itself into a software company that does banking. That echoes the direction Walmart is taking with Walmart Labs.
- At the summit Confluent announced Confluent Cloud (link), a fully managed Kafka as a Service (KaaS) offering available on various public clouds. There were lots of cheers when this was revealed. At Walmart Labs there is a mix of private and public clouds, mostly the former, and an internal group standing up managed Kafka clusters for product teams. But I imagine this will be a great resource for many other companies.
- The best session was Jun Rao’s Kafka Core Internals. It really was a deep dive into how Kafka works under the hood. I wish there were more sessions at the summit that were of this quality and depth. Too many of the sessions were too broad or covered concepts at too high a level.
- We are using Kafka Connect and Stream Reactor to pull data out of Cassandra and make it available as streams to other consumers in the enterprise. But for those using relational databases, Debezium offers a set of connectors that work at the transaction log level, capturing changes and publishing them to Kafka.
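As a sketch of what that looks like in practice, a Debezium MySQL connector is registered with Kafka Connect through a JSON configuration along these lines (the hostname, credentials, server name, and database name below are placeholders, not values from any real deployment):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.com",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "changeme",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
```

Posting this to the Kafka Connect REST API starts a connector that tails the MySQL binlog and publishes each row-level change as an event to a Kafka topic per table.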
- When moving data between Kafka and other data sources, products no longer have to write their own streaming solution just to apply transformations; they can take advantage of Kafka Connect and the relatively new single message transforms (KIP-66). This is great for those use cases that require simple transformations.
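To sketch the idea, a single message transform is declared directly in a connector's configuration. The fragment below (the transform alias `addSource` and the field name/value are made up for illustration) uses the built-in `InsertField` transform to stamp each record's value with its origin before it reaches the sink:

```json
{
  "transforms": "addSource",
  "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
  "transforms.addSource.static.field": "data_source",
  "transforms.addSource.static.value": "cassandra"
}
```

Because the transform is pure configuration, no custom stream-processing code needs to be written or deployed for simple per-record tweaks like this.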
- The Data Dichotomy session (similar slides) by Benjamin Stopford is the one I am looking forward to listening to again so that I can more fully wrap my head around the ideas presented. In an enterprise of independent microservices we face the challenge of needing to share data between them (blog), a situation made more complex because sharing a database between services is considered an anti-pattern.
- The two main solutions – (1) services that encapsulate functionality and data, and (2) messaging that moves large sets of data to numerous consumers – both have the drawback of data divergence: as mutable data is shared, used, and updated across the enterprise, it diverges from the source. The solution proposed was to use Kafka as a single immutable data source sitting between the various microservices, allowing data to be shared between them.
- The new exactly-once guarantee (KIP-98) that is going to be available in 0.11 sounds promising. But I had a train to catch, so I missed the session on this new capability.
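Going by the KIP itself rather than the session I missed, the exactly-once semantics are expected to be switched on through producer configuration along these lines (the transactional id below is a placeholder; an application would choose its own stable identifier):

```properties
# enables the idempotent producer: broker de-duplicates retries
enable.idempotence=true
# stable id required for transactions spanning multiple partitions
transactional.id=order-processor-1
# idempotence requires acknowledgment from all in-sync replicas
acks=all
```

With a transactional id set, the producer can group writes to multiple partitions into a single transaction that downstream consumers either see in full or not at all.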