Some Quick Observations from #KafkaSummit NY

Last week several of my colleagues and I were able to head up to New York City to attend the Kafka Summit. It was a blast being in the Big Apple! I wanted to share some observations about the summit and some of the sessions.

  • The first observation is that there is a lot of buzz around event/stream processing and the Kafka streaming platform.  The hotel in the center of Manhattan was top notch. As one entered the summit, the Kafka logo was projected on the walls and was even emblazoned on the cookies. There were several companies that gave presentations on what they are doing with the platform including Linked In, ING, Airbnb, Uber, Yelp, Target, and the New York Times.

c_telfzxsaa1ydt.jpg

  • To get things kicked off, Jay Kreps gave the opening keynote (available here) and explored the question: what is a streaming platform. He noted the three things that are required.
    • the ability to publish and subscribe to streams of data
    • the ability to store and replicate streams of dataIMG_20170518_175032613
    • the ability to process streams of data
  • The Kafka streaming platform, comprised of Apache Kafka along with the Kafka Producer and Consumer APIs, Kafka Streaming, and Kafka Connect, provides all of these capabilities.
  • Jay Kreps explored the three possible lenses that one is tempted to place Kafka into – Messaging, Big Data, and/or ETL. But really the Kafka streaming platform encompasses all of these.
  • One of the best quotes at the summit came during Ferd Sheepers, architect at ING, keynote. He compared the current state of streaming platforms to going through puberty.

  • During his presentation Sheepers also explained that ING was reorienting itself into a software company that does banking. That captured the direction that Walmart is going with Walmart Labs.
  • At the summit Confluent announced Confluent Cloud (link) a fully managed Kafka as a Service (KaaS) capability available on various public clouds. There were lots of cheers when this was revealed. At Walmart Labs there is a mix of private and public clouds, mostly the former, and an internal group standing up managed Kafka clusters for product teams. But I imagine this will be a great resource for many other companies.

c_uc_3zvyaaurj9.jpg

  • The best session was Jun Rao’s Kafka Core Internals. It really was a deep dive into how Kafka works under the hood. I wish there were more sessions at the summit that were of this quality and depth. Too many of the sessions were too broad or covered concepts at too high a level.

C_UYQ9iU0AE47fR

  • We are using Kafka Connect and Stream Reactor to pull data out of Cassandra and make it available as streams to other consumers in the enterprise. But for those using relational databases, Debezium offers a set of connectors that work at the transaction log level capturing changes and publishing them to Kafka.
  • Instead of having to write your own streaming solution when moving data to/from Kafka and other data sources when transformations are needed, products can take advantage of Kafka Connect and use the relatively new single message transformer (KIP-66). This is great for those use cases that require simple transformations.
  • The Data Dichotomy session (similar slides) by Benjamin Stopford is the one I am looking forward to listening to again so that I can wrap my head more fully around the ideas that were presented.  In an enterprise of independent micro-services we are faced with the challenge of needing to share data between them (blog). A situation made more complex because sharing a database between services is considered an anti-pattern.

  • The two main solutions – (1) services that encapsulate functionality and data – and – (2) messaging that moves large sets of data to numerous consumers both have the drawback of data divergence. As mutable data is shared, used, and updated across the enterprise it diverges from the source. The solution proposed was using Kafka as a single immutable data source sitting between various micro services and allowing data to be shared between them.

  • The new “exactly once” guarantee (KIP-98) that is going to be available in 0.11 sounds promising. But I had a train to catch so I missed the session on this new capability.

Yet another Camel Circuit Breaker…

Camel 2.19 was recently released. You can read about that on Claus Ibsen’s blog. A quick look through the release notes and it will be readily apparent that there are numerous new features that went into this version.

452246813_fd26cb6539_z.jpg
Camel: modified from original

One of the new features is the ThrottlingExceptionRoutePolicy, which implements the circuit breaker pattern.

The circuit breaker pattern is described by Martin Fowler on his bliki as follows:

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all.

Now if you have used Camel for any length of time you may be wondering why is there another implementation of this pattern. After all Camel already provides two.

The first was released with 2.14 and was added by extending LoadBalancerSupport in the CircuitBreakerLoadBalancer.  The documentation can be found on the Camel site under the Load Balancer EIP. The slightly modified sample DSL is from the Camel documentation.

from("direct:start")
    .onException(RejectedExecutionException.class)
        .handled(true)
        .transform().constant("service unavailable")
    .end()
    .loadBalance()
          .circuitBreaker(2, 1000L, MyCustomException.class)
          .to("mock:service")
    .end();

In the prior release of Camel (2.18), the Netflix Hystrix circuit breaker implementation was added to the expanding set of Camel capabilities. The slightly modified sample DSL is from the Camel documentation.

from("direct:start")
    .hystrix()
        .to("mock:service")
    .onFallback()
        .transform().constant("service unavailable")
    .end()

Both of these options are used to protect the consumer of a Camel route from external resources that are not responding or exceed some specified timeout threshold. The idea is that once the specified number of failures has occurred the route will stop calling mock:service and instead perform some faster operation. In this case return “service unavailable”.

But what happens if we would like to stop consuming from the endpoint when the circuit is in the open state rather than bypass the service call. This is what the ThrottlingExceptionRoutePolicy provides. It is based on the CircuitBreakerLoadBalancer and is implemented as a Camel RoutePolicy.

Here is how it basically works. All un-handled exceptions thrown from the route are passed to the RoutePolicy where they are evaluated. Based on the settings the circuit breaker will count the number of exceptions (failureThreshold) over a period of time (failureWindow). If the threshold is met in the time allotted then the circuit is opened and the route will stop consuming from the endpoint. This might make sense when one is consuming from a JMS queue or a directory with files. Rather than read a message and roll it back continually, the route can stop using resources for a period of time when the external services it depends on are failing. 

Here is a sample DSL:

onException(BadServiceException.class)
    .handled(false); // open circuit

onException(SomethingElseException.class)
    .handled(true); // don't open circuit

from("jms:queue:start")
    .routePolicy(circuitPolicy)
    .to("mock:service")

Assuming one is using Spring, this is how the RoutePolicy would be configured. The fourth parameter is a List that can be used to limit the Exceptions that should be used to determine whether to open a circuit.

@Bean
public ThrottlingExceptionRoutePolicy circuitPolicy() {
    return new ThrottlingExceptionRoutePolicy(
        this.getFailureThreshold(),
        this.getFailureWindow(),
        this.getHalfOpenAfter(), null);
}

After a specified amount of time (halfOpenAfter) the circuit will move to the half open state and start to consume messages from the endpoint. If any failures occur during the half open state the circuit will return to the open state. If messages are successfully processed it will move to the closed state.

Since the route is started in the half open state it is possible that several messages might be consumed from the endpoint. To avoid this situation the ThrottingExceptionHalfOpenHandlerinterface is provided.

public interface ThrottingExceptionHalfOpenHandler {
    boolean isReadyToBeClosed();
}

If an implementation of this interface is available it will be used instead of starting the route. Implementations can be used to check the external resources to see if they are available. For example they might issue a simple SQL statement

select 1 from dual

or call the REST service and evaluate the response.

Don’t be a half-witted, nerf herder. Or what is Clean Code?

Our team is reading through the classic Clean Code together. After reading a chapter or two we will get together to discuss the concepts over lunch. We try to keep them fun and interactive. These are the slides from our first session (link).

In discussing the question – why don’t we write clean code – we explored the following three reasons.

There is a level of subjectivity

There is a good chance that when I opened a pull request for my team to evaluate, I thought the code being pushed read like well written prose, was understandable to others, tested, and was thus maintainable by my colleagues. However, the code should be considered readable by the team that is responsible for owning and maintaining it. Which usually means that there will be some comments and feedback.

mycode

There may not be an understanding of what Clean Code is

If writing clean code was obvious I imagine that Bob Marin would not have written a book on it. And sites , like the Daily WTF, poking fun at various “dirty” code would not exist. Understanding what clean code looks like and the techniques to improve it must be learned. Our goal as a team is to work through Clean Code so everyone on the team will know what clean code is and why it is important.

how-does-that-work

There was a deadline to meet so there wasn’t any time

Perhaps the most common excuse for why “dirty” code is accepted and deployed is because the team rushed to put something together to meet a deadline. Unfortunately, that means that we now have to make time to not only add new features, but also go back and clean up the mess we made.

no-time

Bob Martin pushes back on the deadline excuse, noting:

The only way to make the deadline – the only way to go fast – is to keep the code as clean as possible at all times.

Quality is not a product variable

When I was exposed to Agile one of the notions that stuck with me was something similar to the Iron Triangle presented in this article (link). The idea was that the product owner and stakeholders would be presented with the project variables (scope, schedule, and resources). After flushing these out they would be asked to identify which of these variables is fixed, which is firm, and which is flexible. The caveat is that each of these can values can only be used once.

managing-project-variables

When marking the variable that is fixed the stakeholders are identifying the part of the product that cannot be changed in order to provide value. Maybe the scope (features) is fixed because the capabilities defined already were scaled back so that the minimum viable product (MVP) was being built.

The next variable is marked as firm. The variable identified in this manner is one that the stakeholders can give on here and there and still achieve value. For example if the scope was considered fixed, then the schedule that was proposed might be able to change depending on how the project moved forward.

The last variable is considered flexible. This is where the stakeholders have the most ability to change things so that the other two variables can be met. In this example, the resources (servers, teams, support, training) are where the stakeholders can invest the most to meet the scope and schedule.

This technique has proven helpful in shaping expectations on a project. Though it is important to continually evaluate these variables along the way. What really stuck with me was this. When the instructor explained this concept, he wisely noted that one variable was left off the chart. This was because is was non-negotiable. That was code quality.

 

Write Programs for People First, Computers Second

There are several computer books that have become classics. One of these is Code Complete by Steve McConnell. The first edition was written in 1993. That goes back to when I started collecting a paycheck as a professional developer. And it precedes classics like the Gang of Four’s Design Patterns (1994), The Pragmatic Programmer: From Journeyman to Master (1999), Agile Software Development: Principles, Patterns, and Practices (2002) and Clean Code (2008). Some consider this book to be the first collection of coding practices.

In this book, McConnell stresses the importance of readable code.

codecomplete_readable

He notes that writing readable code is one of the things that separates the great coders from the rest.

A great coder [level 4] who doesn’t emphasize readability is probably stuck at Level 3, but even that isn’t usually the case. In my experience, the main reason people write unreadable code is that their code is bad. They don’t say to themselves “My code is bad, so I’ll make it hard to read.” They just don’t understand their code well enough to made it readable, which puts them at Level 1 or Level 2.

Even before Code Complete, the book Structure and Interpretation of Computer Programs, written by Abelson and Sussman, was published in 1985. The full text of the book is available online (link). The preface to the first edition (link) contains the oft repeated line:

programs must be written for people to read, and only incidentally for machines to execute.

Why is readability so important?

As famed developer Joel Spolsky notes: It’s harder to read code than to write it.

We have all been there. We saw a problem, rolled up our sleeves, and wrote some code that we thought was quite clever. Maybe something like this (credit for this gem goes to this code golf puzzle).


"Gdkkn\x1fVnqkc".split("").collect(&:succ)*''

Or maybe this (modified from a DailyWTF sample)

var m = function(p1) {
	if (p1 > 0 || p1 < 0) {
 		return parseInt(("-" + p1).replace("--", ""));
	}
	else { return p1;} };

And we walked away feeling like this

solo_i-rock

Unfortunately, the rest of our team took one look at it and reacted more like this

solo_shooting-self

Why? Because the code is hard to read. And if it is hard to read it will be hard to understand. And if it is hard to understand it will be hard to maintain.

Bob Martin explains this more in Clean Code, noting that we spend more time reading code than writing it.

Indeed, the ratio of time spent reading vs. writing is well over 10:1. We are constantly reading old code as part of the effort to write new code.

… There is no escape from this logic. You cannot write code if you cannot read the surrounding code. The code you are trying to write today will be hard or easy to write depending on how hard or easy the surrounding code is to read. So if you want to go fast, if you want to get done quickly, if you want your code to be easy to write, make it easy to read.

(Mostly) XMLless Spring

I am working on a project using Spring. As a team we were using Spring’s Java configuration. But we also had an applicationContext.xml file. We wanted to remove the need for this file. This post captures the basic steps to accomplish that.

Our web.xml was using the standard approach to load Spring using the WebApplicationContext via the ContextLoaderListener.

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">

  <listener>
    	<listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
  </listener>

</web-app>

This setup requires, at a minimum, a component-scan element in our applicationContext.xml file so that Spring will look for the Java classes that include an @Configuration annotation.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:context="http://www.springframework.org/schema/context"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
                        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

  <context:component-scan base-package="codesmell" />

</beans>

Assuming we had a simple class FooBar that was configured in a Java class, Config as follows:

@Configuration
public class Config {

    @Bean
    public FooBar doFooBar() {
        return new FooBar();
    }
    
}

Then when we run the application we would see the following (assuming the FooBar class writes out ‘foo bar!’ (see line 11) when it is created):

Nov 22, 2016 2:37:34 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring root WebApplicationContext
Nov 22, 2016 2:37:34 PM org.springframework.web.context.ContextLoader initWebApplicationContext
INFO: Root WebApplicationContext: initialization started
Nov 22, 2016 2:37:34 PM org.springframework.web.context.support.XmlWebApplicationContext prepareRefresh
INFO: Refreshing Root WebApplicationContext: startup date [Tue Nov 22 14:37:34 EST 2016]; root of context hierarchy
Nov 22, 2016 2:37:34 PM org.springframework.beans.factory.xml.XmlBeanDefinitionReader loadBeanDefinitions
INFO: Loading XML bean definitions from ServletContext resource [/WEB-INF/applicationContext.xml]
Nov 22, 2016 2:37:34 PM org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor <init>
INFO: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
foo bar!

With a few additions to our web.xml file we can delete the applicationContext.xml file.

We will add the AnnotationConfigWebApplicationContext so that the WebApplicationContext will accept annotated classes. Then we will add a contextConfigLocation so that Spring will use our Java config class instead of the default of /WEB-INF/applicationContext.xml

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">

  <listener>
    <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
  </listener>

  <context-param>
    <param-name>contextClass</param-name>
    <param-value>org.springframework.web.context.support.AnnotationConfigWebApplicationContext</param-value>
  </context-param>

  <context-param>
    <param-name>contextConfigLocation</param-name>
    <param-value>codesmell.config.Config</param-value>
  </context-param>
</web-app>

Now we get this:

Nov 22, 2016 2:36:30 PM org.apache.catalina.core.ApplicationContext log
INFO: No Spring WebApplicationInitializer types detected on classpath
Nov 22, 2016 2:36:31 PM org.apache.catalina.core.ApplicationContext log
INFO: Initializing Spring root WebApplicationContext
Nov 22, 2016 2:36:31 PM org.springframework.web.context.ContextLoader initWebApplicationContext
INFO: Root WebApplicationContext: initialization started
Nov 22, 2016 2:36:31 PM org.springframework.web.context.support.AnnotationConfigWebApplicationContext prepareRefresh
INFO: Refreshing Root WebApplicationContext: startup date [Tue Nov 22 14:36:31 EST 2016]; root of context hierarchy
Nov 22, 2016 2:36:31 PM org.springframework.web.context.support.AnnotationConfigWebApplicationContext loadBeanDefinitions
INFO: Successfully resolved class for [codesmell.config.Config]
Nov 22, 2016 2:36:31 PM org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor <init>
INFO: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
foo bar!

Nested Fluent Builders with Java 8

In the last post we looked at how to create an object using a fluent API in order to remove the code smell of a constructor with a long parameter list. We wrote the fluent methods directly onto the Invoice POJO, but we saw this broke the principle of Command Query Separation. In this post we will apply the Builder pattern, with the primary purpose of exploring how to make a nested fluent API using Java 8 lamdas. For more information on this pattern see Item 2 in the second edition of Effective Java by Josh Bloch.

We will start by removing the static factory methods from our Invoice class and replace them with a public static inner class, InvoiceBuilder. We will also need to add a static method (builder) that allows an instance of the InvoiceBuilder to be returned to any object that wants to create an Invoice. Finally, we will want to make sure that the constructor is private to force the use of the builder method to create an Invoice.

public class Invoice {
	...

	/**
	 * static factory method for builder
	 */
	public static InvoiceBuilder builder() {
		return new Invoice.InvoiceBuilder();
	}

	/**
	 * force use of builder()
	 */
	private Invoice() {
	}

	public static class InvoiceBuilder {
		// more to come
	}
}

As a static inner class we do not have access to non-static variables on the outer class. But if we manage an instance of the Invoice from inside our InvoiceBuilder we can take advantage of being an inner class and access the private instance variables through that instance. Continue reading “Nested Fluent Builders with Java 8”

Fluent Object Creation

When we are tasked with creating a POJO we will often fire up our favorite editor, add the attributes, and generate the accessor methods.

Here is an example of an Invoice class. For simplicity sake only a few of its many attributes will be shown.

public class Invoice {

	private InvoiceActor invoiceSupplier;
	private InvoiceActor invoiceDestination;
	private String supplierDocumentId;
	private Date documentDate;
	private String trailerNumber;
	private List<InvoiceItem> items;
	...

Given this code we would then create an instance of our Invoice and use our setters to populate the values of the attributes.

InvoiceActor supplier = new InvoiceActor();
supplier.setActorName("101");
supplier.setActorType(InvoiceActorType.DC);
...

List<InvoiceItem> list = new ArrayList<InvoiceItem>();
InvoiceItem itemOne = new InvoiceItem("TK421", 10, 8);
list.add(itemOne);
...

Invoice inv = new Invoice();
inv.setSupplierDocumentId("5150");
inv.setTrailerNumber("2112");
inv.setInvoiceSupplier(supplier);
inv.setInvoiceDestination(reciever);
inv.setItems(list);

When creating our POJO, we could add a constructor to our class that accepts the values for the various attributes so that they are initialized when the object is created. It allows us to create the object with less code than what we see above. It can also allow us to insure that our object was in a valid state at the point it is created (see Constructor Initialization for more details). And if we were going to make our Invoice class an immutable object we would need to get rid of the setters so we would typically turn to a constructor with parameters as an alternative to populate the attributes. Continue reading “Fluent Object Creation”