TechNotes: What is Hystrix?

Hystrix is an open-source Java library from Netflix. The definition on its GitHub page reads:

"Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable."

To grasp what this implies, think of a “distributed environment”. Today, most applications are moving towards a modular architecture: rather than one big monolithic application that encapsulates everything, the system is broken down into smaller, more manageable modules, or microservices, each dealing with a specific part of the application. To take a crude example, consider an online shopping application. Different parts, such as maintaining data on products and registered users, authenticating users and processing payments, could be exposed via different services, modules or third-party libraries. Any call to one of these services or client libraries that sends a request over the network is a potential source of latency or, worse, failure. This is where Hystrix comes in.

Consider an application that serves heavy user traffic in such a distributed environment and has many dependencies. If one service is down or too slow to respond, the requests waiting on it pile up and hold on to resources, which can slow down or even bring down the entire application. The following diagram from the Hystrix site illustrates this.

Fig. 1 (courtesy: GitHub)

Hystrix addresses this by isolating each dependency: by default, calls to a dependency run on a small thread pool reserved for that dependency (the "bulkhead" pattern), with a timeout. So even if one service is not behaving as expected and ties up all of its threads, the rest of the application continues to function. Take a look at the following picture offered by Netflix to explain this scenario.

Fig. 2 (courtesy: GitHub)
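The per-dependency thread pool idea can be illustrated without Hystrix itself. The following is a toy sketch in plain Java that assumes nothing beyond java.util.concurrent; Bulkhead and its call method are hypothetical names, and the pool size and timeout only loosely play the role of Hystrix's coreSize and execution.isolation.thread.timeoutInMilliseconds settings. It is not how Hystrix is implemented internally.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy "bulkhead" (hypothetical class, NOT Hystrix's code): each dependency
// gets its own small, fixed-size thread pool. A slow dependency can exhaust
// only its own threads, and callers wait at most timeoutMillis.
final class Bulkhead {
    private final ExecutorService pool;
    private final long timeoutMillis;

    // queueSize must be >= 1 (ArrayBlockingQueue rejects a capacity of 0).
    Bulkhead(int threads, int queueSize, long timeoutMillis) {
        // AbortPolicy: reject new work at once when threads and queue are full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.AbortPolicy());
        this.timeoutMillis = timeoutMillis;
    }

    // Runs work on this dependency's pool. Rejection, timeout or failure all
    // yield the fallback instead of an open-ended wait.
    <T> T call(Callable<T> work, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(work);
        } catch (RejectedExecutionException e) {
            return fallback; // pool saturated: shed load
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call to free the thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the dependency itself threw an exception
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With one instance per dependency, say one for a payment service and one for a product catalog, a stalled payment call returns its fallback once the timeout expires, while catalog calls keep running on their own threads.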

Thus, Hystrix isolates these points of access between services and prevents failures from cascading across the application's layers. It also provides fallback options, monitoring of the system's state and many other desirable features, thereby improving the application's fault tolerance and resilience.

In fact, Hystrix grew out of resilience engineering work that Netflix began around 2011. Admittedly, modular programming has its own price tag, but according to the data Netflix collected and analysed, the value it offers far exceeds its cost.

I hope that summarizes the basics of what Hystrix is all about. Let me wrap up with a few key terms.

a) Commands -- any request to a dependency has to be wrapped in a command. Think of a command as a Java object whose constructor takes the arguments needed to make the request. There are two types of command, and both are abstract classes that you extend:

i) HystrixCommand -- used when a single response is expected from the dependency; the call itself goes in the overridden run() method

HystrixCommand<String> cmd = new MyCommand(arg1, arg2); // MyCommand: a hypothetical subclass of HystrixCommand<String>

ii) HystrixObservableCommand -- used when the dependency is expected to return an Observable that may emit one or more responses; the call itself goes in the overridden construct() method

HystrixObservableCommand<String> cmd = new MyObservableCommand(arg1); // MyObservableCommand: a hypothetical subclass
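With the real library you would extend HystrixCommand<R>, pass a HystrixCommandGroupKey to its constructor, and override run() and, optionally, getFallback(). So that the example runs without the Hystrix jar, the sketch below is only a plain-Java imitation of that shape; ToyCommand and GetProductNameCommand are hypothetical names, and the "remote call" is a stand-in.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy imitation of the command idea (NOT Hystrix's API): one object per
// request, the dependency call goes in run(), and getFallback() supplies
// a degraded answer when run() fails.
abstract class ToyCommand<R> {
    private final AtomicBoolean started = new AtomicBoolean(false);

    // The actual call to the dependency.
    protected abstract R run() throws Exception;

    // Default: no fallback, so the failure surfaces to the caller.
    protected R getFallback() {
        throw new UnsupportedOperationException("No fallback available.");
    }

    // Blocking execution: the response, or the fallback if run() throws.
    public final R execute() {
        // Like a HystrixCommand, an instance may be executed only once.
        if (!started.compareAndSet(false, true)) {
            throw new IllegalStateException("A command instance can only be executed once");
        }
        try {
            return run();
        } catch (Exception e) {
            return getFallback();
        }
    }
}

// Hypothetical command wrapping a product-catalog lookup.
final class GetProductNameCommand extends ToyCommand<String> {
    private final int productId; // request argument passed via the constructor

    GetProductNameCommand(int productId) {
        this.productId = productId;
    }

    @Override
    protected String run() {
        if (productId < 0) {
            throw new IllegalArgumentException("unknown id " + productId);
        }
        return "product-" + productId; // stand-in for a remote call
    }

    @Override
    protected String getFallback() {
        return "unknown product";
    }
}
```

In this sketch, new GetProductNameCommand(7).execute() yields "product-7", while a failing lookup yields the fallback "unknown product" instead of an exception.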

b) Command Execution -- a HystrixCommand can be executed in any of the following four ways; a HystrixObservableCommand supports only observe() and toObservable().

i) execute() -- makes a blocking, synchronous call that either returns the single response or throws an exception

ii) queue() -- returns a Future from which the single response can be retrieved later

iii) observe() -- subscribes straight away to the Observable representing the response(s) from the dependency, so the command starts executing immediately, and returns an Observable that replicates it (a "hot" Observable)

iv) toObservable() -- returns a "cold" Observable that does not execute the command until you subscribe to it, and then emits the response(s)
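The difference between these styles, especially eager observe() versus lazy toObservable(), can be mimicked with CompletableFuture. This is only an analogy under the assumption that a single result is returned: Hystrix's own observe() and toObservable() return RxJava Observables, whereas here a "subscription" is just a callback, and ExecutionStyles is a hypothetical class.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Plain-JDK analogy for the four execution styles; NOT Hystrix's API.
// A "subscription" here is simply a Consumer that receives the one result.
final class ExecutionStyles<R> {
    private final Supplier<R> work; // stand-in for the command's run()

    ExecutionStyles(Supplier<R> work) {
        this.work = work;
    }

    // execute(): block the caller until the response arrives
    // (join() rethrows a failure as an unchecked CompletionException).
    R execute() {
        return queue().join();
    }

    // queue(): start the work now and return a Future to read later.
    CompletableFuture<R> queue() {
        return CompletableFuture.supplyAsync(work);
    }

    // observe(): eager ("hot"); the work starts immediately, and a
    // subscriber that arrives later still receives the result.
    Consumer<Consumer<R>> observe() {
        CompletableFuture<R> started = queue();
        return subscriber -> started.thenAccept(subscriber);
    }

    // toObservable(): lazy ("cold"); nothing runs until someone subscribes.
    Consumer<Consumer<R>> toObservable() {
        return subscriber -> queue().thenAccept(subscriber);
    }
}
```

The layering also loosely reflects Hystrix itself, where execute() is built on queue(), and queue() in turn on toObservable().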

c) Circuit-Breaker Pattern -- This is a much-discussed feature of Hystrix that helps stop failures from cascading across the different application layers. Hystrix tracks the requests to each dependency over a rolling time window. Once a minimum number of requests has been seen and the percentage of failed ones (errors, timeouts, rejections) crosses a threshold, the circuit is considered "open", meaning further requests are short-circuited straight to the fallback instead of being routed to the dependency. After a sleep window elapses, a single trial request is let through. If it succeeds, the circuit closes and normal traffic resumes; if not, the circuit stays "open" for another sleep window. The good thing is that it is all configurable: the request-volume threshold, the error percentage at which the circuit opens, the sleep window and so on. In fact, one can even force the circuit "open" (the circuitBreaker.forceOpen property) and check how the application behaves.
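The open / sleep-window / trial-request cycle can be modelled as a small state machine. The sketch below is plain Java, not Hystrix's implementation; ToyCircuitBreaker is a hypothetical class whose three thresholds are modelled on Hystrix's circuitBreaker.requestVolumeThreshold, circuitBreaker.errorThresholdPercentage and circuitBreaker.sleepWindowInMilliseconds properties (defaults 20, 50 and 5000), and setForceOpen mirrors circuitBreaker.forceOpen. As a simplification, it keeps cumulative counts rather than a rolling time window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Toy circuit breaker: plain Java, NOT Hystrix's implementation.
// Simplification: counts accumulate while closed instead of living in a
// rolling time window as they do in Hystrix.
final class ToyCircuitBreaker {
    private final int volumeThreshold;       // min requests before tripping
    private final int errorPercentThreshold; // trip at or above this error %
    private final long sleepWindowMillis;    // how long to stay open
    private final LongSupplier clock;        // injectable time source (ms)

    private int requests;
    private int failures;
    private boolean open;
    private boolean trialInFlight;
    private long openedAt;
    private boolean forceOpen;

    ToyCircuitBreaker(int volumeThreshold, int errorPercentThreshold,
                      long sleepWindowMillis, LongSupplier clock) {
        this.volumeThreshold = volumeThreshold;
        this.errorPercentThreshold = errorPercentThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized void setForceOpen(boolean forceOpen) {
        this.forceOpen = forceOpen;
    }

    // Closed: allow. Open: reject, except one trial once the sleep window ends.
    synchronized boolean allowRequest() {
        if (forceOpen) return false;
        if (!open) return true;
        if (!trialInFlight && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            trialInFlight = true;
            return true;
        }
        return false;
    }

    synchronized void recordSuccess() {
        if (open) { // the trial succeeded: close and start counting afresh
            open = false;
            trialInFlight = false;
            requests = 0;
            failures = 0;
        } else {
            requests++;
        }
    }

    synchronized void recordFailure() {
        if (open) { // the trial failed: stay open for another sleep window
            trialInFlight = false;
            openedAt = clock.getAsLong();
            return;
        }
        requests++;
        failures++;
        if (requests >= volumeThreshold
                && failures * 100 >= errorPercentThreshold * requests) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }

    // Runs work if the circuit allows it; otherwise short-circuits to fallback.
    <T> T call(Supplier<T> work, T fallback) {
        if (!allowRequest()) return fallback;
        try {
            T result = work.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback;
        }
    }
}
```

Injecting the clock (for example a lambda reading a value from an array) lets the sleep window be exercised without real waiting.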

I think this much should suffice for now. More details and examples of using Hystrix will be covered another time.

Sunday, 29 March 2015
