Sunday, 29 March 2015

What is Hystrix?

Hystrix is a Netflix library. The definition provided on GitHub reads:

"Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable."

To grasp what this implies, one has to think of a “distributed environment”. Today, most applications are moving towards a modular architecture: a big monolithic application encapsulating everything is no longer preferred. Instead, it is broken down into smaller, more manageable modules or microservices, each dealing with a specific chunk of the application. To take a crude example, say we have an online shopping application. Different chunks such as maintaining data on products and registered users, authenticating users, processing payments etc. could be exposed via different services, modules or third-party libraries. Now, any call to such a service or client library that goes over the network is a potential source of latency or, worse, failure. This is where Hystrix comes in.

Consider an application that handles heavy user traffic in such a distributed environment with a lot of dependencies. If a certain service is down or too slow to respond, it could slow down or throttle the entire application as requests pile up waiting on it. The following diagram from the Hystrix site draws the picture.



                                                                 Fig. 1 (courtesy: GitHub)


What Hystrix does is maintain a separate pool of threads for each dependency in the application. If one dependency misbehaves, only its own thread pool gets saturated; requests to it fail fast instead of tying up the rest of the application, so the system as a whole continues to function. Take a look at the following picture offered by Netflix to explain this scenario.




                                                                    Fig. 2 (courtesy: GitHub)

Thus, Hystrix isolates such points of access between services, thereby avoiding cascading failures across the different application layers. It also provides fallback options, facilitates monitoring of the system state and offers many other desirable features, thus improving the application's fault tolerance and resiliency.

In fact, Hystrix was born out of the resilience engineering work undertaken by Netflix around 2011. Yes, modular programming has its own price tag, but according to the data collected and analysed, the value it offers far exceeds its cost.

Hope that summarizes the basics of what Hystrix is all about. Let's wrap up with a few pieces of the jargon.

a) Commands -- any request to a dependency has to be wrapped in a command. Think of it as a Java class whose constructor takes the arguments required to invoke the request. Both command types below are abstract, so in practice you extend them with a subclass of your own (the hypothetical MyCommand/MyObservableCommand here; a fuller sketch follows after this list). There are two types of commands:
    i)  HystrixCommand -- used when a single response is expected from the dependency
                      HystrixCommand cmd = new MyCommand(arg1, arg2);        // MyCommand extends HystrixCommand

   ii)  HystrixObservableCommand -- used when the dependency is expected to return an Observable that could emit one or more responses
                      HystrixObservableCommand cmd = new MyObservableCommand(arg1);   // extends HystrixObservableCommand


b) Command Execution -- a command can be executed in one of the following four ways (the second sketch after this list shows all four against the same command).
   i)   execute() -- makes a blocking, synchronous call that either returns the single response or throws an exception
  ii)   queue() -- returns a Future from which the single response can be retrieved later
 iii)   observe() -- executes the command immediately and returns a "hot" Observable that emits the response(s) from the dependency
 iv)    toObservable() -- returns a "cold" Observable that, when subscribed to, executes the command and emits the response(s)

c) Circuit-Breaker Pattern -- this is a much talked about feature offered by Hystrix that helps to stop cascading failures across the different application layers. If the proportion of failing requests to a dependency crosses a configured threshold within a rolling window (and enough requests have been seen in that window), the circuit is considered "open", meaning no further requests are routed to that dependency for a configurable sleep window. After that period elapses, a single test request is let through to see if the service is ready to entertain traffic again. If it succeeds, normal traffic resumes; if not, the circuit stays open for another window. The good thing is, it is all configurable -- the error threshold at which the circuit should open, the sleep window and so on (see the configuration sketch below). In fact, one could even force the circuit open and check how the application behaves.
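
To make item (a) concrete, here is a minimal sketch of a HystrixCommand subclass, sticking with the online shopping example from earlier. GetProductCommand and the "ProductService" group key are made-up names for illustration; only the HystrixCommand and HystrixCommandGroupKey API comes from the library itself.

    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;

    public class GetProductCommand extends HystrixCommand<String> {

        private final String productId;

        public GetProductCommand(String productId) {
            // The group key ties the command to one dependency; Hystrix derives
            // the thread pool that isolates calls to that dependency from it.
            super(HystrixCommandGroupKey.Factory.asKey("ProductService"));
            this.productId = productId;
        }

        @Override
        protected String run() {
            // The actual network call to the product service would go here,
            // e.g. an HTTP GET for the given productId.
            return "details-for-" + productId;
        }

        @Override
        protected String getFallback() {
            // Served when run() fails, times out, or the circuit is open.
            return "product-details-unavailable";
        }
    }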
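
And here, roughly, are the four execution styles from item (b), run against the hypothetical command above. A command instance can be executed only once, which is why a fresh instance is created for each style.

    import java.util.concurrent.Future;

    import rx.Observable;

    public class ExecutionModesDemo {

        public static void main(String[] args) throws Exception {
            // i) execute() -- blocks and returns the single response (or throws)
            String product = new GetProductCommand("42").execute();

            // ii) queue() -- returns a Future; the response is retrieved later
            Future<String> future = new GetProductCommand("42").queue();
            String queued = future.get();

            // iii) observe() -- executes immediately and returns a "hot" Observable
            Observable<String> hot = new GetProductCommand("42").observe();
            hot.subscribe(System.out::println);

            // iv) toObservable() -- "cold" Observable; runs only once subscribed to
            Observable<String> cold = new GetProductCommand("42").toObservable();
            cold.subscribe(System.out::println);

            System.out.println(product + " / " + queued);
        }
    }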
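
Finally, a sketch of how the circuit-breaker behaviour described in item (c) can be tuned per command. GetUserCommand and the "UserService" group key are again made up for illustration; the circuitBreaker* properties are the standard HystrixCommandProperties setters.

    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;
    import com.netflix.hystrix.HystrixCommandProperties;

    public class GetUserCommand extends HystrixCommand<String> {

        private final String userId;

        public GetUserCommand(String userId) {
            super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("UserService"))
                    .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                            // open the circuit when 50% of requests in the rolling window fail...
                            .withCircuitBreakerErrorThresholdPercentage(50)
                            // ...but only once at least 20 requests have been seen in that window
                            .withCircuitBreakerRequestVolumeThreshold(20)
                            // keep the circuit open for 10 seconds before allowing a test request
                            .withCircuitBreakerSleepWindowInMilliseconds(10000)));
            this.userId = userId;
        }

        @Override
        protected String run() {
            // The call to the user service would go here.
            return "user-" + userId;
        }

        @Override
        protected String getFallback() {
            // Served while the circuit is open or when the call fails.
            return "anonymous-user";
        }
    }

The same builder also offers withCircuitBreakerForceOpen(true), which is one way to simply "open" the circuit and observe how the application behaves, as mentioned above.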


I think this much should suffice for now. More details and examples on using it will be taken up another time.


Sunday, 22 March 2015

Git Checkout and Long Filenames

This is just a short note that could be helpful when doing a 'git' checkout of projects with unusually long filenames. A couple of days back, I had to check out a 'git' project owned and managed by another team. The project was pretty big, so the 'git' clone operation ran for hours but eventually ended with the following message.

"cannot create directory.....
warning: Filename too long
warning: clone succeeded but checkout failed..."


Yes, it's baffling. Yes, it's cumbersome to have filenames that long, but there was nothing I could do about it until I came across the following solution. Run the following command (from an elevated prompt, since --system writes to the system-wide config) to allow long filenames with 'git' on a Windows system:

"git config --system core.longpaths true"

Then, since the clone itself had succeeded, run the following command from within the cloned repository to complete the checkout, and one is good to go.

"git checkout -f HEAD"