At trivago, we run webservices with complex backends in different regions around the globe 24/7. Our system is being iterated and developed on a daily basis. Naturally, mistakes will be made and something will break eventually. Engineers being on-call are the first responders to issues with negative impact on our users and the business.
When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.
This article presents how trivago's search backend team used reactive programming in Java effectively when designing and implementing one of our many Java backend services. Compared to traditional imperative and functional programming, reactive programming requires a mindset-shift in order to apply the concepts and techniques effectively. The benefits we gain support us in some key challenges that every engineer is facing with essentially every (micro-) service in today’s backend architectures: handling of blocking IO, backpressure, managing highly varying loads as well as message and error propagation.