The right way to migrate database with Liquibase

Port adelaide river kayaking by Arthur

Is embedded Liquibase the right way?

Embedding Liquibase into the startup process of an application is a very common pattern for good reason

I agree with this statement from Liquibase blog to a large extent and use Liquibase in most of the Spring Boot + RDS projects I have worked on.

But embedding Liquibase into startup processes comes with caveats:

I knew the caveats above all along and I thought they were not too bad until a hot summer day at work in 2019.

In a small and crowded office with a broken air conditioner, a Post Incident Review meeting continued. I was explaining why our Account microservice had gone offline midday and could not recover itself without manual intervention against the production database of a financial service. The audience was a group of serious looking colleagues from Ops, DevOps, PM office and customer support team. After the root cause analysis we reviewed the incident response actions. Tension in the room eased a bit afterwards as the Time To Recovery was not too bad thanks to prompt collaborative actions of teams:

Colleagues in the room started showing some signs of relief. Just then, the production support channel on Slack started flashing with a new alarm on a few laptops around the table. You probably have guessed it right, that Account service went down again…

How exactly had the incidents happened

For the curious souls, the incident was an textbook example of the butterfly effect, which involved Liquibase embedded in a Spring Boot microservice.

Background: during startup of each new application instance, the embedded Liquibase waits for, obtains and then releases a database change lock to make sure database migration is performed safely in a clustered environment

We had released a new version to Account service that morning with a resource thirsty change. That change caused the service to hit the performance benchmark quite often so service instances were recycled much more frequently than usual.

As more instances entered the startup phase, competition for the same lock got intensive and the startup process prolonged further and further until it came close to the healthcheck timeout of the container orchestrator.

Eventually, there was an instance that got killed right after obtaining the lock without a chance to release it. That lock then became a Deadlock. After that, no new service instances would make their way through startup.

With no more new instances joining the force, remaining healthy services received more and more requests and fell down one by one until all service instances were out.

How we fixed it

The second emergency hotfix on that hot summer day

How to release the deadlock

The standard fix is to run the Liquibase unlock command or connect to database directly and run a SQL update statement simple as

UPDATE DATABASECHANGELOGLOCK SET LOCKED=0, LOCKGRANTED=null, LOCKEDBY=null where ID=1; commit;

One step further — detect and release the deadlock automatically

Constant manual remediation to a production environment is unacceptable in regulated industries, hence automation of the remediation is naturally the next step.

At that time, it felt right to move on to schedule a query to monitor the DATABASECHANGELOCK table and release the lock if it has not been released beyond a threshold. I thought of three ways to schedule such a job within three seconds, how good is that?!

Feeling like we were on the correct path, I set off to find a reasonable threshold. It cannot be too short otherwise we risk running the migration twice; it cannot be too long because that means another potential production outage.

After some reflection on my past experiences, I decided to pick two minutes as a reasonable starting point for most releases without digging into history logs.

Time to validate the idea against some edge cases — what if a release involves a time-consuming database update that cannot not finish within five minutes? Setting it to two minutes will cause the lock to be released and database migrated twice. Maybe we can change the threshold for that particular release and revert back afterwards? It suddenly felt like playing catch-22. But production is definitely the wrong venue for that game.

Avoid the deadlock (an <Are your Lights On> moment)

After an aha moment, I realised the real problem here is how to avoid the deadlock rather than to release it.

Embedding Liquibase in Spring Boot applications is definitely out the window and manual database migration for every release was out even earlier. How about placing database migration back into the deployment stage where it has truly belonged all this time?

That sounds about right. A solution was formed soon afterwards where the key part of the solution was to execute the Liquibase database migration from the deployment pipeline. I will explain in detail soon.

There is nothing new under the sun

A year after the first solution, I joined my friend Shane Doolan at Adatree, where Liqubiase was again embedded in our Spring Boot microservices.

I told Shane my story and he got the problem instantly. After examining the deployment strategy, we realised that my previous solution would not work as is because Adatree’s SaaS offering was designed to distribute versioned service products across AWS account boundaries to customer accounts via AWS Service Catalog. The service registry uplifts the deployment model to a new level but the extra abstract layer also took away the access from Adatree’s deployment pipeline to runtime environments in our customers’ AWS accounts.

But, above all we are determined to avoid the deadlock issue and we love technical challenges like this. In the end we overcame the restriction utilising AWS Lambda Custom Resource and AWS Cloudformation.

Comparing those two solutions,

The right way

Timing

Permission

Solutions

If you have full control of the deployment pipeline and access to the runtime environment from the pipeline, this is the relatively simple solution for it. Liquibase could be invoked via Maven plugin, container image or SDK.

Figure 1. perform database migration using Liquibase directly from deployment pipeline

2. AWS Lambda-backed Custom Resource solution

If you are distributing your services as self-contained product packages to customer accounts via cloud native services such as AWS Service Catalog. It is likely that

So what do we do?

Figure 2. [AWS] perform database migration using Liquibase via AWS Lambda-backed Custom Resource

p.s. For applications hosted on AWS and deployed using AWS CloudFormation, I would say “this is the way”. ;-)

Hope you have enjoyed this blog. Until next time, my friends.

We shape our tools and thereafter our tools shape us

John Culkin, SJ

Adatree focuses on enabling and empowering FinTech innovations via CDR (a.k.a Open Banking). We like to contribute back to the tech community as much as we can and we will opensource the second solution soon, please stay tuned.

References:

Three ways to run Liquibase

Using Liquibase in Kubernetes

AWS Lambda-backed custom resources

Father of two giggly girls; a technical problem solver who focuses on both delivery and growth

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store