DevOps in Banking and Financial Services: Challenges & Opportunities, Part II

Pradip Roychowdhury
Nov 20, 2020

In this post I'll briefly talk, from my own experience, about how large global banks and recently established non-banking financial services companies have been adopting DevOps, or the more recent trend of DevSecOps, across their Front-Office, Mid-Office and Back-Office (core) systems. Almost all the global banks I have worked with in the last 12 to 24 months have embarked on some form of digital transformation journey. Their reasons range from improving customer experience across channels to addressing the new post-pandemic scenario, in which they engage with customers in a more remote and digital way instead of bringing them into branches. There are, of course, other business objectives too, such as cost reduction, regulatory compliance and security, and improved revenue streams. In all of these transformation projects I have seen at least two things in common: adoption of a hybrid multi-cloud platform and of Dev(Sec)Ops. I shall briefly touch upon two patterns and two anti-patterns that I have seen across these projects; they may well apply to other industries too. However, because banking and financial services is a highly regulated industry, and therefore possibly has a lower risk appetite, I feel these patterns are more visible here than in industries such as Retail or Travel & Transportation.

Pattern 1: Continuous Delivery but not Continuous Deployment. I am sure that by now most of us understand the subtle difference between these two important terms, so I won't go into detail here. To me, the first is a practice that builds the capability to do the second. In other words, no organization can do Continuous Deployment without Continuous Delivery, but an organization that has Continuous Delivery may or may not choose to do Continuous Deployment. I have seen this pattern in two large global investment banks in Europe. Both banks have a very strong Continuous Delivery practice, with a platform made up of multiple tools and technologies that implement Continuous Delivery for different types of applications depending on their technology stack. However, for any feature roll-out to higher environments such as Stage/Pre-prod and Prod, they have established semi-automated release review processes that must approve the deployment. Therefore, engineering and operations teams are not always allowed to move features into production without the approval of governance bodies such as the Change Advisory Board (CAB). Please note that these two investment banks still use their on-premise private cloud environments to deploy and run these mid-office and back-office production workloads, and they do not fully follow the second pattern described below.
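
To make the distinction concrete, here is a minimal Python sketch of a promotion gate. It assumes a simplified model in which every build that passes the pipeline is releasable (Continuous Delivery), while deployment to higher environments additionally needs a recorded approval such as a CAB sign-off unless Continuous Deployment is switched on. The environment names, the `Build` type and the `can_deploy` function are hypothetical and do not come from any specific tool.

```python
from dataclasses import dataclass

# Hypothetical "higher" environments that need a governance approval.
GATED_ENVIRONMENTS = {"stage", "preprod", "prod"}


@dataclass
class Build:
    version: str
    pipeline_passed: bool        # build, tests and quality gates all green
    cab_approved: bool = False   # approval recorded by the Change Advisory Board


def is_releasable(build: Build) -> bool:
    """Continuous Delivery: every green build is ready to be released."""
    return build.pipeline_passed


def can_deploy(build: Build, environment: str, continuous_deployment: bool) -> bool:
    """Decide whether a build may be deployed to the given environment."""
    if not is_releasable(build):
        # Without Continuous Delivery there is nothing to deploy, automatically or not.
        return False
    if environment not in GATED_ENVIRONMENTS or continuous_deployment:
        return True
    # Pattern 1: higher environments wait for an explicit approval.
    return build.cab_approved


if __name__ == "__main__":
    green = Build(version="1.4.0", pipeline_passed=True)
    print(can_deploy(green, "test", continuous_deployment=False))
    print(can_deploy(green, "prod", continuous_deployment=False))
```

The point of the sketch is that the `continuous_deployment` switch can never bypass `is_releasable`, mirroring the idea that Continuous Deployment presupposes Continuous Delivery.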

Pattern 2: Pipeline as Code. I have seen this pattern at one of my clients, a non-banking financial company (NBFC) in India. They have set up their entire digital channels and mortgage applications in an off-premise public cloud, combining containerized microservice applications running on the cloud service provider's (CSP) managed Kubernetes service with some non-containerized applications running on virtual machines in the cloud. For both categories of applications, this NBFC uses Jenkins' pipeline-as-code capability to define each project's entire build/test/deploy pipeline in a Jenkinsfile, which is treated like any other piece of code (such as application code) and maintained in version control. With this, the NBFC has achieved Continuous Delivery, but it uses Continuous Deployment only for hot-fixes and minor releases of its Web and Mobile channel components. For any change to the mortgage application, it still follows the first pattern. I have seen a variation of this second pattern at one of my recent banking clients in the EU, which has implemented Infrastructure-as-Code using Terraform and Helm charts to deploy its entire set of Dev, Test and Prod Kubernetes clusters in the public cloud. They strictly follow the first pattern for their Test and Prod environments to ensure exact environment parity between Test and Prod.
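
The NBFC's split between automatic and reviewed deployments is the kind of rule that can itself live in version control next to the pipeline definition. The following is a hypothetical Python illustration, not the client's actual Jenkins configuration: the component names, the `ChangeType` values and the `requires_manual_approval` function are all assumptions made for the example.

```python
from enum import Enum


class ChangeType(Enum):
    HOTFIX = "hotfix"
    MINOR = "minor"
    MAJOR = "major"


# Hypothetical component names: only the web and mobile channels are
# eligible for Continuous Deployment in this illustration.
CONTINUOUS_DEPLOYMENT_COMPONENTS = {"web-channel", "mobile-channel"}
AUTO_DEPLOY_CHANGE_TYPES = {ChangeType.HOTFIX, ChangeType.MINOR}


def requires_manual_approval(component: str, change_type: ChangeType) -> bool:
    """Return True when a change must go through the release review (Pattern 1)."""
    auto_deploy = (
        component in CONTINUOUS_DEPLOYMENT_COMPONENTS
        and change_type in AUTO_DEPLOY_CHANGE_TYPES
    )
    return not auto_deploy


if __name__ == "__main__":
    for component, change in [
        ("web-channel", ChangeType.HOTFIX),
        ("mobile-channel", ChangeType.MAJOR),
        ("mortgage", ChangeType.HOTFIX),
    ]:
        print(component, change.value, requires_manual_approval(component, change))
```

Keeping a policy like this in the same repository as the pipeline means that widening or narrowing what may deploy automatically is reviewed and versioned like any other code change.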

As I said, almost all banks have been adopting DevOps, but their DevOps maturity still varies in terms of organizational process, skills and technology. I have seen the following two anti-patterns at one of my banking clients in the EU. Although this client has been using the latest and greatest technologies for cloud-native application development, its DevOps process is still not very mature, because its technology governance did not enforce some of the best practices of cloud-native application development.

Anti-pattern 1: Manual Configuration Management of Higher Environments. One sign of this anti-pattern was pre-production/production deployment failures after multiple successful deployments in test. For this client, pod configuration settings in higher environments were applied manually from the console, and because these configurations were not maintained in version control, there was always configuration drift between environments. The release notes for production roll-outs did not mention the necessary changes to environment configurations for databases, pods and so on. This badly hurt the bank IT team's Change Failure Rate metric. One of the key reasons for this anti-pattern was the lack of a proper change management process, together with not having implemented the second pattern described above. The second pattern would also provide the flexibility to roll back to a previous version of the pre-production or production configuration if a deployment goes wrong.
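
Once the intended configuration lives in version control, drift of this kind can be detected mechanically. Below is a minimal Python sketch that assumes the desired settings (from Git) and the live settings (exported from the cluster) have already been flattened into dictionaries. The `find_drift` function, the `MISSING` marker and the sample keys are hypothetical; in practice the live values would come from the platform's own tooling.

```python
from typing import Any, Dict, List, Tuple

MISSING = "<missing>"  # marker for a setting present on only one side
Drift = Tuple[str, Any, Any]  # (setting, desired value, live value)


def find_drift(desired: Dict[str, Any], live: Dict[str, Any]) -> List[Drift]:
    """Compare version-controlled settings with live ones, key by key."""
    drift: List[Drift] = []
    for key in sorted(desired.keys() | live.keys()):
        want = desired.get(key, MISSING)
        have = live.get(key, MISSING)
        if want != have:
            drift.append((key, want, have))
    return drift


if __name__ == "__main__":
    # Hypothetical settings: what Git says vs. what was changed in the console.
    desired = {"replicas": 3, "memory_limit": "1Gi", "db_pool_size": 20}
    live = {"replicas": 3, "memory_limit": "512Mi", "feature_flag_x": "on"}
    for setting, want, have in find_drift(desired, live):
        print(f"{setting}: desired={want} live={have}")
```

Running a check like this before each higher-environment deployment turns silent drift into a visible failure, and rolling back then amounts to re-applying an earlier committed version of the desired configuration.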

Anti-pattern 2: Avoiding "Fail Fast". We are all aware of this mantra of success in today's agile era: "Fail Fast". It gives developers an opportunity to learn about the quality of their deliverables as early as possible in the development cycle, and it also helps business users and testers get a feel for the product early in the delivery cycle. I have seen this anti-pattern at one of the large institutional banks in North America, where the production cut-over timeline slipped because fixing new bugs found during release regression testing in the Stage environment took a long time. Release regression in staging took longer because a separate operations team was responsible for deploying software to production and staging, and business users and testers had their first interaction with the product only when it was released to the Stage environment! One common problem was that, because of port blocking in higher environments, some API calls failed even though they worked perfectly in lower environments and had passed the quality gates. Solving this type of problem took a long time, and the application functionality could not be considered deployed until the problems were fixed. Just think of any large global bank where different teams handle development, testing, operations and so on, constantly raising tickets (or sending emails) to each other to identify the issue and questioning one another because the application worked perfectly in the test environments. One recipe for success in this scenario is to set up continuous collaboration with the operations team of the higher environments during the development process and, if possible, to rehearse deployments progressively, in a production-like sequence, in test environments. And where the first anti-pattern is also a risk, ensure a stringent review of the release notes in the release pack before deploying it to higher environments.
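
One way to surface the port-blocking problem earlier is a pre-flight connectivity check that the pipeline runs from inside each target environment before deploying, so a blocked port fails fast instead of appearing during release regression. The minimal Python sketch below uses only the standard library; `check_endpoints`, `blocked_endpoints` and their parameters are hypothetical names, and a successful TCP connection only shows that the port is open, not that the API behind it behaves correctly.

```python
import socket
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def check_endpoints(endpoints: List[Endpoint], timeout: float = 3.0) -> Dict[Endpoint, bool]:
    """Try a TCP connection to each endpoint and record whether it succeeded."""
    results: Dict[Endpoint, bool] = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the host and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Refused, timed out (often a firewall silently dropping packets)
            # or an unresolvable host: all are treated as unreachable.
            results[(host, port)] = False
    return results


def blocked_endpoints(endpoints: List[Endpoint], timeout: float = 3.0) -> List[str]:
    """Return 'host:port' strings for every endpoint that could not be reached."""
    status = check_endpoints(endpoints, timeout)
    return [f"{host}:{port}" for (host, port), ok in status.items() if not ok]
```

In a pipeline stage running inside the target environment, a non-empty result from `blocked_endpoints` could fail the job before any deployment starts, which is exactly the early feedback the "Fail Fast" mantra asks for.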


Pradip Roychowdhury

Distinguished Chief Technologist with 25 years of experience in areas of OOP, SOA, Cloud, DevOps and Banking Transformation.