A few years ago I moved from a mostly routine security role to a more creative one, where I do everything from architecture definition to implementation and operation of various security systems, such as an internal PKI.
Our team tries to move fast while still safely operating and maintaining the stack we have already deployed, which is why we adopted DevOps principles. We use Terraform for provisioning, and Ansible and homegrown scripts for deployment. Pipelines run on GitLab runners. Infrastructure and services are defined as code, and changes are made via git flow.
After all those years I can say that we have seen a myriad of problems that come from the “Build it, maintain it, operate it” territory.
Below is a list of the most obvious problems I have encountered. There are lots of talks and easy answers on how to solve them, but in reality it is a painful process, especially if you want to solve them once and for everybody in your organization.
Not all of these questions and answers are related to DevOps, but many of them are easily solved if you do things manually.
Easy answers you will find at every meetup or conference:
- just use HashiCorp Vault for secrets & internal X.509 certificates
- use Let’s Encrypt for external services. Expose your infrastructure. It’s free!
- zero trust everything
- blue/green deployment or rolling updates
- immutable infrastructure!
- use Terraform, it works everywhere
Once you dig deeper and try to become production ready, problems with those answers arise. Nobody will tell you the right solution. Everybody reinvents the wheel with an internal one of their own.
- how to authorize services that request TLS certs?
- how to provision passwords, key material & API tokens?
- how to scale secret management? Are you calling the Vault admin for every new service? Where is the self-service?
- how to enroll new users? How to reset passwords? 2nd factor?
- how to replace a hardware token for remote workers?
- how to share passwords/keys between employees? Why do you need it in the first place?
- how to do blue/green, rolling updates or load balancing with those legacy stateful LDAP/389/Java applications everybody wants to integrate with?
- how to provision to another cloud without rewriting Terraform files for another provider?
While solving all your engineering problems, do not forget that business requirements do not change with every new technology.