A Secure WordPress on AWS
So I have opinions on PHP, mostly not what you would call positive, especially in regard to security…
Anyone who has done security architecture probably has the same thoughts (although, to be slightly fair, it has improved a bit). However, it remains dishearteningly popular, especially for blogging platforms, principally WordPress.
A few months ago I was approached by a soon-to-be charity asking if I could help them with their AWS platform, which was mostly built around, you guessed it, WordPress.
Currently the trustees of the charity pay for the hosting out of their own pocket, so cost management is a big issue; but it's also important that the site doesn't go down suddenly, and that things are a little more resilient.
Also, because the charity in question (which I am very noticeably not naming) operates in the defence sector, security is a major issue. And of course everything runs, you guessed it, on WordPress, all within a single Lightsail instance.
Irrespective of the solution, any architecture comes down to understanding what is wanted, and in many cases turning that into what is achievable, in this case:
- Make sure we are secure
- Enable scaling up and growth of the platform to support expansion
- Can you make it as cheap as possible!
The last is surprisingly straightforward: once I had produced the high-level architecture, I took it to the Amazon account management team, who regularly support charities, and was able to obtain $500 of vouchers. That means hosting of the platform is covered for the next year (in return for this blog post describing the architecture, which I think is eminently fair).
Jumping back to point 1, how do we make things secure?
Well, security isn't that complex (that is, if we are not negotiating the terms of a contract). It mostly involves doing some reasonably common-sense things: patch, limit permissions to only what you need (which can involve a bit of trial and error), and enable multi-factor authentication. Seriously, you will solve 90%+ of issues with the basics.
The Center for Internet Security (CIS) have rather helpfully produced a series of benchmarks, both for an AWS account and for various flavours of Linux and the key enabling technologies for WordPress, such as Apache. And although they kindly provide AMIs of pre-configured Linux on the Marketplace for a nominal fee, it was cheaper for me to spend a rainy Saturday afternoon configuring things by hand.
It also allowed me to implement a few of my favourite items from 12factor, namely treating logs as a stream of events, by installing and configuring the CloudWatch agent to feed all of the logs into a single central location. This is especially important in any cloud platform that uses auto-scaling: instances are spun up and down dynamically, which can result in the loss of important diagnostic data in the event of an issue (and there will be issues).
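As a sketch of what that central log shipping looks like, the CloudWatch agent reads a JSON configuration file; the fragment below builds one in the real `logs` schema the agent expects, though the file paths and log-group names here are my own illustrative choices, not the charity's actual ones:

```python
import json

# Minimal CloudWatch agent "logs" section: ship the Apache access log and
# the auth log to central log groups, one stream per instance.
# Paths and group names are illustrative assumptions.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/httpd/access_log",
                        "log_group_name": "wordpress/apache-access",
                        "log_stream_name": "{instance_id}",
                    },
                    {
                        "file_path": "/var/log/secure",
                        "log_group_name": "wordpress/auth",
                        "log_stream_name": "{instance_id}",
                    },
                ]
            }
        }
    }
}

# On the instance this would be written to the agent's config location.
print(json.dumps(agent_config, indent=2))
```

Because the log group names are fixed and the stream name is templated on the instance ID, logs from short-lived auto-scaled instances survive the instance itself.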
It also provides us with basic SIEM (Security Information and Event Management) functionality, where we can trigger alerts should anyone SSH directly into an instance.
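To illustrate the idea (this is a toy simulation, not CloudWatch's actual filter syntax), an alert like that boils down to pattern-matching sshd's "Accepted" lines in the shipped auth log; in AWS the pattern would live in a metric filter wired to an alarm. The sample log line below is made up:

```python
# Toy illustration of what a CloudWatch Logs metric filter for direct SSH
# logins effectively does: match successful sshd authentications in
# /var/log/secure and count them, so an alarm can fire on count > 0.
SSH_LOGIN_MARKER = "Accepted"

def is_ssh_login(log_line: str) -> bool:
    """Return True if the line records a successful sshd authentication."""
    return "sshd" in log_line and SSH_LOGIN_MARKER in log_line

# A made-up but representative sshd log line.
sample = ("Jul  3 10:15:02 ip-10-0-1-12 sshd[2143]: Accepted publickey "
          "for ec2-user from 10.0.0.5 port 51122 ssh2")
print(is_ssh_login(sample))  # prints True
```

In normal operation nobody should be logging in at all, so even a single match is worth a notification.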
Next up was how we were going to build the platform, and in the immortal words of Tom Lehrer I did my ‘Research’ aka plagiarised the excellent whitepaper Amazon have produced on the subject. And for those not aware of Mr Lehrer’s wisdom, for shame!!
The whitepaper is a goldmine of useful information, especially if, like me, you've spent far too long around 'enterprise' (aka expensive and bad) Java applications. There is even a full CloudFormation template available here on GitHub, allowing you to deploy everything with a single click.
The main problem with this is that it costs a few hundred dollars a month to run, something I'd rather not lumber a charity with, especially as it's not yet at a point where this level of infrastructure is needed.
So I took a hacksaw to it and produced the following:
Three subnets are created in each AZ, with only one being public (where the load balancer resides). The load balancer only accepts connections on port 443, and can only talk to the web server on the WordPress instance, and only on the instance's web-server port. The other two subnets have no direct access to or from the web.
The load balancer is an Application Load Balancer (ALB, optimised for web traffic) with an SSL certificate provisioned by AWS Certificate Manager (meaning no need to pay for a certificate, and no ongoing management). All SSL traffic is terminated here, and the separate subnets give us the ability to add more security apparatus between the ALB and the application server in the future if needed.
The ALB also allows us to implement other services on an as-needed basis using separate infrastructure, and to make that (reasonably) seamless to the end user by managing the paths on the ALB. This creates a single endpoint for the whole service.
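The intent of that segmentation can be sketched as data: each tier's security group only accepts traffic from the tier in front of it, and only the load balancer faces the internet. The group names, ports, and CIDRs below are illustrative assumptions, not the real configuration:

```python
# Sketch of the tiered security-group rules. Each tier only admits traffic
# from the tier in front of it; names and ports are illustrative.
SG_RULES = {
    "alb-sg":      [{"port": 443,  "source": "0.0.0.0/0"}],  # public HTTPS only
    "web-sg":      [{"port": 80,   "source": "alb-sg"}],     # only from the ALB
    "database-sg": [{"port": 3306, "source": "web-sg"}],     # only from the app tier
}

def internet_facing(rules: dict) -> list:
    """Names of security groups with any rule open to the whole internet."""
    return [name for name, rule_list in rules.items()
            if any(r["source"] == "0.0.0.0/0" for r in rule_list)]

# Sanity check: only the load balancer is reachable from the internet.
print(internet_facing(SG_RULES))  # prints ['alb-sg']
```

A check like this is cheap to run in CI against whatever tool generates the real groups, which is one way to stop the segmentation quietly eroding over time.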
This sort of segmentation is reasonably common in high-security environments, first appearing in a QinetiQ high-security reference architecture during the original dot-com bubble, and very familiar to everyone involved in the project.
The application layer is where we have the largest number of changes. It's still a WordPress instance, however it now lives in an auto-scaling group with a Min/Max of 1. This means that should an AZ go boom, or something else happen to the server (based on a health check we establish on the ALB), a new instance is spun up to replace it within minutes. The maximum can easily be increased as and when the site becomes more popular.
Because of this, the instance has to become effectively 'stateless', so static content such as the stories is written to an Elastic File System (EFS) share that is mounted on the instance, including on any replacement spun up after a failure. An embedded script detects which AZ the instance is in via the metadata service and mounts the relevant EFS share via the local subnet's mount target.
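That mount script is simple enough to sketch. EFS exposes an AZ-specific DNS name of the form `{az}.{fs-id}.efs.{region}.amazonaws.com`, so knowing the AZ (from the metadata service) and the file-system ID is enough to build the mount command. The file-system ID and mount point below are placeholders, and the IMDSv1 metadata call is shown for brevity:

```python
import urllib.request

def instance_az(timeout: float = 2.0) -> str:
    """Fetch this instance's availability zone from the EC2 metadata service.
    (Only resolvable from an actual EC2 instance; IMDSv1 shown for brevity.)"""
    url = "http://169.254.169.254/latest/meta-data/placement/availability-zone"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode()

def efs_mount_command(az: str, fs_id: str,
                      mount_point: str = "/var/www/html/wp-content") -> str:
    """Build the NFS mount command for the AZ-local EFS mount target,
    so traffic stays within the instance's own subnet."""
    region = az[:-1]  # e.g. "eu-west-2a" -> "eu-west-2"
    dns = f"{az}.{fs_id}.efs.{region}.amazonaws.com"
    return (f"mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,"
            f"hard,timeo=600,retrans=2 {dns}:/ {mount_point}")

# fs-0abc1234 is a made-up file-system ID for illustration.
print(efs_mount_command("eu-west-2a", "fs-0abc1234"))
```

Using the AZ-local mount target rather than a single fixed DNS name avoids cross-AZ NFS traffic, which matters both for latency and for cost.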
In the event that 'manual' access to the server is needed, a bastion host is spun up in the public subnet that can be SSH'd into, and from there into the instance. An AMI will be built that holds all the default software and configuration; when a software update is needed, we spin the AMI up separately, patch it, image it, and then do a blue/green replacement of the existing instance (a Packer configuration is in the works to make this even easier).
Each instance, when launched, has a role that grants read access to the static-content S3 bucket (mainly used for large media files) and write access to CloudWatch, and nothing more!
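In IAM policy terms, the intent is roughly the document below: read-only on one bucket, plus the log and metric writes the CloudWatch agent needs. The bucket name is a placeholder, and the exact action list is my sketch of the minimum rather than the charity's actual policy:

```python
import json

# Sketch of the instance role's least-privilege policy: read the
# static-content bucket, write logs/metrics to CloudWatch, nothing else.
# Bucket name is a placeholder.
instance_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-static-content",
                "arn:aws:s3:::example-static-content/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents",
                       "cloudwatch:PutMetricData"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(instance_policy, indent=2))
```

The useful property is what is absent: no `s3:PutObject`, no EC2 or IAM actions, so a compromised instance can deface very little.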
The data tier is still MySQL, however it's now running on Amazon's 'serverless' database, Aurora. This doesn't involve much functional change, as all the clever stuff is handled behind the scenes. Essentially it removes the cost of having a 'standby' database ready for failover, as the data is distributed between all the AZs; should it go bang, it will take roughly 15 minutes for the database to be automatically reconstructed and the DNS records updated. Not ideal, but a lot better than the current situation where everything is on the same platform.
Security groups will be set up so that only the instances can talk to the database, and vice versa. If something more direct needs to be done to the database, then a similar approach to accessing the application instances can be used. Likewise, the role used by the instance will be set up with the minimum policies needed to talk to Aurora. Passwords and similar will be held in SSM Parameter Store wherever possible (yes, Secrets Manager is available, however it costs, whereas Parameter Store is free).
Backups will be automatic, based around snapshots on a desired schedule, with lifecycle rules getting rid of older ones.
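The retention rule those lifecycle settings enforce amounts to "keep the newest N, delete the rest". A toy version of that logic, with the seven-day window being an assumption rather than the charity's actual schedule:

```python
from datetime import date, timedelta

# Toy version of the snapshot lifecycle rule: keep the most recent `keep`
# snapshots, flag the rest for deletion. In practice AWS applies this for
# us; the 7-day window is an assumption for illustration.
def snapshots_to_delete(snapshot_dates, keep: int = 7):
    """Given snapshot dates, return those older than the newest `keep`."""
    ordered = sorted(snapshot_dates, reverse=True)  # newest first
    return ordered[keep:]

# Ten daily snapshots ending on an arbitrary example date.
today = date(2020, 6, 30)
dates = [today - timedelta(days=n) for n in range(10)]
print([d.isoformat() for d in snapshots_to_delete(dates, keep=7)])
```

The point of automating this is as much about cost as hygiene: old snapshots are cheap individually but accumulate silently.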
One additional component is the private endpoint for S3. Amazon best practice positions this mainly as a way to reduce the cost of data transfer from AWS, however endpoints can also have policies attached, further restricting what can be done should any instance be compromised.
Finally, we have CloudFront, which initially was a bit of a surprise to me, but it provides a massive cost reduction by caching the static content (of which there is a lot), reducing both the load on the servers and the account's data transfer.
Total monthly cost of the solution is around £85.
We're still in the final stages of migration. A few issues were found (we were initially aiming for Aurora Serverless, but our MySQL version wasn't supported), but for the most part it's been reasonably straightforward.
The one area where we did run into problems was the security of PHP and WordPress itself. There are an awful lot of paid-for plugins that say they will secure WordPress for you, but won't explain how, which makes me extremely suspicious. Anyone with any recommendations, please tell me in the comments.
Next steps will be to implement tuned monitoring and alerting, plus some custom Lambda functions (an instance spun up in the public subnet triggers a text message to everyone, for example), which will be made available once I have them completely debugged.
And then, in our copious free time — press go on the migration!