WordPress on AWS – a review

This is a follow-up to WordPress on AWS to see how well the design meets AWS’s Well-Architected guidelines and how well it has performed in its first few months. For context, here is the AWS configuration. WordPress itself runs on the EC2 instances, in PHP on Apache.

The summary is:

Overall, it has functioned adequately – the site works and is performant, and the admin stuff works.

Requires minimal maintenance – 5 minutes daily to glance at the dashboard, two manual terminations (with auto-restart) so far.

Repels about 100 low-grade attacks per day.

There were two outages – both due to excessive DB connections for no apparent cause. I was alerted by SMS within 10 minutes and restored service in 5 minutes.

Two more security defects – one was hard to patch.

Upgrades are dangerous – don’t do them on a production site.

In conclusion, this design is fit for purpose, but that purpose is modest blogs or websites.  The limitation is security – the risk of fraudulent access and theft of personal data is higher than best practice.  I hope we still care about that.

Does this meet the architectural objectives?

I’m going to frame this with the six pillars of the AWS Well-Architected guidelines.

Obviously, I’ll do this in the context of an IT system supporting a small/medium enterprise – it’s not air traffic control.

Operational excellence pillar – 4 out of 5

This architectural pillar went well, and the objectives were either met or did not apply.

Automation: Deploying the EC2 instances that run WordPress on Apache is fully automated. The launch templates define every activity needed to create the VM, load the base software and then load WP and its customization. However, I did not automate the configuration of the other eight AWS services (using CloudFormation). I would do that if I had to stand up another instance, but that’s not happening soon.

Test offline:  The Launch template lets me make, test and deploy small changes (e.g. change PHP version but keep OS), and template versioning allows them to be rolled back easily.

Test operational functions: I tested all the operations functions – killing a service, rolling updates, DB failover and restore, filesystem backup and restore – and all worked well with no operational impact.

The big operational load is upgrading WP and its plugins. The blogosphere and my own experience agree that these upgrades frequently fail. WP has an ‘auto-update’ option, but that seems reckless. So my process for quarterly updates is:

  1. Create a new test EC2 instance, but run it on a copy (not replica) of the DB and file system (that’s easy in AWS).
  2. Update the test instance with the new WP, then load the latest plugins.  This will update the DB and file system copies.  Then test that.
  3. Do a DB and file system backup on the production system, then update the production system.
  4. Destroy the test system.

That’s probably half a day’s work if it goes well, but history shows updating WP, a theme and ten plugins won’t go well.

Security pillar – 2 out of 5

This is the weakest pillar, and not through want of trying, and through no fault of AWS.  Starting with the good bits, my deployment has:

  • Strong front end – It’s TLS 1.2+, with strong ciphers. The WAF independently rate limits attacks to the whole system and to password entry. Firewalling protects the VM instances and the database from Internet access.
  • No DB access – The DB has no human access, and no DB passwords exist in configuration files. Normally WP requires the DB password in the wp-config file; however, I replaced that with a lookup into AWS Secrets Manager.
  • Reduced Leakage – I plugged many information leaks – software versions, directory searching, reading configuration files…
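
The Secrets Manager lookup mentioned above follows a common pattern: fetch the credentials at runtime instead of storing them on disk. The real lookup would live in PHP inside wp-config; the Python sketch below only illustrates the idea. The secret name `wordpress/db` and the JSON key names are assumptions (the keys mirror the layout commonly used for RDS secrets), not details taken from this deployment.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch DB credentials from AWS Secrets Manager at runtime.

    `secrets_client` is a boto3 Secrets Manager client, or any object with a
    compatible get_secret_value method. The JSON key names are an assumption.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "host": secret["host"],
        "user": secret["username"],
        "password": secret["password"],
        # Hypothetical default schema name, used only if the secret omits it.
        "database": secret.get("dbname", "wordpress"),
    }


# Usage on an instance whose IAM role may read the secret (hypothetical name):
#   import boto3
#   creds = load_db_credentials(boto3.client("secretsmanager"), "wordpress/db")
```

Passing the client in as a parameter keeps the function testable without AWS access, and the password never needs to be written to a file on the instance.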

However, as someone familiar with building secure systems, I’m very uncomfortable with this.   The core problems are:

  • WP (together with PHP) builds in very few security functions.  WP core has low security at many levels, and you have to plug those holes either with your hosting or plugins.  To name a few – no 2FA, no effective logging, no TLS, mixes data and programs on disk, leaks version critical data, unencrypted personal data at rest, passwords and keys on unprotected files, no rate control, cannot limit DB connection usage, uses MD5 hashes, has poor HTTP security headers…
  • This class of problem also applies to the dozen or so plugins and the theme you will need to run your site.  Some of these are competent products, many are not, and some are criminal fronts.
  • I read the WP documentation and a few dozen blogs to understand what is needed to secure WP. I feel this information does not exist in complete form: some of it does, but not all. As a result, I have patched many holes, but I have no confidence that I have patched enough. In 2023, with cybersecurity causing such havoc, I should not have to patch any.

Given the effort, can you secure the site enough to handle personal data? Maybe yes, for a small business with a few hundred customers or the emails of blog subscribers, if you are careful with how you build and maintain it. Without that care, WordPress is insecure for any purpose.

Is it OK for e-commerce? PCI-DSS is the standard for securing credit card data. Given the lack of rigour in WP or plugins, meeting PCI-DSS Requirements 1, 2, 3, 5, 6 and 10 seems impossible. This is widely recognized, so the standard approach is a plugin to a payment gateway, such as PayPal, Stripe, or WooCommerce (WordPress promotes WooCommerce). The payment gateway takes much of the responsibility for PCI-DSS, but not all. WooCommerce makes this clear in their documentation, which has a lot of side-stepping around their responsibility and many weasel words. After reading the WooCommerce document, I cannot clearly say:

  • exactly what my responsibility is as the WP site owner,
  • and how I could do some of the tasks they suggest that I do.

Directly using PCI-compliant payment gateway products in the way they specify (such as PayPal as described here) would technically meet your PCI-DSS obligations to keep credit card data secure.  That’s all PCI-DSS cares about – card security.  Your site can still hit horrendous issues, stolen cards, friendly fraud, session theft, account takeover, value fraud, carding…  In the first four, you lose the goods and the value of the transaction, and in the last, you lose a lot of fees.  A good payment provider will mitigate these to some extent, but clever attacks use many sites to keep ‘under the radar’ – so if the fraud is only a quarter of your sales or a hundred transactions per day, the criminal will probably escape unnoticed, and you get to pay the chargebacks.  To be fair, all e-commerce sites have this problem.  But compared to a dedicated shopping site like Etsy or Shopify, with WordPress, it’s harder to mitigate attacks, and you’ll be on your own, probably sobered by hard experience.

WordPress has the aura, reinforced by the vendors,  that it’s easy for amateurs to use to build their websites, and about 100 million have done so.   Today, an amateur building a WordPress site is building a soft target for criminals.  So my conclusion is no, I would not run e-commerce from WordPress nor recommend businesses to do so.

Reliability pillar – 4 out of 5

This went well, though there are cracks.  Reliability is always tough to quantify, and long-term operational experience tends to show we overestimate reliability because we don’t account for all the failure modes.  So let’s go through some of the more obvious measures:

No single point of failure.  Two AZs with concurrent processing, and redundancy of all functions – network, compute, database, and file storage.

Automated recovery.  The load balancer takes faulty instances out of service within a few minutes, and the auto-scaler replaces them within 10 minutes.

Scales horizontally.  Dynamically adds compute to match the load.

Change through automation.  Automated deployment of new software after offline testing.

Rollback.  Automated continuous backup of database and file system.

Tested.  All failures I thought of were tested, and the ones I did not think of were added as they occurred.

WordPress hosting is a good example of the “When something breaks, operations can identify and fix it” fallacy. With custom software, the developers usually understand what’s going on. With COTS software, you have recourse to the vendor. With WordPress, you have the blogosphere. And that’s not a great answer when your site is down.

This is the dashboard that I constructed with CloudWatch, showing the site serving around one page per second.  It brings together 19 metrics from each part of the system to see the system’s health at a glance.

I had two outages that did not recover by themselves – in both cases, one PHP process maxed out its CPU and exhausted the connections at the DB server, preventing the other AZ from accessing the DB. However, the canary monitoring detected the problem and sent me an SMS. I terminated the high-CPU instance (readily seen on the dashboard), and service was restored. The autoscaler then started a new EC2 instance to take its place. To mitigate this, I set the per-process connection limit at the PHP level to half the DB server connection limit. Sadly this did not work, and there is still a once-per-month outage.
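
The arithmetic behind a connection cap can be sketched as follows. This illustrates the general reasoning with hypothetical numbers; it is not the configuration used here, nor a diagnosis of these outages. The point it shows is that a per-process cap bounds each worker separately, so what a whole instance can open depends on how many worker processes it runs. The function names and the `reserved` headroom are assumptions made for the example.

```python
def instance_connection_ceiling(per_process_limit, worker_processes):
    """Worst-case number of DB connections one instance can hold open."""
    return per_process_limit * worker_processes


def safe_per_process_limit(db_max_connections, instances, worker_processes, reserved=5):
    """Largest per-process cap that still leaves every instance a share.

    `reserved` is a hypothetical headroom kept free for admin and monitoring.
    """
    usable = db_max_connections - reserved
    per_instance = usable // instances
    return max(1, per_instance // worker_processes)


# Hypothetical numbers: DB allows 100 connections, 2 instances, 10 workers each.
print(instance_connection_ceiling(50, 10))   # a cap of 50 per process
print(safe_per_process_limit(100, 2, 10))    # a cap that shares the DB evenly
```

With these made-up figures, a per-process cap of half the DB limit still lets one instance ask for far more connections than the server offers, whereas the shared budget works out to a handful per worker.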

Performance efficiency pillar – 3 out of 5

This pillar is less about speed than speed per resource consumed. This design used a managed DB, file system and operational tooling (rather than build-it-yourself), which improves performance per development dollar. The gaping hole is that it does not use serverless DB or computing, so it has poor utilization for low-usage sites.

Since 2020, AWS Lambda has supported EFS, and since Q4 2022, EFS has supported Elastic Throughput.  Together these should remove the former barrier to serverless WordPress.  Coupled with Aurora Serverless MySQL, this should give a fully serverless WordPress with scale-to-zero pricing.  I can’t wait to try it!

Cost optimization pillar – 3 out of 5

I spent a bit of time using the AWS cost analysis tools and, as a result, halved my initial costs. This is my current daily cost breakdown for a few hundred daily visits. That’s USD 3.50 per day, about the same as business-class hosting from dedicated WordPress hosts.

The costs do scale up with the traffic but in chunks of an EC2 instance. One EC2 instance serves about 100 views per second. However, the three big cost items – RDS, EC2 and VPC – do not scale down with reduced load. If I moved to serverless, the RDS and EC2 costs would drop substantially, effectively halving the operating costs. The VPC cost is due to my use of Session Manager – a steep price, but for the security and ease of key management, it is well worth it to me.
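
The step-wise scaling can be sketched as a small calculation. The figure of 100 views per second per instance comes from the text; the floor of two instances is an assumption based on the two-AZ layout, and the function name is made up for the example.

```python
import math

VIEWS_PER_INSTANCE = 100  # views per second one EC2 instance serves (from the text)


def instances_needed(views_per_second, minimum=2):
    """EC2 instances required for a load; minimum=2 assumes one per AZ."""
    return max(minimum, math.ceil(views_per_second / VIEWS_PER_INSTANCE))


for load in (1, 150, 250, 450):
    print(load, "views/s ->", instances_needed(load), "instances")
```

Under these assumptions the instance count, and so the EC2 cost, stays flat at the floor for any load up to 200 views per second, then rises one instance at a time. That flat floor is the part that does not scale down for a low-traffic site.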

Sustainability pillar – 4 out of 5

According to Website Carbon, this site produces 0.1g of CO2 per page view – better than 90% of websites. This is because of the low volume of delivered content – about 400kB, compared to the 2.2MB average. The price of all those images and videos is 4% of the global CO2 production – the same as the airline industry. Complex content costs energy on the server side, in transmission and on the devices that render it.

On the server side, the idle power consumption of my site is about 1.5W, and each page rendering takes 2J, which averages out to roughly 25 milliwatts at 1000 page views per day. This is a horrendous inefficiency – only about 2% of the power is used for the purpose. This is largely solved by moving to serverless. I also wrote a JVM (Kotlin) version of the page serving code, which was six times more efficient than the PHP version.
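
The arithmetic behind those figures can be checked with a few lines. The three inputs are taken from the paragraph above; treating the useful fraction as rendering power divided by total power is my reading of the claim, so take it as a sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60

IDLE_POWER_W = 1.5         # idle draw of the site (from the text)
ENERGY_PER_PAGE_J = 2.0    # energy per page rendering (from the text)
PAGE_VIEWS_PER_DAY = 1000  # traffic level used in the text


def rendering_power_w(energy_per_page_j, views_per_day):
    """Average power spent on rendering: joules per day over seconds per day."""
    return energy_per_page_j * views_per_day / SECONDS_PER_DAY


render_w = rendering_power_w(ENERGY_PER_PAGE_J, PAGE_VIEWS_PER_DAY)
useful_fraction = render_w / (IDLE_POWER_W + render_w)
print(f"{render_w * 1000:.0f} mW, {useful_fraction:.1%}")  # prints "23 mW, 1.5%"
```

The result, about 23 mW and 1.5% of the total, is in line with the rounded figures quoted above.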

From a global view, an average WordPress hosting produces 240 times the CO2 of a best-practice solution.  Given there are 100M active WP sites, and the average page view rate is 30 per day, this is serious.  WordPress emits about 400,000 tons of CO2 annually, of which only 2000 tons is page viewing.  This is not good.

Conclusion

The hosting meets all the AWS Well-Architected pillars, except for security.   That deficit is down to the fundamental design of WordPress, making it really hard to bolt on extra security in a way that would give me the confidence to store a significant amount of personal data (say more than a few hundred records) or to take any payments.

I should build a serverless deployment – recent AWS improvements seem to make that possible, and serverless is the way of the future for low traffic loads.
