Securing Amazon Web Services — Part One


Over the weekend I noticed a new Amazon Web Services (AWS) video focusing specifically on defence. This didn't come as much of a surprise, but I sent it on to a few chaps doing some innovative stuff in the aggressive camping specialists known as the British Army.

My known (the criminal part is implied by those who call us such) associate Combat Boot didn't let me down.

There is a reasonable point to be made here, and I much prefer convincing those with a skeptical mindset over those whose jobs rely on A THING, or who are true believers in A THING.

So how does one go about building a secure AWS environment? What follows is my starter for 10.

Shared Responsibility

At the core of AWS security is the shared responsibility model, which essentially amounts to Amazon being responsible for the security of the cloud, whereas the customer is responsible for security within the cloud. More details can be found here -

Essentially you can build a totally insecure environment within AWS, and many have, because it might be what you want to do. At the end of the day AWS, like most Cloud Solution Providers (CSPs), provides a toolbox for you to build what best fits your requirements.

If those requirements involve allowing cryptominers to spin up expensive GPU powered rigs on your credit card — well who are they to say otherwise!

You probably don’t want that though.

Terms

Public Clouds provide a very different model for consuming resources from the traditional data centre, and private clouds especially are vulnerable to becoming little more than virtual machine farms. So, to begin, here are a few general concepts Amazon uses:

Art class was always a bit of a weak subject

Regions

AWS is split into multiple geographic regions that never cross country borders, for legal and data sovereignty reasons. Some countries have multiple regions (the US and China especially); in Europe we have four — UK (London, EU-WEST-2), Ireland (Dublin, EU-WEST-1), France (Paris, EU-WEST-3) and Germany (Frankfurt, EU-CENTRAL-1).

Although within an AWS account you can access just about any region, your data will not leave a region unless you explicitly choose to copy it out.

More details can be found here —

Availability Zones

The Availability Zone (AZ) is the core of the AWS deployment model and is where your resources reside. One thing Amazon makes clear is that you can lose an AZ at any time, and therefore a truly resilient and redundant architecture must be built across multiple AZs.

A region will have a minimum of two AZs and can have up to six.
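As a quick illustration, here is a minimal sketch using boto3 (the AWS SDK for Python) that lists the regions the SDK knows about and the AZs behind the London region; it assumes you already have credentials configured.

```python
import boto3

# Regions this SDK build knows about for EC2 (a static lookup, no API call needed).
print(boto3.session.Session().get_available_regions("ec2"))

# Every client is pinned to a single region; this one only talks to London.
ec2 = boto3.client("ec2", region_name="eu-west-2")

# Enumerate the Availability Zones that make up eu-west-2.
for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], az["State"])
```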

Data Centre

These are the data centres we are all used to. Each contains between 50,000 and 80,000 physical servers, plus networking kit, power supply and so forth. Each is built on an independent flood plain, has separate power supplies and internet connections, and is within 2ms latency of the others (often a tenth of that).

They are never allowed to grow larger than 80,000 servers (more DCs, and then AZs, are likely to be built instead) in order to minimise the 'blast radius' of any failure.

EC2-Classic

Elastic Compute Cloud (EC2) Classic is something most people will not see; it's the original version of Amazon's compute cloud, which launched a server (instance) onto a public 'shared' network. It is no longer available unless you have a very old account.

Virtual Private Cloud

The Virtual Private Cloud (VPC) is Amazon's latest-generation tool for segmenting resources in the Amazon cloud. It provides your own software-segmented data centre with access to all the AWS services.

It's worth noting that several services, such as S3 and DynamoDB, operate outside the VPC and so need special focus if they are being utilised.

Subnet

The classic network subnet, in which resources are launched. Subnets cannot span AZs.

One trick that can be used with subnets is a public/private split (if you need any kind of public presence at all), as seen below.

A very simple architecture

In this very contrived architecture we have separate data subnets into which the Relational Database Service (RDS) can launch instances; the routing for these subnets, along with Network Access Control Lists (NACLs), would be tuned to allow access only from the public subnets (in lime) where we launch the EC2 instances running the application.

There is an additional component called the IGW, or Internet Gateway. It must be attached to the VPC and referenced in a subnet's routing for anything running in the AWS Cloud to be reachable from the internet (there are also NAT Gateways/instances that can allow private subnets to route outbound). If you don't want any components within the VPC accessing the internet, blocking its use is a good first step.
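To make the public/private split concrete, here is a rough boto3 sketch of the plumbing described above; the CIDR blocks and AZ are illustrative, and NACLs, RDS subnet groups and NAT gateways are left out for brevity.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# A VPC with one public and one private subnet (addressing is illustrative).
vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
public = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24",
                           AvailabilityZone="eu-west-2a")["Subnet"]["SubnetId"]
private = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24",
                            AvailabilityZone="eu-west-2a")["Subnet"]["SubnetId"]

# The Internet Gateway is attached to the VPC...
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# ...and only subnets whose route table points at it are 'public'.
rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock="0.0.0.0/0",
                 GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=public)

# The private subnet keeps the VPC's default route table, which has no route
# to the IGW, so nothing launched into it is directly reachable from the internet.
```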

We also have an Auto Scaling Group (ASG).

Autoscaling

Autoscaling is what makes a Cloud more than just a collection of virtual machines. It is a highly flexible background process that scales EC2 (and now many other) resources up or down based on a series of predefined rules, ranging from CPU utilisation, concurrency thresholds and custom definitions to a simple instruction: "Keep X number of instances running in Y AZs".
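As a sketch of what those rules look like in practice, the following boto3 snippet keeps between two and six instances running and scales on average CPU; the launch template name and subnet IDs are placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-2")

# "Keep at least 2 and at most 6 instances running across these subnets/AZs."
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="app-asg",
    LaunchTemplate={"LaunchTemplateName": "app-template", "Version": "$Latest"},
    MinSize=2, MaxSize=6, DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
)

# Scale out and in automatically to hold average CPU at roughly 60%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="app-asg",
    PolicyName="cpu-target",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,
    },
)
```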

If you don’t have autoscaling in a Cloud platform, then you don’t have a Cloud platform.

Application Programming Interface (API)

APIs are the technology that underpins everything in AWS, using a style called REpresentational State Transfer (REST). The best way to think of REST APIs is as web pages structured to be easily understandable to machines: data can be inserted into systems much as you would post a tweet, or read as you would read a webpage.
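Every SDK and console interaction ultimately boils down to one of these calls; the short sketch below, for example, is translated by boto3 into a signed HTTPS request for the DescribeInstances action against the regional EC2 endpoint.

```python
import boto3

# Under the hood this is an authenticated HTTPS request to the EC2
# endpoint in eu-west-2 asking for the DescribeInstances action.
ec2 = boto3.client("ec2", region_name="eu-west-2")
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])
```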

Identity Within AWS

Amazon provides Identity and Access Management (IAM): a way to create users, groups, password policies, permissions and so forth to control what individuals can do within AWS, with two distinct access models.

Programmatic — where all interaction with AWS occurs via the command line, some kind of custom program, or an Infrastructure as Code tool such as Terraform.io.

Console — The traditional ‘GUI’ access, where someone can log into the AWS console and start interacting with resources.
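A trivial sketch of the programmatic model: ask STS which identity the configured credentials belong to (the account number and ARN shown in the comment are placeholders).

```python
import boto3

# Handy sanity check in scripts and pipelines: who am I running as?
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
# e.g. 123456789012 arn:aws:iam::123456789012:user/alice
```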

There are also several services to link AWS back to an on-premises Active Directory farm to authenticate users, or indeed extend AD out into the Cloud.

These are all ways we have been doing identity management since the dawn of computing, and they have lots and lots of holes, including the eternal favourite of -

SMBC hitting the nail on the head with a piledriver!

It's long past time we took a look at how we manage identity, ideally by burning legacy to the ground!

Now hold that thought; we will return to it momentarily.

When a user is created within AWS, they have no access rights at all; rather, they must be assigned to a group with an attached policy, or assigned a policy directly. The policy is a JSON-formatted file that explicitly describes what actions the user can undertake.
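A minimal boto3 sketch of that flow, using illustrative names: the new user can do nothing until the group, its policy and the membership are all in place.

```python
import json
import boto3

iam = boto3.client("iam")

# A freshly created user has no permissions at all...
iam.create_user(UserName="alice")

# ...until a policy reaches them, here via membership of a group.
iam.create_group(GroupName="ec2-auditors")
policy_arn = iam.create_policy(
    PolicyName="describe-ec2-only",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Action": "ec2:Describe*",
                       "Resource": "*"}],
    }),
)["Policy"]["Arn"]
iam.attach_group_policy(GroupName="ec2-auditors", PolicyArn=policy_arn)
iam.add_user_to_group(GroupName="ec2-auditors", UserName="alice")
```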

Policies come in three forms:

AWS Managed Policies

Here Amazon maintains a series of generic policies, updated regularly to provide access to new services and capabilities. For anyone building a secure environment I do not recommend using these directly, but rather as a basis for…

Customer Managed Policies

With these you are responsible for the maintenance of the policy. This has the benefit that you, and not Amazon, control who can access what, with the downside that you will likely need to make frequent updates as and when the services expand.

I recommend using the Managed Policies as a template for any Customer policies.

Inline Policies

Inline policies are a special case: they are applied directly to an individual user to elevate their permissions to perform certain actions, and when that user is deleted, so is the policy.

From a maintenance and audit perspective, I suggest minimising their use.

Here is an (incredibly contrived) example policy
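Something along these lines, with the dates, region and conditions explained below (the IP ranges in the deny statement are illustrative placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEC2WithGuardrails",
      "Effect": "Allow",
      "Action": "ec2:*",
      "Resource": "*",
      "Condition": {
        "Bool": {"aws:MultiFactorAuthPresent": "true"},
        "DateGreaterThan": {"aws:CurrentTime": "2018-08-01T00:00:00Z"},
        "DateLessThan": {"aws:CurrentTime": "2018-12-31T23:59:59Z"},
        "StringEquals": {"aws:RequestedRegion": "eu-west-2"}
      }
    },
    {
      "Sid": "DenyUntrustedAddresses",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": ["203.0.113.0/24", "198.51.100.50/32"]}
      }
    }
  ]
}
```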

With this policy, I'm allowing full access to EC2, but with several conditions: first, whoever logs in to the account with this policy attached must be using Multi-Factor Authentication; second, they must be doing this work between August and December 2018; and finally, the actions must occur within the London region.

I then add in an additional statement that denies all activity if they are connecting from specific IP addresses.

A policy is evaluated in three stages. First, when a call is made, is it 'Implicitly Denied'? Anything not permitted is denied. Second, is it allowed, as shown here with the EC2 access? And finally, is it 'Explicitly Denied'? An explicit denial overrides any allow.

Policies can also be attached to Roles, which were originally intended to be assumed by instances so they could access AWS resources without access keys being deployed onto them. However, roles can also be assumed by users, and here is where things can be done slightly differently…
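A short boto3 sketch of a role meant for EC2 instances, with illustrative names: the trust policy controls who may assume the role, and the permissions come from policies attached to it, just as they would for a group.

```python
import json
import boto3

iam = boto3.client("iam")

# The trust policy controls *who* may assume the role; here, EC2 instances.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(RoleName="app-instance-role",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Permissions are attached to the role, so instances assuming it
# never need long-lived access keys deployed onto them.
iam.attach_role_policy(
    RoleName="app-instance-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess")
```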

Let's take a look at the Security Token Service (STS).

STS is a token vending machine that provides time-limited access to an AWS account: either the console, or access keys for automation/API/command-line interaction.
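For instance, assuming a role via STS (the role ARN and session name below are placeholders) hands back credentials that simply stop working when they expire:

```python
import boto3

sts = boto3.client("sts")

# Ask STS for one hour of credentials for a role we are allowed to assume.
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/audit-role",
    RoleSessionName="alice-audit",
    DurationSeconds=3600,
)
creds = response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

# The temporary keys are used like any others until they expire.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```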

These tokens can be generated for users accessing another account (cross-account access — useful for some use cases), via Web Identity Federation (where a user logs in to Google/Facebook/GitHub etc. and then accesses the account — not ideal for a high-security environment, to say the least), or via a Security Assertion Markup Language (SAML) token.

This latter option allows us to carry out authentication via an on-premises Identity Provider (IdP) such as Active Directory, using normal Active Directory groups, user management tools, existing passwords, and existing processes and procedures. The user wanting to access the account goes via an on-premises portal where they are authenticated (with an audit trail); the portal then generates the SAML token (ideally encoding the user's unique ID to simplify the audit trail) and provides it to STS, which then grants access to the account.

In addition, it's possible to add extra policies to the SAML assertion to further restrict access. These additional policies cannot override those attached to the role, but they can be used to add further restrictions on a per-user basis.
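A rough sketch of that exchange with boto3; the role and IdP ARNs are placeholders, `assertion` stands in for the base64-encoded SAML response from your portal, and the extra session policy here simply pins the session to London.

```python
import json
import boto3

sts = boto3.client("sts")

# Placeholder for the base64-encoded SAML response produced by the on-premises IdP.
assertion = "...base64-encoded SAML assertion..."

# An additional restriction passed with the assertion: it can narrow,
# but never widen, the permissions already attached to the role.
session_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"StringNotEquals": {"aws:RequestedRegion": "eu-west-2"}},
    }],
}

response = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/federated-admin",
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/corp-idp",
    SAMLAssertion=assertion,
    Policy=json.dumps(session_policy),
)
creds = response["Credentials"]  # time-limited keys, as with any STS call
```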

As well as reusing existing infrastructure and processes, this provides the benefit of there being no users in the account, which forces any attacker to go via your internal systems, or hack Amazon itself — both of which are likely to be 'challenging'.

Further details can be found here -

This approach has become increasingly common for organisations needing heightened security, as it avoids extending any identity provider into AWS. It does assume that your existing infrastructure is sufficiently sized, scaled, resilient and redundant.

The roles you configure in your account will be unique to the requirements of a particular organisation, ideally following the Principle of Least Privilege, where a user has only the minimum permissions needed to perform their function. This has the potential to create an overhead if you have a number of additional accounts; in later articles we will look at AWS Organisations, which can help minimise the amount of work needed to update the estate and which also creates a single unified "consolidated" bill across all accounts in your organisation, useful for spotting when something isn't quite right.

Up Next

In Part Two we will discuss Amazon's auditing/control-plane service 'CloudTrail', which records the API calls made within the account, and how to configure it so that the records cannot be altered, as well as the encryption service 'KMS'.

In later articles (in my copious free time…), we will also look at additional auditing mechanisms, network configuration and gotchas, Cloud Access Security Brokers (CASB) and control matrices.
