Architecture - Tactful Cloud

Lessons learned around multi-account architecture

This is post 9 of 9 in the series “Multi-account Architecture”

This is post 9 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-account cloud architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP), but the concepts discussed port well to any provider or even to on-premises operations.

Wear a helmet

No matter how prepared you are to start this process, you’re going to take some lumps along the way. No amount of technical skill or understanding of the provider’s services will prevent you from going deep in one direction, only to finish just before the release of a service enhancement that would have done all of the work for you. Fortunately, the pace of such enhancements has slowed significantly as the providers mature. This is why we stated earlier that, where possible, it is important to use the provider-recommended solution over an alternative. Wait if you can. What you need is likely already on the roadmap.

There will be times when you hit an un- (or not so well) documented limitation that will require your organization to get creative to meet requirements. This is almost completely unavoidable. The important thing is to stay flexible, design with the future in mind, document as described in a previous post, and automate as much as possible.

Think of this entire process in terms of software development. The deployment model for your architecture is essentially backed by code. Be ‘agile’: leverage brief release cycles and make adjustments over time. Whether you’ve landed in a single cloud provider or are spread out across a hybrid cloud, think about the ‘software development lifecycle’ and treat your architecture the same way. Every few years, or even months, there will come a time to re-evaluate and re-deploy, leveraging the latest services and standards based on current industry best practice.

Start small

Consider compartmentalizing some of the most basic functionality. It is possible to de-couple too much, creating more work for yourself, but it is much easier to add dependencies in later than to remove them. It is important to iterate slowly over time. If you need to move a little faster, make many small, calculated changes to your infrastructure, and avoid creating dependencies you don’t need.

Automation enables small, iterative changes to many solutions at scale. If your deployment process is part of a CI/CD pipeline, you can easily make multiple changes a day and roll back to a previous configuration at any time.

Working in small iterative chunks allows you to keep a better eye out for potentially destructive operations. In many situations, especially with AWS, modifying certain resource properties results in a total replacement of the resource, which changes its unique identifiers and/or breaks dependencies for other deployed resources. More often than not, these operations will fail because the resource depends on, or is depended upon by, something else. This is why you should carefully consider everything you deploy, in what order, and alongside what other resources.
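As a rough illustration of catching replacements before they happen, the sketch below assumes you deploy with AWS CloudFormation and review a change set before executing it. It works on a dictionary shaped like the response of the CloudFormation `DescribeChangeSet` call; the sample data and the function name `find_replacements` are made up for this example.

```python
from typing import Any


def find_replacements(change_set: dict[str, Any]) -> list[tuple[str, str, str]]:
    """List modified resources that CloudFormation would (or might) replace.

    Returns (logical_id, resource_type, replacement) tuples, where replacement
    is "True" (will be replaced) or "Conditional" (may be replaced).
    """
    risky = []
    for change in change_set.get("Changes", []):
        if change.get("Type") != "Resource":
            continue
        resource = change.get("ResourceChange", {})
        replacement = resource.get("Replacement", "False")
        if resource.get("Action") == "Modify" and replacement in ("True", "Conditional"):
            risky.append(
                (
                    resource.get("LogicalResourceId", "?"),
                    resource.get("ResourceType", "?"),
                    replacement,
                )
            )
    return risky


# Hypothetical, trimmed-down change set used only for illustration.
sample_change_set = {
    "Changes": [
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "AppVpc",
                "ResourceType": "AWS::EC2::VPC",
                "Replacement": "True",
            },
        },
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "LogBucket",
                "ResourceType": "AWS::S3::Bucket",
                "Replacement": "False",
            },
        },
        {
            "Type": "Resource",
            "ResourceChange": {
                "Action": "Add",
                "LogicalResourceId": "NewTopic",
                "ResourceType": "AWS::SNS::Topic",
            },
        },
    ]
}

for logical_id, resource_type, replacement in find_replacements(sample_change_set):
    print(f"REVIEW: {logical_id} ({resource_type}) replacement={replacement}")
```

A pipeline could fail the build, or require a manual approval, whenever this list is not empty.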

Test everything

By this point, the solid foundation you have laid should include a test environment. This is the account where you keep a working copy of everything that is deployed into your mission-critical accounts. If you can keep an up-to-date version of every mass-deployed resource in this account, you should be able to avoid major mishaps when it comes time to deploy and terminate solutions across your MAA. This account is critical for testing the removal of resources and for creating procedures or scripts that will help you clean up or recover from a situation where your resources were not as de-coupled as you had hoped.

Plan for scale

Whether you are shooting for 4 accounts or 100, it is important to design solutions with reuse in mind. This means parameterizing as much as possible. Hardcoding names, identifiers, and other unique variables will lead to additional and unnecessary work.
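A minimal sketch of what parameterizing can look like in practice: names are derived from a small set of per-account parameters instead of being typed out by hand. The parameter fields, the `resource_name` helper, and the naming pattern are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountParams:
    """Per-account inputs; every value here is an illustrative placeholder."""

    org: str
    environment: str
    region: str


def resource_name(params: AccountParams, purpose: str) -> str:
    """Build a resource name from parameters rather than hardcoding it."""
    return "-".join([params.org, params.environment, params.region, purpose]).lower()


# The same code serves account 4 or account 100; only the parameters change.
dev = AccountParams(org="acme", environment="dev", region="us-east-1")
prod = AccountParams(org="acme", environment="prod", region="us-east-1")

print(resource_name(dev, "logs"))
print(resource_name(prod, "logs"))
```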

Don’t build something yourself if there is already a solution available. This is important to consider for smaller organizations thinking they are going to save a few bucks. Your custom solutions most likely won’t scale the way you need when growth starts happening organically. Once your foundation is laid and systems are in place, you will be surprised how quickly your architecture will scale. Maintaining and updating home-baked solutions to meet your architecture needs will quickly become very cumbersome, if not impossible.

Expect the unexpected

Regardless of the CSP you go with, there will be nuance upon nuance with how the provider operates. Things you couldn’t possibly have planned for or thought of will creep up causing you to reevaluate your entire deployment model. Expect the unexpected. Treat your architecture like software and create newer versions of it over time to correct for the things you didn’t realize at the time were going to be an issue.

Feedback and flexibility

Communicate openly and honestly about the pros and cons of a particular deployment decision with all stakeholders. It is likely the person responsible for the creation of a new process or solution isn’t seeing everything from all angles. No single developer or operator in your organization can be responsible for accounting for every gotcha.

Everyone has blind spots. If you are the one responsible for the deployment of a new process or solution that is supposed to optimize ‘X’, be ready for your team to look at your solution critically and provide feedback or criticism. It is everyone’s responsibility to do what is in the best interest of the organization. With regard to cloud technology, there are at least a half dozen ways to accomplish anything. Be flexible and accepting of their feedback and do not take it personally.

Wrapping up

In this series, we’ve covered a whole lot. Everything from why an organization would even want to bother with all the extra work of establishing a multi-account cloud architecture, what one would look like, when the right time to implement is, how to best support MAA at scale, and now some of the lessons learned.

We sincerely hope you’ve enjoyed this series and we look forward to your feedback. We would also appreciate you sharing your opinions and stories regarding your current or projected future cloud architecture.

AWS Control Tower v. Landing Zones

As we were preparing this series, it was requested we provide a comparison between two AWS native service offerings designed to help streamline and manage this entire process. We have that in the works currently so check back often for that post.

Content & Resources

We are also in the process of polishing up an online course that better details all of the steps for creating the AWS Organizations foundation for a successful multi-account architecture that we have discussed in this series.


Support and maintenance of multi-account architecture

This is post 8 of 9 in the series “Multi-account Architecture”

Multiple Cloud Account Architecture (Series)
My experience with Multi-Account Architecture
Multi-Account End-State
When to consider a multi-account architecture?
When is it right to implement a multi-account architecture
What a multi-account architecture might look like
How to accomplish a successful multi-account architecture
Support and maintenance of multi-account architecture
Lessons learned around multi-account architecture


All aboard!

This journey cannot begin until you have everyone on the same page. By this point in the process, the gears should have really started to turn. Your organization can see how a multi-account architecture can really benefit everyone. We’re talking from customer to developer, operations and security, finance as well as stakeholders.

However, ideas can start to run rampant. Egos can go unchecked. Having a solid foundation with extreme flexibility can lead to a lot of carelessness. Now it is more important than ever to make sure everyone is following procedure in order to properly support and maintain this new large-scale architecture.

From my experience, working with enterprise networks and leading teams that manage large-scale architecture, success can be distilled down to these 4 components:

Baseline
Documentation
Collaboration
Automation

Out of the box, this all just looks like work. A lot of work. Work that no one wants to take on. It will take time, but this is the under-the-hood work that needs to be done to make your architecture run well for the long haul.

Baseline

Having a defined starting point is critical. There could be many baselines within a fine-tuned architecture. A baseline in this context is really a defined set of instructions, values, conditions, and resources that will be used to deploy a new cloud account.

An organization could maintain a different baseline for the deployment of different types of accounts. For example, a different set of instructions for a department-specific account versus an application account. The baseline for what a development account looks like in comparison to a production account should definitely be different.

The key with baselines is that they are fixed. That does not mean they cannot change and evolve over time, but once you’ve initiated the deployment of an account or process, the current baseline version is followed through to completion. Organizations will evolve, grow, and prune their baselines over time. They are not set in stone.
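One way to picture a fixed but evolving baseline is as an immutable, versioned record that a deployment pins when it starts. The sketch below is an assumption about how that could be modeled, not a description of any particular tool; `Baseline`, `BaselineCatalog`, and the resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """An immutable baseline version for one type of account."""

    account_type: str
    version: int
    required_resources: frozenset[str]


@dataclass
class Deployment:
    """A deployment keeps the baseline version it started with."""

    account_name: str
    baseline: Baseline


class BaselineCatalog:
    """Stores every published baseline version per account type."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Baseline]] = {}

    def publish(self, account_type: str, required_resources: set[str]) -> Baseline:
        versions = self._versions.setdefault(account_type, [])
        baseline = Baseline(account_type, len(versions) + 1, frozenset(required_resources))
        versions.append(baseline)
        return baseline

    def latest(self, account_type: str) -> Baseline:
        return self._versions[account_type][-1]

    def start_deployment(self, account_name: str, account_type: str) -> Deployment:
        # Pin the baseline that is current right now.
        return Deployment(account_name, self.latest(account_type))


catalog = BaselineCatalog()
catalog.publish("development", {"audit-logging", "budget-alert"})
in_flight = catalog.start_deployment("dev-team-a", "development")

# The baseline evolves while the deployment is still running.
catalog.publish("development", {"audit-logging", "budget-alert", "config-rules"})

print(in_flight.baseline.version, catalog.latest("development").version)
```

The deployment that was already in flight finishes against version 1, while the next account picks up version 2.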

Benefits of baselines

A well-defined baseline helps everyone within the team or organization know and understand the definite starting point for a project or resource. Baselines also allow developers and operators to set expectations on what is or should be present in every account allowing for automation and scalability. This four-letter acronym, RECD, helps describe the benefits.

Repeatability

When creating a baseline, deploying the exact same thing every time inherently becomes the norm. This allows solutions that are developed for a single account to scale to many. With the concept of repeatability, you have a cookie-cutter deployment model where operations know exactly how to deploy or maintain an account so developers can find or use whatever they need in the exact same place no matter what account they are in.

Evolution

Because architecture baselines should be deployed with coded templates, your organization will have a documented history of how the environment has evolved over time. This is important to help the organization understand where they may have gained or lost efficiencies and be able to adjust accordingly in the next iterations. If a key component to your baseline is removed because it may be viewed as no longer necessary, you will have a historical record to reference if that component ever needs to be reintroduced.

Compliance

It’s hard to imagine a space any more where there isn’t some level of regulation dictating what or how you should be doing something. Even if it is only from your internal security team, baselines allow operations to better provide compliance information as well as maintain a more secure architecture. Critical security configurations should be near-identical across all accounts as a result of being deployed with a baseline. As stated earlier, having confidence that something is configured in a specific way in a specific location is paramount to being able to gather compliance information as well as make bulk modifications.
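To make the compliance benefit concrete, here is a small sketch that compares what an account actually has against what the baseline says it should have. The setting names and values are invented for the example; in practice the observed values would come from whatever inventory or configuration tooling you use.

```python
MISSING = "<missing>"


def find_drift(baseline: dict[str, object], observed: dict[str, object]) -> dict[str, tuple[object, object]]:
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for setting, expected in baseline.items():
        actual = observed.get(setting, MISSING)
        if actual != expected:
            drift[setting] = (expected, actual)
    return drift


# Hypothetical security baseline and per-account observations.
security_baseline = {
    "audit_logging_enabled": True,
    "root_mfa_enabled": True,
    "log_retention_days": 365,
}
accounts = {
    "dev": {"audit_logging_enabled": True, "root_mfa_enabled": True, "log_retention_days": 365},
    "prod": {"audit_logging_enabled": True, "log_retention_days": 90},
}

for name, observed in accounts.items():
    print(name, find_drift(security_baseline, observed))
```

Because every account is expected to look the same, the same check runs unchanged across all of them.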

Destruction

Knowing the steps to recreate something identically when needed significantly reduces rework. Having a baseline means you can (in most cases) completely tear down something that seems to be causing issues. Maybe someone made a configuration change out-of-band. Being able to recreate an environment exactly the way it’s supposed to be, in short order, makes it easier to simply cut bait when something isn’t working as it should.

It’s cyclical

There are many benefits to having baselines with regards to your environment. We really can’t come up with a scenario where not having a baseline would be a net positive. Even baselining a development sandbox enables cost reduction and more secure testing practices.

Creating baselines, however, does come at a cost. Documentation is critical to maintaining robust baselines.

Documentation

It should come as no surprise that supporting and maintaining a large-scale multi-account cloud architecture requires documentation. But it might not be the type of documentation you are thinking of. Granted, the skill level, experience, and tenure of your operators and developers will at times determine how detailed your documentation needs to be.

Because the vast majority of your architecture, and the deployment of that architecture, should be in coded templates, there is no need to create seemingly endless numbers of detailed SOPs containing hundreds of steps and dozens of screenshots. Your documentation for this type of architecture should be very basic and enable autonomy. The templates and scripts should be well commented. A logical and agreed-upon naming convention goes a long way. Lastly, less documentation of higher caliber enables more reuse. If a significant amount of context is required to understand a process, your repeatability is reduced greatly.

Naming convention & standards

Determining upfront what the labels, names, and variables for resources and processes within your environment should look like is critical to maintenance success. Having a system for naming ‘things’ is not only important for information gathering but it is a necessity in order to make mass modifications at scale.

Have documentation that lays out what names of resources should look like. Agree on which casing style (snake case, camel case, kebab case, and so on) applies to each category. Use different standards for different resource types.

Avoid redundancy. There are some (very few) cases, especially in AWS, where you need to include the type of the resource in the name of the resource. This is especially true when the people who care the most about the resource work mainly in the console. Otherwise, there is no need to name your Development VPC ‘DevVPC’.
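A naming standard is easiest to keep when it can be checked by a machine. The following sketch assumes one casing rule per resource category; the categories, the rules chosen for them, and the `check_name` helper are illustrative only and should be replaced by whatever your organization agrees on.

```python
import re

# Hypothetical convention: one casing style per resource category.
NAME_RULES = {
    "s3_bucket": re.compile(r"[a-z0-9]+(-[a-z0-9]+)*"),         # kebab-case
    "iam_role": re.compile(r"[A-Z][a-z0-9]+([A-Z][a-z0-9]+)*"),  # PascalCase
    "tag_key": re.compile(r"[a-z0-9]+(_[a-z0-9]+)*"),            # snake_case
}


def check_name(resource_type: str, name: str) -> bool:
    """Return True when the name follows the rule for its resource type."""
    rule = NAME_RULES.get(resource_type)
    if rule is None:
        raise ValueError(f"no naming rule defined for {resource_type!r}")
    return rule.fullmatch(name) is not None


print(check_name("s3_bucket", "acme-dev-logs"))
print(check_name("iam_role", "securityAuditRole"))
```

Running a check like this in the deployment pipeline keeps nonconforming names from ever reaching an account.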

Lay out the key labels required. What are the minimum pieces of information a resource should have to tell an admin, developer, or operator what the resource is for, what level of protection it needs, and how long it should be allowed to live?
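The three questions above map naturally onto a short list of required labels. The sketch below assumes tag keys named `purpose`, `data_classification`, and `expires_on`; those names are placeholders, not a recommendation from this series.

```python
# Hypothetical minimum label set: what it is for, how protected, how long it lives.
REQUIRED_TAGS = ("purpose", "data_classification", "expires_on")


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return the required tag keys that are absent or blank."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


resource_tags = {"purpose": "nightly-report-export", "data_classification": " "}
print(missing_tags(resource_tags))
```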

Procedural

Mapping out the steps to deploy a solution template does not need to include every input for every field. In fact, it shouldn’t include any. Your steps should be typo-proof. We’ll discuss automation in a different series. Anyone who has spent 6 months getting up to speed in your environment should be able to glean the information required to conduct any process from the vaguest of steps.

Any process should have the minimum amount of detail required to conduct it successfully. Only add clarification and additional details where required AFTER a step in the process has failed or been misinterpreted by more than one individual. This is an indication that more clarity may be required. However, do not add this information to the master procedure. When possible, abstract additional detailed steps to a separate document and reference that document from the primary instructions.

The key to solid documentation for maintaining a robust baseline and deploying secure and manageable architecture is simplicity. Maintaining up-to-date documentation can be very difficult. The information may be continually changing. Get as close to the process as you can without introducing confusion and misinformation. No information is much better than incorrect information, especially with regard to deploying critical infrastructure.

Location of information

Documentation that describes mindset, mentality, and the logic behind an organization’s process is more important in most cases than the information itself. Create resources that enable autonomy and decision making at the lowest level. When there are already agreed-upon naming conventions and procedural understandings within the organization, support and maintenance move faster.

Share instructions on how to use the organizational standards to get the information that is required. Documentation for the sake of having a process documented is worthless if no one is going to follow the steps. Document only the situations where manual intervention is required and could most likely cause a discrepancy. This would be ‘custom settings’, one-off configurations, or dependencies. The bulk of your documentation should represent where and how to find the required information when it is needed – not demonstrate the precise steps required. Parameterize and use variables in your documentation as much as possible to scale with your architecture.
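As a small illustration of parameterized documentation, the sketch below keeps a procedure as a template with variables and fills it in per account, using Python's standard `string.Template`. The procedure text, the role name, and the bucket naming pattern are invented for the example.

```python
from string import Template

# Hypothetical procedure: it says where to find things, not every click.
PROCEDURE = Template(
    "1. Assume the $admin_role role in the $account_name account.\n"
    "2. Find the logging bucket named $org-$environment-logs.\n"
    "3. Record any one-off configuration in the $account_name exceptions page.\n"
)


def render_procedure(**values: str) -> str:
    """Fill in the template; a missing variable raises KeyError instead of
    silently producing an incomplete document."""
    return PROCEDURE.substitute(values)


text = render_procedure(
    admin_role="OrgAdmin", account_name="dev-team-a", org="acme", environment="dev"
)
print(text)
```

One template then serves every account, and a change to the standard is made in one place.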

Collaboration

Communicate the standards, conventions, and procedures between teams frequently. At a reasonable frequency for your organization, get together and discuss what is and is not working for all stakeholders.

Any opportunity to reduce the back and forth between developers, administrators, and operators will significantly improve supportability and maintainability. If teams become annoyed or upset about something, it is important to understand why and agree on corrective action before groups begin drifting from the standard or coming up with their own processes.

Collaboration enables team cohesion, builds camaraderie, and further reduces the need for overly detailed documentation. The idea is to build a culture that supports large-scale architecture, not one that sabotages it.

Automation

Automation is about as obvious a requirement as documentation. If manual intervention is required for many steps in an architecture deployment, one can imagine that at some point in the scaling process, systems begin to break down.

Automate anything that you can that happens more than once. Automation is a form of documentation. The appropriate process for automating something should include commented code and the ability to logically follow a process in case that process were to ever break down.

Automation does so much more than shave seconds or minutes off the front end of a process. As we stated earlier, it is important to enable autonomy. Automation does this. The more cogs you have in a process, the more potential for things to go wrong or steps to get missed.

Even a simple process that takes only a minute or two and is conducted weekly can cause a profound amount of rework if it is completed incorrectly and deploys or configures a resource incorrectly. Remember, the naming convention is critical to making mass modifications at scale. If the individual responsible performs the process incorrectly, or misses it entirely due to a busy schedule or absence, considerable harm to your architecture could result.

Automation doesn’t completely eliminate human error; however, it can significantly reduce it.

Tools, 3rd-party solutions, & systems

As a bonus or honorable mention, it is important to state that these four components for successfully supporting and maintaining MAA each have some sort of tool, service, or system behind them.

How you conduct your automation and where you store your documentation is considerably less important than actually having these components as a solid part of your foundation. However, it would be impossible to support and manage this type of architecture without an array of different tools. This series and this specific post are not the place to share the many options available.

Each organization will also be very different than the next with regards to their preferred tools and systems for deployment. Budget, technical competency, time resources, etc. all factor into what should be considered a viable solution for each architecture.

Check back to Tactful Cloud frequently for updates and information on the tools and solutions that are available to meet your support and maintenance requirements.


How to accomplish a successful multi-account architecture

March 16, 2020 / in Architecture, Multi-account Architecture / by cloudman444

This is post 7 of 9 in the series “Multi-account Architecture”


It’s all about the numbers

Once you’ve made it this far and laid a solid foundation with your organizational structure, it’s all about turning out ‘widgets’, with widgets being new AWS accounts. What you’ve built up until now is just the starting point of your assembly line. Now you need to make it run like a fine-tuned system. From here on out, it shouldn’t matter if you are creating a new AWS account per week, month, quarter, or once per year. Your growth trajectory doesn’t matter with regard to the work up to this point. Again, you need a solid foundation to build off of, but the real work hasn’t even started yet. Once complete, 3 accounts or 100 accounts shouldn’t give any more or less heartburn. That goes for Ops, Support, and Security teams.

Getting to the starting line

If you are coming at this from an already ‘well-established’ cloud environment, you might be thinking to yourself “how in the hell…” I don’t blame you. It can look like a really daunting task. It is. Unfortunately, a single blog post can’t help you that much.

If you have a clean slate and are just starting out, you are in a very good position. We will cover the steps for a successful MAA here, starting with your preexisting condition.

Somebody’s baby

No matter how you go about it, someone is likely to get their feelings hurt when you take on such an initiative of correcting your current state of affairs in your cloud architecture. The biggest thing to remember moving forward is — it got this way using the tools, services, talents, and knowledge available at the time. ‘Don’t throw the baby out with the bathwater.’

There is nothing wrong with how it is or how we got to where we are. It is just time to make some changes. Use this opportunity to learn and grow technically (this goes especially for anyone ready to die on the hill of not changing what they created because it is working). Not even a technologist with a crystal ball would have been able to predict the rapid rate of change and service offerings of any cloud provider. It’s pretty amazing you’ve done as well as you have.

If you are in this predicament, here are the abbreviated steps I recommend to get to a good starting point. It is important to realize that the following process will look very different for every organization. Use these steps as a guidepost for the direction you want to head and not as instruction etched in stone.

First, a gif to illustrate all the steps:

Step 1: Take inventory of your current (legacy) architecture
- The current number of accounts
- Services represented in each account (AWS and Application Tiers)
Step 2: Deploy a new account and promote it to the AWS Organization Master if you do not already have one
- Create OUs for your desired structure
- Enable inherited features/services you know you will use across all accounts
- Prepare billing alerts and quotas
- Establish potential SCPs but do not deploy them over your OUs yet
Step 3: Stand up your Orchestration and Administration tier
- Create a security account for logging and security functionality
- Depending on your organization and needs, this may be multiple accounts
Step 4: Deploy a research & development tier
- If you are developing in a different VPC but in the same account, now is the time to stop
- Create a sandbox (dirty) environment with significantly more lax service permissions so developers can freely test things without mucking up development
- Dev/Test/QA, depending on the number of accounts you want to manage, should be a 99% replica of your production environment
Step 5: Here there is a fork in the road.
1. Establish a new production account to be a clean slate.
  - This allows dev to be more in line with the previous bullet.
  - Hard-to-move enterprise tools/config can remain in place and move through attrition
2. Begin pulling out all other administration and enterprise tooling from your existing production account
  - Establish shared/brokered services accounts
  - Deploy authentication and networking in their own managed accounts
  - This allows for better delegation and separation of responsibility, as well as the ability to more easily leverage new services and features released in the future
Step 6: By here you’ve determined where production is going to live
- New account
- Legacy account cleaned out
- Either way, now is the opportunity to clean house and lock down ingress/egress points. I know how hard this step can be, so if at all possible I recommend a clean environment for production
Step 7: Transition your deployment of new releases to your new production environment
- If you’ve taken this opportunity to establish new roots for production, it’s time to cut over. Leave behind your legacy setup for roll-back
- If you’re still living in your legacy production environment, you’ve completed moving as many administration and enterprise resources as possible out of this account
Step 8: Terminate legacy or wash, rinse, repeat.
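The inventory in the first step can start as something very simple. The sketch below assumes you have already gathered (account, service) pairs from somewhere, for example a billing export or a manual review; the account names and the `summarize_inventory` helper are hypothetical.

```python
from collections import defaultdict


def summarize_inventory(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group observed services by account, de-duplicated and sorted."""
    by_account: dict[str, set[str]] = defaultdict(set)
    for account, service in records:
        by_account[account].add(service)
    return {account: sorted(services) for account, services in sorted(by_account.items())}


# Hypothetical observations of a legacy environment.
observed = [
    ("legacy-prod", "EC2"),
    ("legacy-prod", "RDS"),
    ("legacy-prod", "EC2"),
    ("legacy-dev", "EC2"),
    ("legacy-dev", "S3"),
]

inventory = summarize_inventory(observed)
print(f"{len(inventory)} accounts")
for account, services in inventory.items():
    print(account, services)
```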

That’s it. You’ve done it!

Well, most of it. We haven’t talked about how to deploy all the additional accounts you might want going forward.

In fact, some of that information could be very beneficial back around Step 3. Luckily for you, we’ll try to cover that in the next post.

Clean slate

If you’re just starting out, you don’t actually get to jump all the way down here. You need to go back and review what life could be like if you don’t properly organize from the get-go.

You see, you have a lot of benefits in being a little tardy to the party. But don’t count your chickens before they hatch. Hindsight is 20/20. Coming into this clean means you don’t know what you don’t know. You don’t have the ability to know how you would have done things differently until after you’ve already deployed a solution.

The upside is with the solid foundation established upfront, ‘back to the beginning’ can be as simple as hitting the delete key.

Just because you have a clean canvas for your architecture doesn’t mean you won’t still find yourself pinned in a corner, either because of some undocumented limitation or because you didn’t account for the entire cost of implementing a solution.

Use the previous steps to whiteboard a few potential deployments. Ask yourself:

Who needs to access what services? And for what?
What security implications are there when putting two services together?
Which services will be brokered or shared?
Where will we fail-over to? Is DR in a different availability zone, region, or combination?

Playing the long game

Still, at this point, the work has barely even started. Going forward you will need the tools and resources to deploy rapidly on top of your foundation. While the battle starts here with the planning phase, the war is won by implementing sound processes, standardization, convention, and lots of rigor.

We cover that in the next post.


What a multi-account architecture might look like

March 9, 2020 / in Architecture, Multi-account Architecture / by cloudman444

This is post 6 of 9 in the series “Multi-account Architecture”


Choose your own adventure

There’s really no ‘one-size-fits-all’ solution when it comes to deploying accounts within your organizational structure. In our previous post about ‘end-state’, we discuss very briefly a few different types of accounts you’d want to include in your architecture and why they might be beneficial. We also shared this diagram:

In this diagram, we show at the top a pretty standard foundation for managing a cloud architecture of any size that should support nearly any scale.

The AWS Organizations structure works by implementing inheritance through the use of containers called Organizational Units. These containers can be nested and used to organize different accounts according to whatever configuration scheme best suits your needs. Once the management structure is determined, the organization of sub-accounts can take place.
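When whiteboarding a nested OU structure, it can help to write it down as data and let a script list the OUs in creation order (parents before children). The sketch below does that for a made-up structure. The depth limit of five reflects the AWS Organizations quota as we understand it at the time of writing; treat it as an assumption and check the current quota before relying on it.

```python
# Assumed AWS Organizations quota for OU nesting depth; verify before use.
MAX_OU_DEPTH = 5


def ou_paths(tree: dict[str, dict], prefix: tuple[str, ...] = ()) -> list[str]:
    """Return every OU as a path string, with parents listed before children."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append("/".join(path))
        paths.extend(ou_paths(children, path))
    return paths


def too_deep(tree: dict[str, dict]) -> list[str]:
    """Return OU paths nested deeper than MAX_OU_DEPTH levels."""
    return [path for path in ou_paths(tree) if path.count("/") + 1 > MAX_OU_DEPTH]


# Hypothetical structure used only for illustration.
planned = {
    "Security": {},
    "Workloads": {"Dev": {}, "Prod": {}},
}

print(ou_paths(planned))
print(too_deep(planned))
```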

At the top

In our ‘End-State’ post, we already discussed the key foundational accounts recommended for creating a robust MAA. The number of those types of accounts, and what services they provide, should be determined by organizational needs and available resources. There can be a fine line between logical separation of services and total chaos.

At the very least you have your AWS Organizations account that is used to spawn subsequent accounts. In a later series or individual post, we will share how this process takes place. It is in this account you enable the sharing of other AWS Services across all accounts linked in the Organization. This is an important component.

Alongside your Organization account, an architecture should at the very least have a single account for security and enterprise services as well as an account for shared resources.

However ya wanna slice it

Once you’ve established the Root Organizations account and any other management accounts, you can begin placing the accounts into Organizational Units (OU) or ‘functionality buckets.’

These buckets allow you to set the maximum level of permissions, as well as the AWS services allowed to be used, in each account. Controlling the permissions allowed in an account is done through Service Control Policies (SCPs). There is too much to cover here; however, these policies allow you to completely disable services in all accounts within the OU they are attached to.

One especially beneficial use-case for SCPs is controlling which services are available within AWS GovCloud Regions. For example, when working with the DOD and their Cloud Computing Security Requirements, you will learn that different AWS services are allowed for different information types or ‘Impact Levels’. However, there are not different types of AWS GovCloud account to accommodate the different impact levels. All services, regardless of their accreditation for use, are enabled in the GovCloud Regions. You can use OUs and SCPs to disable the use of services that have not been vetted for use within a specific impact level. This will prevent even administrators from inadvertently leveraging a service within an environment that it has not been accredited for.
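As a minimal sketch of what such a policy could look like, the function below builds a deny-style SCP document for a list of service prefixes. The policy shape (Version, Statement, Effect, Action, Resource) follows the standard IAM policy JSON format that SCPs use; the service prefixes and the `build_deny_scp` helper are placeholders, and which services belong on the list for a given impact level is something only your accreditation process can answer.

```python
import json


def build_deny_scp(sid: str, service_prefixes: set[str]) -> dict:
    """Build an SCP document that denies every action of the given services."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Deny",
                "Action": [f"{prefix}:*" for prefix in sorted(service_prefixes)],
                "Resource": "*",
            }
        ],
    }


# Placeholder prefixes; substitute the services not vetted for your impact level.
policy = build_deny_scp("DenyUnvettedServices", {"exampleservice2", "exampleservice1"})
print(json.dumps(policy, indent=2))
```

The resulting JSON is what would be attached to the OU for that impact level once you are ready to enforce it.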

Below is an example of an Organizational Hierarchy for such a situation. Within our testing organization, we have 6 AWS accounts: Orgs/Master, Dev, Prod, Security, Management, and Release.

Prod and Release are in a higher impact level OU, while Dev is in the lower one to enable R&D. The Orchestration Accounts can be excluded from either to enable the use of shared services.
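The hierarchy described above can be sketched as a small tree, to show how service denials accumulate from parent OUs down to accounts. This is only a model for reasoning about inheritance: the OU names and the denied services are assumptions for illustration, while the six account names come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class OrgUnit:
    name: str
    denied_services: set = field(default_factory=set)
    accounts: list = field(default_factory=list)
    children: list = field(default_factory=list)


def effective_denies(ou, inherited=frozenset()):
    """Map each account to the services denied by its OU and every OU above it."""
    denied = set(inherited) | ou.denied_services
    result = {account: set(denied) for account in ou.accounts}
    for child in ou.children:
        result.update(effective_denies(child, denied))
    return result


# Hypothetical layout: orchestration accounts sit outside both impact-level OUs
root = OrgUnit(
    name="Root",
    accounts=["Orgs/Master", "Security", "Management"],
    children=[
        OrgUnit(name="HigherImpact", denied_services={"iot", "sagemaker"},
                accounts=["Prod", "Release"]),
        OrgUnit(name="LowerImpact", denied_services={"iot"}, accounts=["Dev"]),
    ],
)

denies = effective_denies(root)
for account, services in sorted(denies.items()):
    print(account, sorted(services))
```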

Be logical

The following images are taken directly from a presentation we’ve given on creating and managing via code a multi-account architecture. You can use them to spark ideas on the best way to structure your account hierarchy.

It is important to note that the shaded and unshaded boxes do not need to denote an actual Account or an OU. Depending on the situation, they could both be OUs where the lowest level OU contains all the accounts for that category or service.

Project

A sub-OU structure can be created to organize all projects into their own Organizational Units, where the first OU is for the whole project. All accounts can fall under that single OU, or additional sub-OUs can be containers for like account types.

Billing

Each initial level OU in your structure could be for all accounts that are related to the same cost-center or source of funding. Many organizations have different ‘colors of money’ where lumping like accounts into the same OU structure could enable the disabling of services. For now, you must still use tags to allocate the cost of all services across all accounts in a structure like this, but expect AWS to enhance AWS Organizations to accommodate more granular billing views in the future.

Isolation

If you’ve properly configured your networking and other application-wide services within shared services accounts, it might be logical to completely break out your OU structure into tier-specific roles. Not only can you segregate database admins from the developers and their services, but you can also keep services that have nothing to do with hosting data from consuming cost in the respective accounts.

Failover & High-availability

This is a concept that we have not personally tested but it seems like, in the right circumstance, it could prove to be very beneficial. There are many different ways to implement blue/green or virtually zero down-time deployments in the cloud. One solution could be to dedicate full accounts to releases of your application or service.

This solution provides many benefits. Depending on your release cycle and how long a release is live, you could have unique billing information for every release. This would make it easier to see whether new changes to your infrastructure, scaling, and/or code reduced or increased the cost of resources.

This solution also makes it easier to give a small percentage of customers weighted access to the new release and increase that percentage over time. Since developing and deploying new code should change nothing at all in the completely separate account that hosts the old release, you can very easily revert to the old release if something starts going wrong.
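To make the weighted-access idea concrete, here is a minimal sketch that deterministically assigns each customer to the old or new release based on a percentage. Like the concept itself, this is untested in production; the function name and customer IDs are hypothetical, and in practice the weighting is often done at the DNS or load balancer layer rather than in application code.

```python
import hashlib


def release_for_customer(customer_id, new_release_percent):
    """Deterministically route a customer to the 'new' or 'old' release account."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "new" if bucket < new_release_percent else "old"


# Ramp the new release up over time: 5% -> 25% -> 100%
for percent in (5, 25, 100):
    routed = [release_for_customer(f"customer-{i}", percent) for i in range(1000)]
    print(percent, routed.count("new"))
```

Because the bucket depends only on the customer ID, a customer who lands on the new release at a low percentage stays on it as the percentage grows, and setting the percentage back to 0 reverts everyone to the old release.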

You get to decide

As you can see, there is a virtually infinite number of account and OU configurations with regard to the hierarchy of your multi-account architecture. Each business case could serve different needs.

This flexibility increases the manageability and scalability of whatever solution you are working to deploy. However you logically decide to lay out your accounts, it doesn’t have to be set in concrete either. This can evolve over time. With the proper configuration of shared services, it may even be possible to introduce accounts or services from entirely different organizational structures. But that’s enough to cover for one post.

When is it right to implement a multi-account architecture

March 2, 2020/0 Comments/in Architecture Multi-account Architecture/by cloudman444

This is post 5 of 9 in the series “Multi-account Architecture”

Multiple Cloud Account Architecture (Series)
My experience with Multi-Account Architecture
Multi-Account End-State
When to consider a multi-account architecture?
When is it right to implement a multi-account architecture
What a multi-account architecture might look like
How to accomplish a successful multi-account architecture
Support and maintenance of multi-account architecture
Lessons learned around multi-account architecture

This is post 5 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-cloud account architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP) but the concepts discussed port well to any provider or even to on-premise operations.

Not a moment too soon

If you are just starting in the cloud, then there is no better time than now.

If you are already in the cloud with a single account, then there is no better time than now.

The sooner you establish a solid foundation around MAA the better. The deeper you get into the configuration of your security tools (logging, scanning, patch deployment, etc.) the harder it will be to reconfigure these solutions to properly accommodate your new MAA trajectory. Security tools, however, might be the least of your concerns. Let’s go over just a few of the popular reasons to make this architectural decision.

Federation

The moment you have a one-to-many, many-to-one, or many-to-many relationship regarding roles and responsibilities in your AWS account, you need to stop and address MAA. The need for fine-grained access control is really a key moment in design or implementation where multi-account should be established. Managing credentials like IAM access keys, as well as access to individual resources in the different tiers of your applications, can get very complicated within a single account. Can it be done? Certainly.

Where the difficulty lies is when certain individuals within your account have competing roles and responsibilities, or when developers also have, but do not require, access to modify the resources allocated to individual services. Just as with permissions within an enterprise network, where a particular role/responsibility is granted via membership of a specific security group, the federation of authentication allows you to configure access in a very similar manner.

Configuring federation through the use of Single Sign-On (SSO) solutions not only enhances ease of use for accessing your resources, but it also more easily provides for fine-grained access to multiple accounts. By establishing roles through the use of federation, you reduce the chance of accidentally assigning permissions to an individual, minimize the possibility of someone having permissions that allow them to elevate themselves to a higher level of permission, and delegate access to the appropriate resources on an as-needed basis.

The moment you deploy another pair of access keys with a specific set of permissions to an individual or a group of developers, for instance, take a moment to think if this is really the right decision, or if their access should be controlled and managed by the membership of a group that can provide the same level of access to one or many cloud accounts.
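A minimal sketch of the idea, assuming a simple lookup table that maps directory groups to the roles they may assume in each account. All group, account, and role names here are hypothetical; a real deployment would keep this mapping in your identity provider or SSO configuration rather than in a script.

```python
# Hypothetical mapping of directory groups to (account, role) pairs
GROUP_ROLE_MAP = {
    "cloud-developers": [("dev-account", "DeveloperRole")],
    "database-admins": [
        ("dev-account", "DatabaseAdminRole"),
        ("prod-account", "DatabaseAdminRole"),
    ],
    "security-auditors": [
        ("dev-account", "ReadOnlyAuditRole"),
        ("prod-account", "ReadOnlyAuditRole"),
        ("security-account", "AuditRole"),
    ],
}


def roles_for_user(user_groups):
    """Return the sorted (account, role) pairs a user can assume based on group membership."""
    granted = set()
    for group in user_groups:
        # Unknown groups grant nothing
        granted.update(GROUP_ROLE_MAP.get(group, []))
    return sorted(granted)


print(roles_for_user(["cloud-developers", "database-admins"]))
```

Access is changed by editing group membership or a single mapping entry, not by issuing and tracking another pair of access keys.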

Logging & Records

We’ve beaten the security drum pretty hard up to this point so this shouldn’t be a surprise. If your organization has to keep records for any period of time, you must consider MAA. Depending on the industry you may be in or the service you provide, you may be required to hold, maintain, and archive access logs, modification records, and customer information of all types for a specific period of time. It’s hard anymore to imagine a service that isn’t regulated in this way by some external body.

Creating a separate account for all of your audit, logging, and customer record data is a critical decision point. Maybe you hold data covered by HIPAA or PCI, Personally Identifiable Information (PII), or some other form of critical record. This information should have its own account designated for it. Not necessarily an account per data type, but for sure an account dedicated to data storage and protection.

By separating out this type of data from the more generalized application and usage data, you inherently protect it to a much higher degree. It takes intentional and knowledgeable configuration in order to provide access to such data from the tiers of your application deployed within a separate account. While this could increase the chances of misconfiguration due to complexity, good architects can accomplish this very easily. The risk is setting a configuration that is too open when something works and not locking access down to specific resources.

Aside from accidental resource access, you protect critical information from people who 1) do not even need to see it, and 2) could inadvertently manipulate it, impacting the integrity of the information. This type of data needs significantly higher levels of protection. Placing it in an account of its own allows you to better manage the disposition schedule of the information and to configure retention policies more granularly.
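As one possible illustration of a retention policy, the sketch below builds an S3 lifecycle configuration that archives records after a set number of days and deletes them at the end of the retention period. Using S3 for the record store is an assumption for this example, as are the prefix and the day counts; your actual disposition schedule comes from your regulator, not from this snippet.

```python
def build_retention_rule(prefix, archive_after_days, delete_after_days):
    """Build an S3 lifecycle configuration that archives, then expires, objects under a prefix."""
    if delete_after_days <= archive_after_days:
        raise ValueError("records must be archived before they are deleted")
    return {
        "Rules": [
            {
                "ID": f"retain-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to colder storage once they are rarely accessed
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # Remove objects when the retention period ends
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


# Hypothetical schedule: archive access logs after 90 days, delete after roughly 7 years
lifecycle = build_retention_rule("access-logs/", 90, 2555)
print(lifecycle)
```

A dictionary in this shape is what boto3's S3 `put_bucket_lifecycle_configuration` call accepts as its `LifecycleConfiguration` argument.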

It’s just easier when you know that an account is designated for critical information.

Baselining

Lastly, but of equal importance, is deploying a well-documented and baselined configuration. The ability to leverage multiple accounts for the deployment of identical environments is paramount. If your organization needs to test something or research a new feature, having a solid foundation for MAA greatly increases the repeatability and serviceability of your application. Testing new features or versions of your application or service from within a single account can leave a trail of undocumented resources and configurations. If you (even infrequently) deploy test resources or additional servers for short periods of time, consider MAA.

Baselining can more easily be accomplished when you have an account for dedicated, unchanged, shared, or brokered services. Having a baseline for new or temporary accounts allows you to control and manage which vetted services are available and, through federation (mentioned earlier), who can use such services and in what capacity.
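A baseline is only useful if you can tell when an account has drifted from it. Here is a minimal sketch of that comparison, assuming the baseline and the account's current state are both available as simple dictionaries. The setting names are hypothetical; how you collect the actual values depends on your own tooling.

```python
# Hypothetical baseline that every new or temporary account should match
BASELINE = {
    "cloudtrail_enabled": True,
    "default_vpc_present": False,
    "approved_services": ["ec2", "rds", "s3"],
}


def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs from the baseline."""
    drift = {}
    for key, expected in baseline.items():
        found = actual.get(key)  # a missing setting shows up as None
        if found != expected:
            drift[key] = (expected, found)
    return drift


test_account = {
    "cloudtrail_enabled": True,
    "default_vpc_present": True,
    "approved_services": ["ec2", "rds", "s3"],
}
print(find_drift(BASELINE, test_account))
```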

Honorable mention

When else might you consider MAA? When you have different cost centers and accounting needs. The moment different ‘colors’ or types of money funnel into the funding stream for your service or application, it is time to break out the resources by funding source. This will save you and your finance department a lot of time, effort, and heartache.

When to consider a multi-account architecture?

February 24, 2020/0 Comments/in Architecture Multi-account Architecture/by cloudman444

This is post 4 of 9 in the series “Multi-account Architecture”

This is post 4 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-cloud account architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP) but the concepts discussed port well to any provider or even to on-premise operations.

This all seems crazy

So why even consider all the effort to implement a multi-account architecture in the first place?

In short… Because AWS said so.

Initially, when I heard this and learned about the release of AWS Organizations in 2017 and the subsequent release of AWS Control Tower in 2019, I just kept thinking that this is all just a way for AWS to make more money. Waaayy more money.

While it may be true that AWS does benefit financially from selling this mindset, in just the relatively short amount of time I’ve been doing enterprise cloud architecture I’ve come to learn that all of these new services are released for the benefit of the customer. What isn’t seen on the backend is all the planning and road-mapping behind some of the amazing features of the cloud, features that require multiple accounts in order to be used.

Leveraging a multi-account configuration has many benefits, some of which we don’t even realize currently. As customers request the ability to do something else within their environments, new features will be released that lean heavily into a well-configured multi-account architecture.

What’s the strategy

There are many different reasons why a small business or large organization would want to, and should, implement a multi-account architecture similar to what I shared in the previous post. Here are just a few reasons, but there could be many more based on your organization. One business case I won’t discuss in detail is acquisition potential or sell-off.

Landing Zones / Control Tower

As I stated right away, AWS says to do it. It’s really that simple. If you want to be able to be nimble and leverage all the great new features that AWS releases weekly, you really need to stay up on the best practices they put out. Those are put out for a reason and that reason is they know the direction they are taking all of their new services. It’s best to keep up.

Not only does segmenting your resources into different accounts provide the benefits of the points I will speak of next, but it also makes it much easier to adapt to the rapid rate of change that being in the cloud exposes you to. Implementing new and beneficial features doesn’t always have to include a massive overhaul. You can use a phased approach.

Shared Services

The ability to more easily (and securely) share AWS services requires the AWS Organizational structure. When you start to leverage shared services, you have the ability to reuse work already done. You can even broker services to other departments within your organization without having to reinvent the process every time.

Shared services allow for better management by the responsible parties by centralizing access to a single account. This keeps someone from inadvertently causing much more harm than intended while bouncing around in the Console or using the Command-Line.

Security

It should be obvious by now that using multiple accounts significantly increases your security posture. As stated in the previous post, creating a security account allows for auditing best practices and preserving log integrity.

Having multiple accounts enables granular access restrictions. It helps prevent accidental access to protected resources. Developers should not be able to make changes to or see instances that control enterprise services. Sure, IAM permissions can be configured to prevent accidental termination, for instance, but why even bother having Domain Controllers intermingled with development instances if you don’t have to?

Billing

Early on in the cloud, tagging was really important if you wanted to separate out the cost for different functions or services. Tagging itself has evolved greatly over time but was never the most reliable way of breaking out the cost for reimbursement. If you wanted to know your total storage cost for your production application, you still had to somehow tag buckets used for the application to distinguish between storage consumed as related to an application vs. storage utilized operationally.

Utilizing different accounts within an organizational structure not only gives you the opportunity to have a better understanding of what your production application storage costs are, but you can further implement tagging within the account specific to the production release of your application to provide more context.

If you have multiple projects or divisions within your organization sharing compute or other resources within a single account, it becomes a mess very quickly if, from a budgeting standpoint, your IT department needs to recoup operating costs from the HR and Accounting departments. If each department had its own account, this separation happens inherently, and each department can more easily tag its own resources if it wants to provide more granular detail on its cloud expenses. Again, sharing resources and VPCs makes configuring department-specific accounts a breeze.
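To show why per-department accounts simplify chargeback, here is a small sketch that totals cost line items by account with no dependence on tags. The record layout, account names, and amounts are made up for illustration and do not reflect the format of any AWS billing export.

```python
from collections import defaultdict


def cost_by_account(line_items):
    """Sum the cost of all line items per account, regardless of how resources are tagged."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["account"]] += item["cost"]
    return dict(totals)


# Hypothetical monthly line items; note that two of them have no tags at all
line_items = [
    {"account": "hr", "service": "ec2", "cost": 120.0, "tags": {"app": "payroll"}},
    {"account": "hr", "service": "s3", "cost": 30.5, "tags": {}},
    {"account": "accounting", "service": "rds", "cost": 210.25, "tags": {}},
]

print(cost_by_account(line_items))
```

The untagged items still land on the right department because the account boundary carries the ownership information; tags then become optional extra detail within each account.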

Isolation

Separating resources based on organizational responsibility for billing purposes is one thing, but how about making sure the Sales team resources aren’t bumping into or gaining access to the Marketing team resources? One of the biggest caveats to accessing cloud resources is that it is very easy to elevate your own access or find ways to access resources you should never have had access to. Understanding and managing IAM permissions (Policies, Roles, and Profiles) at a large scale is difficult. Everywhere you look in a policy you could find methods that provide some path to privilege escalation.

Breaking resources out into their own accounts not only creates added safeguards from an access standpoint, but it also creates a barrier to the creation of resources that should never have been launched and, more importantly, creates a barrier to accidentally running destructive operations across all account resources of a specific type. You’ve significantly reduced your blast radius for destructive operations. You may lose an entire department’s resources at once but should never take down the whole organization.

Recovery

Last but not least in our list of reasons why multi-account architecture is a good idea is recovery. Related to isolation, the recovery component means that when something goes awry in your organization, you should very easily be able to diagnose what went wrong. May the tech gods have mercy on the soul of someone who negligently terminated every EC2 instance in an account, but at least it was only the one account. With other protections and proper configuration in place, that should never be able to happen. What could happen is the bulk misconfiguration of some Security Groups, or the modification of an Instance Profile. Instead of affecting instances across all departments sharing the same account, such a change only affects a single department, and the rest of the teams may not even notice an issue.

If there is something askew with network access for all departments regarding your cloud architecture, there are only really two possible causes:

Your on-prem connection to the cloud has been misconfigured. This is completely unrelated to your cloud configuration and virtually unpreventable from within it.
Someone that had access to your shared services account for Networking made a misconfiguration. Human error can happen.

If you notice that all resources within the Sales and Marketing teams seem to be working just fine and the only team having a problem is Accounting, then you have a good basis to believe you need to start investigating changes to the environment made in their account or to their specific systems. You should be able to immediately narrow down who had the access to make a particular service-disrupting change, in a specific segment of your architecture, and most importantly when.

If all your resources are shared in a single or only a few shared accounts, anyone at any time in the recent history could have made a change with no expectation it would have had the fallout that it did.
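The narrowing-down step can be sketched as a simple filter over change records: restrict to the affected account and to the window just before the outage. The record format below is hypothetical and only loosely modelled on audit-log entries; in practice the events would come from whatever logging you aggregate in your security account.

```python
from datetime import datetime


def changes_in_window(events, account, start, end):
    """Return the change events made in one account between start and end (inclusive)."""
    return [
        event for event in events
        if event["account"] == account and start <= event["time"] <= end
    ]


# Hypothetical aggregated change log across three department accounts
events = [
    {"account": "sales", "user": "alice", "action": "RunInstances",
     "time": datetime(2020, 2, 20, 9, 15)},
    {"account": "accounting", "user": "bob", "action": "AuthorizeSecurityGroupIngress",
     "time": datetime(2020, 2, 20, 13, 40)},
    {"account": "accounting", "user": "carol", "action": "RunInstances",
     "time": datetime(2020, 2, 18, 8, 0)},
]

suspects = changes_in_window(
    events, "accounting",
    start=datetime(2020, 2, 20, 12, 0),
    end=datetime(2020, 2, 20, 14, 0),
)
print(suspects)
```

With one account per department, the account filter alone removes most of the noise before you even look at timestamps.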

Not so crazy after all

You can see here with just these six reasons that the effort to get a well-architected multi-account configuration could save you boatloads of time in the future — in troubleshooting and outages alone.

All of these reasons relate primarily back to one key theme – security. AWS has many resources that can help you in the implementation, the structure, and the strategy for multiple account configuration. I encourage you to review the AWS Multi-Account Security Strategy to learn more and of course, continue reading on in this series.

Multi-Account End-State

February 17, 2020/0 Comments/in Architecture Multi-account Architecture/by cloudman444

This is post 3 of 9 in the series “Multi-account Architecture”

This is post 3 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-cloud account architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP) but the concepts discussed port well to any provider or even to on-premise operations.

Looks more complicated than it is

If you’ve been in the cloud game for a while (say since 2015) and have all your resources crammed into a single account (or a couple), considering the following diagram in your road map could cause you to lose sleep.

Diagram 1: Example of a large multi-account configuration

If you’re just now considering moving your operations to the cloud or are basically just starting operations and want to be 100% cloud, this diagram may give you pause. You may think it will be more complicated than you expected, or that you don’t have the manpower to support this.

First, you may be right.
Second, it doesn’t actually need to be this crazy.

What you are supposed to get out of this diagram is that multi-account architecture is flexible and scalable. Depending on where you are currently, multi-account could mean two accounts or twenty. Leverage this diagram to use your imagination and think of the different ways you would want your organization’s hierarchy to look. Because that is what this is: a hierarchy of accounts and inheritance of resources, very similar to an Active Directory structure.

What is this mess?

I’ll only briefly explain the components of this diagram here as we will get into considerably more detail in future posts. Having this basic understanding will hopefully aid in generating thoughts of how you could best implement (or change) your multi-account architecture going forward.

AWS Organizations

‘Orgs’ is at the center of all of this. With AWS Organizations you can centrally manage billing, compliance, and control services used in all other sub-accounts. Without the use of Orgs, you will run into some issues with the ability to share resources (e.g., networking) from one account into the others. Orgs allows you to deploy and organize new accounts into a logical structure (Organizational Units or OUs) that works for your business so that you can control access to different AWS services as well as enhance security across all your accounts. You can learn more about AWS Organizations here.

Administration & Orchestration

This is your management tier for AWS accounts. As an organization, it is up to you to decide when a service, function, or responsibility requires a new account. You can have as few or as many accounts as you wish. At the OU level, you can then control which services should(n’t) be used within those accounts.

For example, one account could host all your enterprise Active Directory, Exchange, Roaming Profile configuration, and security policy. Having a separate account for these solutions will allow you to provide more granular access to the administrators of these services, and at the OU level you can prevent your Windows Systems Admins from accessing anything related to AWS IoT services. Because why would they need that?

Other accounts here could be for aggregating all security logs from all other accounts for archive purposes as well as the integrity of the logs. In the same account or a separate account, you could give your security team access to run and deploy all the tools they need to keep your network infrastructure and operating systems safe and secure.

Shared Services

These are the account(s) that will manage any AWS services that you want to share across the rest of your environment. One major change in 2018 was the ability to share AWS VPCs (subnets) across accounts. This feature allows the pooling of IP space that can then be delegated to an entire environment, rather than being explicitly assigned upfront and therefore being difficult to move around and reuse for future projects.
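Subnet sharing is done through AWS Resource Access Manager (RAM). The sketch below only assembles the request parameters for sharing subnets from a networking account with other accounts; the function name, region, account IDs, and subnet ID are placeholder assumptions, and no call to AWS is made here.

```python
def build_subnet_share_request(share_name, region, owner_account_id, subnet_ids, principals):
    """Assemble parameters for sharing subnets with other accounts via AWS RAM."""
    resource_arns = [
        f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"
        for subnet_id in subnet_ids
    ]
    return {
        "name": share_name,
        "resourceArns": resource_arns,
        "principals": list(principals),
        # Keep the share inside the AWS Organization
        "allowExternalPrincipals": False,
    }


# Placeholder values for illustration only
request = build_subnet_share_request(
    share_name="shared-app-subnets",
    region="us-east-1",
    owner_account_id="111111111111",
    subnet_ids=["subnet-0abc1234"],
    principals=["222222222222", "333333333333"],
)
print(request)
```

These keys match the keyword arguments of the boto3 RAM client's `create_resource_share` call, so the dictionary could be passed to it as `**request` from the shared networking account.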

Managing your organization’s DNS at a central point will ensure developers and other system admins can get the benefit of using the cloud but are not making changes to critical services without going through the proper channels and processes.

Another possible option is a consolidated storage account for resources that are statically shared across your organization or divisions within it. The storage account can have all copies of the software, user data, etc. that is needed and within that account, replication can be configured for disaster recovery and all storage billing can be accounted for within a single account.

Scale indefinitely

Once your configuration of Administration and Shared Services accounts is complete, you’re done laying the foundation. Now you can have a virtually limitless number of sub-accounts that can be used for different projects, divisions, or applications within your organization. These accounts can be deployed or torn down at will and linked within the AWS architecture where required to inherit the appropriate service controls and shared services desired.

You now have the ability to lay out the architecture however you’d like. We’ll further discuss some of the potential configurations in a later post. Just keep all these details in mind as you go forward.

Pros & Cons

Regardless of what you feel when looking at that very busy picture, there are pros and cons to each situation which I will briefly address here:

Just starting

If you’re not in the cloud yet, lucky you. You have a slight edge over any organization that already has workloads off-site. AWS Organizations is somewhat more ‘organizable’ from a blank slate. You get to learn everything from scratch and get this right from the beginning (also a con).

PROS
- minimize rework of permissions configuration
- aggregate and archive logging appropriately
- store and organize resources properly
CONS
- slow-moving
- steep learning curve
- analysis paralysis – will you ever land on a design?

Already cooking

Whether you’ve been in the cloud for one year or ten, you are going to face some challenges as you begin to examine what a ‘true’ multi-account configuration will look like. There are more positives than negatives. It’s just that your negatives could have some serious impact if not accounted for in transition.

PROS
- you may already know the ideal design based on lessons learned to-date
- you can address your backlog of rework in this transition
- leverage newly available services not easily refactored into your existing configuration
CONS
- permissions and Service Control Policies (SCPs) inherited from your new Orgs configuration
- may need to consider new billing process
- won’t notice any benefit from the labor until much later in the process

Still worth it?

Some of these pros and cons will be better addressed in future posts of this series. Additional considerations regarding the logical structuring of your organizational configuration will also be shared as we go forward.

Whichever situation you are in, the reality is that in the long term a multi-account configuration will set you up better for success. You will have less to worry about with tagging of resources, making sure your developers aren’t messing up already approved configurations, and proper billing/forecasting of your production costs over your R&D costs.

In a well-thought-out multi-account architecture, what you know you need running can be placed in a safe space while anything that isn’t business-critical becomes ephemeral and you can cut bait at any time to start fresh.

My experience with Multi-Account Architecture

February 10, 2020/0 Comments/in Architecture Multi-account Architecture/by cloudman444

This is post 2 of 9 in the series “Multi-account Architecture”

This is post 2 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-cloud account architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP) but the concepts discussed port well to any provider or even to on-premise operations.

Background

The context provided in the previous post is not hypothetical. It is the reality I faced within about three months of starting enterprise cloud work for a new organization. Prior to this transition in mid-2016, I had some experience with cloud. Since late 2011 I had been exploring Microsoft Azure App Services, Databases, Storage Accounts, and virtual machines, but nothing at enterprise scale. From 2012 to 2014 I had deployed some web and mobile applications using the Azure App Services Stack, built up a very small user base, and began exploring Amazon’s suite of cloud services to do the same thing. This was more for curiosity’s sake than it was for job requirements. At the time I was being paid to be a Windows Systems Administrator. Anything I did in the cloud was for my own career advancement and self-interest.

In early 2016 I was ‘presented’ an opportunity to explore using AWS S3, Storage Gateway, and Virtual Tape Library for enterprise cloud backup. I say ‘presented’ because I really sought the opportunity myself regardless of who wants to take credit. Granted, had I not been afforded the opportunity, my rapid transition into cloud architecture would not have happened at the pace it did. I would have still gotten here, just maybe not as quickly and with the knowledge I plan to share throughout the rest of this series.

After exploring and deploying enterprise cloud backup for a bit, I was pulled completely into the cloud program. Shortly after, the team was broken into two: an operations team and a design team. I was asked to lead the operations team, not because of my experience in the cloud but more so because I was the most ‘senior level’ as well as the most outspoken or opinionated.

At this time the discussion was to create as few AWS accounts as possible. These accounts were to provide managed services to allow the enterprise to leverage the cloud in a very scalable, yet manageable fashion. This was due to limited manpower and experience as well as the inability to pay for more people. How could we manage even a half-dozen accounts with so few support personnel?

The current configuration

Before I go any further with the transition from as few accounts as possible to literally hundreds, I need to briefly explain the current architecture configuration.

The goal was to give individual development teams the freedom to create cool and innovative things in the cloud, all the while segmenting cost and not bumping into other development teams working on completely different and unrelated projects. There was a primary account for managing Cloud Active Directory, Web Application Firewall, Ingress/Egress, and a few other enterprise security tools. The rest of the architecture was dedicated to letting anyone with a new application or idea leverage the cloud. The team that I was a part of was charged with keeping strict baselines, securing all the accounts, and supporting the developers by enabling ‘approved’ AWS services. The organization’s initiative as a whole became ‘cloud-first.’ Anything new, any request for new hardware, first had to answer the question – Why not go cloud?

‘Cloud’ and ‘Cloud-First’ meant something different to nearly everyone, but when I arrived on the program, there were about a dozen AWS accounts that were all pretty much managed to the best of anyone’s ability at this stage in the game. Lots of drift and nuance from account to account. Nothing was standardized even though team members were under the impression it was. Every ‘production’ application account at the time needed to first have an existing ‘development’ account already deployed and an application accredited to be ‘production’ prior to getting an account to make the application live. In a very short time, any consideration of deploying another AWS account was of high concern. Keeping all the accounts in line and near baseline was virtually impossible.

Furthermore, as we explored how to better manage and scale through the use of shared or managed services, we realized there was no great way to provide the level of service required while being able to break out usage and charges in a productive manner. The ideal, which started as 80% managed service accounts and 20% custom developer accounts, soon shifted to the inverse. When the team laid out to management the facts and caveats of managing fewer accounts versus more, the reality became clear. The paradigm shifted. We needed to support as many AWS accounts as possible. This is when I was asked, “What do we need to do to support hundreds of AWS accounts while maintaining minimal support staff (a.k.a. my team)?”

Supporting the masses

At this time we still had just over a dozen AWS accounts and were beginning to understand the transition to managing many more. Deploying a new AWS account, ‘baselining it’, and providing access to developers took a minimum of two weeks. Since development teams had access limited only to the services we allowed and we prevented any changes to networking configuration for security reasons, changes to any account at the time took anywhere from 1 to 3 weeks. This was because we first had to call a CAB and discuss the change, and the responsible parties had to review the implications of the change before manually making that change in the respective accounts.

It’s early 2017 and because of the environment we were in, some of the great services AWS released were not available to us quite yet. Take AWS Organizations for example. The vast majority of the automation we had in place was a hodgepodge of scripts and Jenkins jobs that in some cases caused more issues than they solved. It was a mess. But again, everything was done in good faith and using the best tools and know-how we had at our disposal at the time.

A new mindset

After nearly a year of limping along, supporting a slow growth of AWS accounts, discussing our issues, prioritizing (re)work of processes, and (re)factoring of automation, we decided to take a 3-month break from deploying new accounts to build out our new systems and plan for the future. Throughout the year I researched other organizations, AWS Talks, and Webinars, spoke with team members about what was working and what wasn’t, and re-evaluated our change management process. I ultimately prioritized with the team what we believed needed to be done in order to support a seemingly infinite number of AWS accounts with a team of no more than 8 support personnel.

During our break from deployment, we built new systems and automation as well as leveraged some AWS services that had recently become available to us. As we reopened the doors and began to deploy new accounts, we quickly realized our new processes had introduced a new problem. We were now able to scale very quickly. Support was still going to be an issue at some point, but now we were running out of private IP space, and quickly. Around the same time we hit a wall with IP addressing, we also hit an undocumented limitation of AWS Transit VPC. Any newly advertised routes into our cloud environment would bring down the network. All the work we had done to deploy cloud accounts quickly and hold them to a baseline was now stopped in its tracks. We had solved our way into a new problem, even if it was a good problem to have.
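
The arithmetic behind running out of private IP space is easy to underestimate. The sketch below uses Python's standard `ipaddress` module with purely hypothetical numbers (a single /16 allotted to the program and a /22 VPC per account); the actual ranges and VPC sizes used on the program are not stated here.

```python
import ipaddress

# Hypothetical numbers: the private range allotted to the cloud program
# and the size of the VPC handed to each new account.
allotted = ipaddress.ip_network("10.20.0.0/16")
vpc_prefix = 22  # each account VPC gets a /22 (1,024 addresses)

# Carve the allotted range into equally sized, non-overlapping VPC blocks.
vpc_blocks = list(allotted.subnets(new_prefix=vpc_prefix))
print(f"{allotted} holds {len(vpc_blocks)} VPCs of size /{vpc_prefix}")


def accounts_remaining(deployed: int) -> int:
    """How many more one-VPC-per-account deployments fit in the allotted range."""
    return len(vpc_blocks) - deployed


print("Room left after 30 accounts:", accounts_remaining(30))
```

Under these assumptions, one VPC per account caps the program well short of 'hundreds' of accounts, no matter how fast accounts can be deployed.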

AWS to the rescue

It’s now the end of 2018 and time for re:Invent again. Regardless of all that the team was able to accomplish as a result of limping along in 2017, the release of AWS Organizations, and the implementation of our new processes and automation resulting from our strategic pause at the beginning of 2018, we were stuck. We had manufactured a new problem. That is, until AWS released Shared VPCs and RAM (Resource Access Manager).

This opened up many opportunities for the program to meet the needs of any developers looking to work on new projects in the cloud. This new feature released by AWS enabled the team that I was a part of to begin scaling to the number of AWS accounts we were asked to support just 18 months prior.

Sooo… Mission accomplished?

Not quite. Shortly after our implementation of AWS Shared Resources and another refactor of our account deployment process, we found ourselves needing to address many other growing pains. These included the retirement or upgrade of existing services to cloud-native services as well as the implementation of newly approved services, after evaluating whether they met many different use-case requirements.

Up to this point, the growth trajectory of the program has been insane. So much so that I personally believe organizations of any size would find it impressive. Here are some quick numbers to demonstrate the history of the program.

Beginning of 2017
- Active Accounts: Approximately 12
- Deployment Time: Greater than 2 weeks
- Permission Roles/Policies: 1
- Change Review Time: 2-3 weeks
Beginning of 2018
- Active Accounts: Approximately 30
- Deployment Time: Less than 1 week
- Permission Roles/Policies: About 20
- Change Review Time: 1-2 weeks
Beginning of 2019
- Active Accounts: Greater than 65
- Deployment Time: 30 minutes
- Permission Roles/Policies: 3
- Change Review Time: Virtually in real-time

While I was ‘involved’ with the amazing success of this growth and progress, the team was instrumental in the implementation. I did very little from a technical implementation standpoint. I merely did what I could as a lead to keep things on track, focused, and prevent perceived ‘fires’ from derailing our progress.

In August of 2019, I left this program to pursue other opportunities but I didn’t take leaving lightly. I learned a ton from being part of this team and attribute the following posts in this series to the experience I gained working with the team and that organization.

The following posts in this series do not share any ‘secret sauce’. There isn’t any. The concepts used are well documented in the AWS Multi-Account Security Strategy and the best practices are part of the AWS Well-Architected Framework. Presented here is a unique twist on the lessons learned and the hurdles jumped to be successful in implementing a secure, baselined, and well-documented multi-account cloud architecture.

Series parts are as follows:

Multiple Cloud Account Architecture (Series)

February 3, 2020 / in Architecture, Multi-account Architecture / by cloudman444

This is post 1 of 9 in a multi-part series (hosted here) discussing the advantages, pitfalls, deployment methodology, and management of a multi-cloud account architecture. For this series, we are focusing strictly on using AWS as the Cloud Service Provider (CSP) but the concepts discussed port well to any provider or even to on-premise operations.

First, some context

Imagine you have just joined an organization, are a few months into getting to know the project, but still not totally comfortable with the day-to-day ops. Also, imagine that this is really your first foray into a large-scale enterprise cloud operation. You’ve had cloud experience in the past but nothing like what you are about to be asked to do.

Because you are a part of a pretty small program already, you are charged with the operation and support of the enterprise cloud. You are now the ‘OPS Team Lead’ (a team of 3) and are responsible for supporting roughly a dozen AWS accounts and two-dozen developers building applications across those accounts. Now you’ve been asked to figure out how to scale to ‘literally hundreds’ of accounts and support them with a ridiculously disproportionate number of team members.

Mind you, it’s early 2017. Cloud is not new by any means, but ‘AWS Organizations’ was literally just released. True multi-account strategy had only just started being discussed and considered at AWS’s premier annual conference, re:Invent, a couple of months prior. Up to that point, it was possible to have multiple AWS accounts, and managing them wasn’t a total nightmare, but it did take some creative processes.

From this point on, when we speak of operating, managing, and supporting multiple AWS accounts, we mean:

Managing networking and resources – VPCs, Subnets, Routes, Security Groups, ACLs, etc.
Controlling the AWS services (and their actions) that CAN and SHOULD be used in the accounts
Managing S3 bucket policies and ACLs
Configuring multi-account and cross-account access (Dev-to-Prod, Prod-to-Dev, Orchestration-to-Dev/Prod, etc.)
General end-user support, troubleshooting, and configuration
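
As one illustration of 'controlling the AWS services that CAN and SHOULD be used', the sketch below builds an allow-list policy document in the standard IAM JSON policy format, using the documented deny-everything-except pattern (`Effect: Deny` combined with `NotAction`). The list of approved services is hypothetical, and this is only one way to express such a control, for example as a service control policy in AWS Organizations; it is not necessarily how the team enforced it.

```python
import json

# Hypothetical list of 'approved' services; not the program's actual list.
APPROVED_SERVICES = ["ec2", "s3", "rds", "cloudwatch"]


def build_allow_list_policy(services: list[str]) -> dict:
    """Build a policy document that denies every action outside the approved services."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedServices",
                "Effect": "Deny",
                # NotAction inverts the match: the Deny applies to everything
                # except the actions listed here.
                "NotAction": [f"{svc}:*" for svc in sorted(services)],
                "Resource": "*",
            }
        ],
    }


print(json.dumps(build_allow_list_policy(APPROVED_SERVICES), indent=2))
```

Generating the document from a single list keeps the 'approved' set in one place, so enabling a new service for every account becomes a one-line change that can go through review.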

This is important because a lot of what you read going forward may (or may not) cause you to scratch your head. Some of the problems solved using these processes aren’t inherent problems at all. They are manufactured by the very process of controlling multiple accounts. That is an important distinction – ‘controlling.’ Depending on the organization, the information present, and the intellectual property involved, your organization may opt to lock down everything in your enterprise cloud. Controlling everything previously listed doesn’t 100% prevent security issues or breaches of data, but it significantly decreases the potential for inadvertent critical mistakes that could put your business on the front page of a tech blog, or worse yet, in front of Congress.

Hopefully, that sets the stage for the following posts in this series.

What this series covers

Now that we’ve laid the foundation for why any of what we are going to cover matters, let’s briefly discuss the benefits of taking on all the additional work of configuring a multiple-account structure. Even if you are a very small organization, this approach has significant benefits. Yes, it feels like a lot of work upfront, but we promise it is worth it in the long run. We’ll even cover some processes and systems to make managing all of these accounts a little less burdensome.

Benefits of multi-account architecture

At the time of writing, it is time for re:Invent 2019, so if you’ve been in the cloud game for a few years, none of these are new. First, though, let’s list some of the benefits:

Cost containment
Enhanced security posture
Elimination of rogue resources
More manageable and controlled change
Isolation of critical services
Maintainability of a baseline configuration
Implementation of least privilege
Better documented (more easily understood) architecture

Pitfalls and complications of multi-account architecture

Benefits never come without costs. That should be obvious in enterprise I.T. by now. One of the costs might actually be… costs. The initial work and effort put into refactoring your architecture will cost time in human capital. If you are fortunate enough to just now be considering the cloud, you can sneak past this added cost. You will, however, pay the Cloud Service Provider (CSP) for using some services in more than one location. That is inevitable. What you have to realize is that there will be economies of scale. You will recoup that cost operationally later. You just might not be able to see it now.

There are other obvious costs:

the added complexity of initial configuration
potentially a steep learning curve
the perception of virtually no noticeable progress near-term
rework

Moving On

There is so much more to all of this than these few bullets. We make no promises that we can convince you this is how you should spend your next 6 months to 3 years in the cloud. However, in the next parts of this series, we will attempt to make a compelling argument as well as share some complementary resources to help make this transition (or advancing your current position) much more palatable.

Series parts are as follows: