Module 16

Disaster Recovery

Failures can be small scale (a server stops), large scale (many resources across AZs stop), or global (Aliens).

Determining RPO (Recovery point objective)

RPO is the maximum acceptable amount of data loss, measured in time.

Example: the last backup was made at 8pm and the disaster happens 8 hours later, so we have lost 8 hours of data. If that is more than the business can accept, backups need to run more often.
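The backup example can be worked through in a few lines of Python. This is only a sketch: the function names `data_loss_window` and `meets_rpo`, the calendar date, and the 4-hour RPO are assumptions made for illustration, not part of the course material.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return how much data, measured in time, is lost when restoring from last_backup."""
    if disaster < last_backup:
        raise ValueError("the disaster cannot happen before the last backup")
    return disaster - last_backup


def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """True when the data lost stays within the acceptable RPO."""
    return data_loss_window(last_backup, disaster) <= rpo


# Backup at 8pm, disaster 8 hours later (the date itself is arbitrary).
last_backup = datetime(2024, 1, 1, 20, 0)
disaster = last_backup + timedelta(hours=8)

loss = data_loss_window(last_backup, disaster)
print(loss)  # 8:00:00

# Hypothetical RPO of 4 hours: an 8-hour loss does not meet it.
print(meets_rpo(last_backup, disaster, timedelta(hours=4)))  # False
```

With these numbers, meeting a 4-hour RPO would mean taking backups at least every 4 hours.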

Determining RTO (Recovery time objective)

RTO is the maximum acceptable amount of time after a disaster strikes that a business process can remain out of commission.

Example: the last backup was made at 8pm and the disaster happened 8 hours later. With an RTO of 1 hour, we have only 1 hour from the moment of the disaster to recover the application.
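The same example can be expressed as a deadline check. This is a minimal sketch that treats the 1 hour in the example as the RTO; the names `recovery_deadline` and `meets_rto`, the calendar date, and the two restore times are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_deadline(disaster: datetime, rto: timedelta) -> datetime:
    """Latest moment by which the application must be back in service."""
    return disaster + rto


def meets_rto(disaster: datetime, restored: datetime, rto: timedelta) -> bool:
    """True when the application was restored on or before the RTO deadline."""
    return restored <= recovery_deadline(disaster, rto)


# Backup at 8pm, disaster 8 hours later, so the disaster is at 4am the next day.
disaster_time = datetime(2024, 1, 1, 20, 0) + timedelta(hours=8)
rto = timedelta(hours=1)

deadline = recovery_deadline(disaster_time, rto)
print(deadline)  # 2024-01-02 05:00:00

# A restore finished after 45 minutes meets the RTO, one after 2 hours does not.
print(meets_rto(disaster_time, disaster_time + timedelta(minutes=45), rto))  # True
print(meets_rto(disaster_time, disaster_time + timedelta(hours=2), rto))     # False
```

Note that the RTO clock starts at the disaster, not at the last backup.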

Business continuity plan

A business continuity plan is a system of prevention and recovery from potential threats to a company. It consists of the following:

  • Business impact analysis

  • Risk assessment

  • Disaster recovery plan

  • Evaluated and determined RPO and RTO

Disaster recovery plans should span more than one region

If all of our data is in the cloud we can configure cross-region replication

With EBS Volumes

Storage backup from on-prem

Made for resiliency and recovery

Replicating and redeploying environments

CloudFormation
OpsWorks

Use templates to quickly deploy collections of resources as needed

Manage and deploy applications across fleets.

Duplicate production environments in a new Region or virtual private cloud (VPC) in minutes.
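To make the template idea concrete, here is a rough sketch that builds one minimal CloudFormation template and pairs it with a stack name per Region. The helper names `build_template` and `stack_plan`, the `dr-demo` prefix, the `BackupBucket` logical ID, and the two Regions are all assumptions for illustration. The sketch only prepares the inputs; actually creating the stacks would be done with the AWS CLI or an SDK, which is not shown here.

```python
import json


def build_template(description: str) -> dict:
    """Build a minimal CloudFormation template containing a single S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": description,
        "Resources": {
            # Hypothetical logical ID; the bucket name is left for CloudFormation to generate.
            "BackupBucket": {"Type": "AWS::S3::Bucket"},
        },
    }


def stack_plan(regions: list[str], base_name: str = "dr-demo") -> list[dict]:
    """Pair the same template body with one stack name per Region."""
    template_body = json.dumps(build_template("DR demo environment"), indent=2)
    return [
        {
            "region": region,
            "stack_name": f"{base_name}-{region}",
            "template_body": template_body,
        }
        for region in regions
    ]


# Primary Region plus a recovery Region (both chosen arbitrarily).
plan = stack_plan(["us-east-1", "us-west-2"])
for entry in plan:
    print(entry["region"], entry["stack_name"])
```

Because every Region receives an identical template body, the recovery environment is a duplicate of the primary one rather than a hand-built copy.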

Types of DR Plans

Backup and restore pattern

Backup and restore is a suitable approach for mitigating data loss or corruption. However, it can take a long time to restore your system when a disaster occurs. It is the cheapest of the DR patterns.

Storage Gateway backup and restore pattern

Hybrid storage service that enables your on-premises applications to use AWS Cloud storage

Pilot light pattern

A minimal backup version of your environment is always running. The pilot light analogy comes from a gas heater: a small flame (the pilot light) is always on, even when the heater is off. The pilot light can quickly ignite the entire furnace to heat a house.

The secondary environment cannot handle the entire load of the primary, so it must scale up quickly to handle traffic when a failover occurs.

Warm Standby pattern

The warm standby pattern is like the pilot light, but more resources are already running. The warm standby solution extends the pilot light elements and preparation. It further decreases the recovery time because some services are always running.

These servers can be running on a minimum-sized fleet of EC2 instances with the smallest sizes possible.

Multi-site pattern

You have a fully functional system that runs in a second Region. It runs at the same time as the on-premises systems or the systems that run in a different Region.
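The four patterns trade cost against recovery time: backup and restore is the cheapest and slowest, and each later pattern keeps more resources running. A rough way to choose between them is to pick the cheapest pattern whose expected recovery time still fits the RTO. The sketch below assumes you supply your own recovery-time estimates; the figures shown, and the helper name `cheapest_pattern`, are purely hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Patterns ordered from cheapest and slowest to most expensive and fastest.
PATTERNS = ["backup and restore", "pilot light", "warm standby", "multi-site"]


def cheapest_pattern(rto: timedelta, estimated_recovery: dict) -> Optional[str]:
    """Return the cheapest pattern whose estimated recovery time fits the RTO."""
    for name in PATTERNS:
        if estimated_recovery[name] <= rto:
            return name
    return None  # no pattern meets the RTO with these estimates


# Hypothetical estimates; real values depend on the workload and must be measured.
estimates = {
    "backup and restore": timedelta(hours=24),
    "pilot light": timedelta(hours=4),
    "warm standby": timedelta(minutes=30),
    "multi-site": timedelta(minutes=1),
}

choice = cheapest_pattern(timedelta(hours=1), estimates)
print(choice)  # warm standby
```

With a 1-hour RTO and these made-up estimates, pilot light is too slow, so warm standby is the cheapest pattern that qualifies.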
