High Availability, Fault Tolerance, and Disaster Recovery

Part of AWS Certified DevOps Engineer – Professional.

Intermediate
What will you learn?
  • Implement and manage continuous delivery systems and methodologies on AWS
  • Implement and automate security controls, governance processes, and compliance validation
  • Define and deploy monitoring, metrics, and logging systems on AWS
  • Implement systems that are highly available, scalable, and self-healing on the AWS platform
  • Design, manage, and maintain tools to automate operational processes

Live training
  • Date: June 23, 2021
  • Time (GMT): 14:00:00
  • Time (EST): 09:00:00
  • Duration: 120 minutes

Prerequisites
  • Two or more years of experience provisioning, operating, and managing AWS environments.
  • Experience developing code in at least one high-level programming language.
  • Experience building highly automated infrastructures.
  • Experience administering operating systems.
  • Understanding of modern development and operations processes and methodologies.

This lesson is part of AWS Certified DevOps Engineer – Professional. In it, you will work toward validating your DevOps expertise by covering topics such as service availability, fault tolerance, disaster recovery, and operating and managing applications on the AWS platform. By the end of this course, you will be prepared to learn the advanced technical skills needed to become a DevOps subject matter expert.


High Availability, Fault Tolerance, and Disaster Recovery

  • Determine appropriate use of multi-AZ versus multi-region architectures.
  • Determine how to implement high availability, scalability, and fault tolerance.
  • Determine the right services based on business needs (e.g., RTO/RPO, cost).
  • Determine how to design and automate disaster recovery strategies.
  • Evaluate a deployment for points of failure.


High availability
  • Protection from downtime.
  • Maximum flexibility.
  • Simplified maintenance.

Fault tolerance
  • Avoid a situation where the functionality of the system becomes unavailable due to a fault.
  • Fault tolerance is necessary for systems that protect people’s safety and for systems on which security, data protection, data integrity, and high-value transactions depend.

Disaster recovery
  • Preventative measures that reduce the risk of a man-made disaster taking place.
  • Detective measures aimed at identifying unwanted events quickly.
  • Corrective measures that restore lost data and allow business processes to resume in the aftermath of a disaster.

  • Learn from Industry Experts
  • Ask the Trainer Questions
  • 100% Online Courses
  • Certificate of Completion