Modern Distributed Systems Design – Security and High Availability 1. Measuring Availability 2. Highly Available Data Management 3. Redundant System Design

Post on 19-Dec-2015


Page 1

Modern Distributed Systems Design

– Security and High Availability

1. Measuring Availability

2. Highly Available Data Management

3. Redundant System Design

Page 2

Measuring Availability

• How are resiliency and high availability interconnected?

• Define downtime and what causes it.

• How to measure availability?

Page 3

Measuring Availability

Percentage Uptime   Percentage Downtime   Downtime per year   Downtime per week
98%                 2%                    7.3 days            3h22m
99%                 1%                    3.65 days           1h41m
99.8%               0.2%                  17h30m              20m10s
99.9%               0.1%                  8h45m               10m5s
99.99%              0.01%                 52.5m               1m
99.999%             0.001%                5.25m               6s
99.9999%            0.0001%               31.5s               0.6s
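
The figures in this table follow directly from the uptime percentage. The sketch below is only an illustration of that arithmetic: the function and constant names are assumptions introduced here, and a 365-day year is assumed because it reproduces the 7.3-day figure for 98% uptime.

```python
# Illustrative sketch: names are assumptions, not taken from the slides.
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year (assumption)
SECONDS_PER_WEEK = 7 * 24 * 3600


def downtime_seconds(uptime_percent: float, period_seconds: int) -> float:
    """Return the downtime (in seconds) implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * period_seconds


# Print the per-year and per-week downtime for each uptime level in the table.
for uptime in (98.0, 99.0, 99.8, 99.9, 99.99, 99.999, 99.9999):
    per_year_hours = downtime_seconds(uptime, SECONDS_PER_YEAR) / 3600
    per_week_minutes = downtime_seconds(uptime, SECONDS_PER_WEEK) / 60
    print(f"{uptime}% uptime: {per_year_hours:.2f} h/year, "
          f"{per_week_minutes:.2f} min/week")
```

The table rounds some entries (for example, 8h45m for what computes to about 8.76 hours), so small differences from the printed values are expected.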

Page 4

Define Downtime

• Downtime can be defined as follows: “If a user cannot get his job done on time, the system is down.”

Page 5

What causes downtime?

• Planned: the easiest kind to reduce. It includes scheduled system maintenance, hot-swappable hard drive replacement, cluster upgrades and even failovers. Usually 30% of all downtime.

• People (the human factor): careless mistakes, plus increasingly complex IT equipment, software and protocols that demand greater knowledge from engineers. Usually 15% of all downtime.

• Software failures: caused by software bugs and viruses. Usually 40% of all downtime.

Page 6

How to measure availability?

Availability = MTBF / (MTBF + MTTR), where

MTBF is the “mean time between failures” and MTTR is the “maximum time to repair” (more commonly expanded as “mean time to repair”).
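
A short worked example may make the formula concrete. The sketch below assumes MTBF and MTTR are given in the same unit (hours here). The function name and the sample values are illustrative assumptions, not figures from the slides.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR); both arguments in the same unit."""
    if mtbf <= 0:
        raise ValueError("mtbf must be positive")
    if mttr < 0:
        raise ValueError("mttr must not be negative")
    return mtbf / (mtbf + mttr)


# Hypothetical system: fails on average every 1000 hours.
slow_repair = availability(mtbf=1000.0, mttr=1.0)   # one hour to repair
fast_repair = availability(mtbf=1000.0, mttr=0.5)   # half an hour to repair

print(f"MTTR 1.0 h: {slow_repair:.5%}")
print(f"MTTR 0.5 h: {fast_repair:.5%}")
```

The formula shows two ways to raise availability: make failures rarer (larger MTBF) or make repairs faster (smaller MTTR). As MTTR approaches zero, availability approaches 100% regardless of MTBF.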

Page 7

What can go wrong?

• Hardware

• Environmental and Physical Failures

• Network Failures

• Database System Failures

• Web Server Failures

• File and Print Server Failures

Page 8

The Cost of Downtime

Industry         Business Operation                Average downtime cost per hour
Financial        Brokerage operation               $6.45M
Financial        Credit card/sales authorization   $2.6M
Media            Pay per view TV                   $150K
Retail           Catalog sales                     $90K-$115K
Transportation   Airlines                          $89.5K
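
Combining an hourly cost with an availability level gives a rough yearly exposure. The sketch below is a back-of-the-envelope illustration only. It assumes a 365-day year and downtime spread evenly at the quoted hourly rate, and its function name is hypothetical. The $89.5K airline figure comes from the table. The 99.9% uptime level is an arbitrary choice for the example.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year (assumption)


def annual_downtime_cost(uptime_percent: float, cost_per_hour: float) -> float:
    """Estimate yearly downtime cost from uptime percentage and hourly cost."""
    downtime_hours = (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


# Airline figure from the table ($89.5K per hour) at an assumed 99.9% uptime.
airline = annual_downtime_cost(99.9, 89_500)
print(f"Airline at 99.9% uptime: about ${airline:,.0f} per year")
```

At 99.9% uptime the system is down for about 8.76 hours a year, so the estimate comes to roughly $784,000.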

Page 9

Levels of Availability:

1. Regular Availability

2. Increased Availability

3. High Availability

4. Disaster Recovery

5. Fault-Tolerant System

Page 10

Highly Available Data Management

• Data management is the most sensitive area of modern distributed systems.

• Quick overview of existing data topologies

Page 11

Redundant System Design

• Redundant storage (RAID, multi-hosting, multipathing, disk arrays, JBOD, etc.)

• Failover Configurations and Management

• Introduction to SAN and Fibre Channel protocol

• Security aspects of data management in Storage Area Networks

Page 12

Redundant storage

Page 13

Redundant Storage (RAID 5)
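
The slide's diagram did not survive in this transcript. As a reminder of the idea behind RAID 5, the sketch below shows XOR parity: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the remaining blocks plus parity. It is a simplified illustration with hypothetical names and sample data, and it ignores striping and the rotation of parity across disks that a real RAID 5 array performs.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks (one per data disk) and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block from the rest.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Reconstruction works because XOR-ing a value with itself cancels out: the surviving data blocks cancel their own contribution to the parity, leaving only the missing block.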

Page 14

Failover Configurations and Management

Failover must meet the following requirements:

• Transparent to the client;

• Quick (no more than 5 min, ideally 0-2 min);

• Minimal manual intervention and guaranteed data access.

Page 15

Failover components:

• Two servers: one primary, the other a takeover server;

• Two network connections; a third is highly recommended;

• All disks on a failover pair should have some sort of redundancy

• Application portability

• No single point of failure.
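
One common way for a takeover server to decide when to step in is a heartbeat timeout. The slides do not describe a specific mechanism, so the sketch below is only an assumption-laden illustration: the class name, the 10-second timeout and the "no automatic failback" behavior are choices made for this example. Timestamps are passed in explicitly so the logic stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks primary heartbeats and decides which server is active."""
    timeout_seconds: float
    last_heartbeat: float = 0.0
    active: str = "primary"

    def record_heartbeat(self, now: float) -> None:
        """Called whenever a heartbeat from the primary arrives."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Fail over if the primary has been silent longer than the timeout."""
        silent_for = now - self.last_heartbeat
        if self.active == "primary" and silent_for > self.timeout_seconds:
            self.active = "takeover"
        return self.active


monitor = HeartbeatMonitor(timeout_seconds=10.0)
monitor.record_heartbeat(now=0.0)
print(monitor.check(now=5.0))    # primary is still within the timeout
print(monitor.check(now=11.0))   # primary has been silent for too long
```

In this sketch the takeover server stays active even if heartbeats resume, which avoids flapping between servers. A real cluster also needs redundant heartbeat paths, which is one reason for the recommended extra network connection, so that a single link failure is not mistaken for a server failure.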

Page 16

Symmetric Failover

Page 17

Asymmetric Failover

Page 18

Fibre Channel, SAN, IP Storage

Page 19

Security in IP Storage Networks

• Security in Fibre Channel SANs

• Security Options for IP Storage Networks

Page 20

Fibre Channel SAN Security

• Port or hard zoning

• WWN Zoning

• LUN Masking
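
To show how WWN zoning and LUN masking combine, the sketch below models both as simple lookups: an initiator may reach a target only if the two share a zone, and then sees only the LUNs its mask allows. It is a conceptual illustration, not a switch or array configuration. All WWNs, zone names and LUN numbers are made up, and the function names are hypothetical.

```python
# Hypothetical World Wide Names (made up for illustration).
DB_HOST = "10:00:00:00:c9:aa:aa:01"
WEB_HOST = "10:00:00:00:c9:aa:aa:02"
BACKUP_HOST = "10:00:00:00:c9:aa:aa:03"
ARRAY_PORT = "50:06:01:60:bb:bb:00:01"

# WWN zoning: only members of the same zone may communicate.
ZONES = {
    "zone_db": {DB_HOST, ARRAY_PORT},
    "zone_web": {WEB_HOST, ARRAY_PORT},
}

# LUN masking: LUNs each initiator may see on a given target port.
LUN_MASKS = {
    (ARRAY_PORT, DB_HOST): {0, 1},
    (ARRAY_PORT, WEB_HOST): {2},
}


def share_zone(wwn_a: str, wwn_b: str) -> bool:
    """True if both WWNs are members of at least one common zone."""
    return any(wwn_a in members and wwn_b in members
               for members in ZONES.values())


def can_access(initiator: str, target: str, lun: int) -> bool:
    """Apply zoning first, then the LUN mask for this initiator."""
    if not share_zone(initiator, target):
        return False
    return lun in LUN_MASKS.get((target, initiator), set())


print(can_access(DB_HOST, ARRAY_PORT, 0))      # zoned and unmasked
print(can_access(WEB_HOST, ARRAY_PORT, 0))     # zoned, but LUN 0 is masked
print(can_access(BACKUP_HOST, ARRAY_PORT, 0))  # not in any zone with the array
```

The two checks are layered: zoning is enforced in the fabric, while LUN masking is typically enforced at the storage array or host adapter, so a mistake in one layer does not automatically expose data.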

Page 21

Security Options for IP Storage Networks

• iSNS (Internet Storage Name Service)

• LUN masking (as in Fibre Channel) and VLAN tagging

• IP Security (IPsec)

• Access control lists (ACLs)