Intrusion Tolerance

Objectives

  • Explain the role of distribution in the reliability and security of computer systems.
  • Characterize reliability and security challenges in terms of abstract problems and models.
  • Discuss the role of distributed agreement algorithms (e.g. consensus) in solving reliability and security problems.
  • Implement replicated systems for fault and intrusion tolerance.
  • Evaluate distributed systems that solve problems of scale and reliability.

Program

  • Fundamentals of reliability and cyber-resilience: threats, distributed computing models, and failure models.
  • Coordination of distributed computing in the presence of malicious components: consensus algorithms and total order.
  • Redundancy of critical components in distributed systems: Byzantine fault tolerance.
  • Intrusion tolerance case studies: replicated database management systems, blockchain technology.

Bibliography

  • C. Cachin, R. Guerraoui, L. Rodrigues. Introduction to Reliable and Secure Distributed Programming. Springer, 2011.
  • M. Raynal. Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach. Springer, 2018.
  • B. Charron-Bost, F. Pedone, A. Schiper (Eds.). Replication: Theory and Practice. Springer, 2010.
  • P. Veríssimo, L. Rodrigues. Distributed Systems for System Architects. Kluwer Academic, 2001.
  • K. Birman. Reliable Distributed Systems. Springer, 2012.
  • R. Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems. 3rd edition, John Wiley & Sons, 2020.
  • R. Ross et al. Developing Cyber-Resilient Systems: A Systems Security Engineering Approach. NIST Special Publication (SP) 800-160, Vol. 2, Rev. 1, NIST, 2021.
