Explain the relevance of distribution in the reliability and security of computer systems.
Characterize reliability and security challenges in terms of abstract problems and models.
Discuss the role of distributed agreement algorithms (e.g., consensus) in solving reliability and security problems.
Implement replicated systems for fault and intrusion tolerance.
Evaluate distributed systems that solve problems of scale and reliability.
Program
Fundamentals of reliability and cyber-resilience: threats, and models of distributed computing and of failures.
Coordination of distributed computing in the presence of malicious components: consensus algorithms and total order.
Redundancy of critical components in distributed systems: Byzantine fault tolerance.
Intrusion tolerance case studies: replicated database management systems, blockchain technology.
Bibliography
C. Cachin, R. Guerraoui, L. Rodrigues. Introduction to Reliable and Secure Distributed Programming. Springer, 2011.
M. Raynal. Fault-Tolerant Message-Passing Distributed Systems: An Algorithmic Approach. Springer, 2018.
B. Charron-Bost, F. Pedone, A. Schiper (Eds.). Replication: Theory and Practice. Springer, 2010.
P. Veríssimo, L. Rodrigues. Distributed Systems for System Architects. Kluwer Academic, 2001.
K. Birman. Reliable Distributed Systems. Springer, 2012.
R. Anderson. Security Engineering: A Guide to Building Dependable Distributed Systems. 3rd edition, John Wiley & Sons, 2020.
R. Ross et al. Developing Cyber-Resilient Systems: A Systems Security Engineering Approach. NIST Special Publication (SP) 800-160, Vol. 2, Rev. 1, 2021.