Master 2 Recherche Systèmes et
Logiciel, 2006-2007
Bibliographie pour le cours
``Algorithmique et techniques de base des systèmes
répartis''
S. Krakowiak
La page web du cours est http://sardes.inrialpes.fr/people/krakowia/Enseignement/M2R-SL/SR
Cette page sera remise à jour
au cours de l'année
N.B.
Deux astérisques (**) : si vous ne
lisez qu'un article par thème, c'est celui-là
Un astérisque (*) :
références principales pour le thème
Références
générales
[Guerraoui & Rodrigues 06]
R. Guerraoui, L. Rodrigues. Reliable Distributed Programming, Springer, 2006
[Tanenbaum & van Steen 02]
A. S. Tanenbaum, M. van Steen, Distributed
Systems - Principles & Paradigms, Prentice Hall, 2002
[Mullender 93]
S. Mullender (ed.), Distributed Systems (2nd
edition), Addison-Wesley, 1993
[Coulouris et al. 03]
G. Coulouris, J. Dollimore, and T. Kindberg, Distributed
Systems, Concepts and Design, 3rd. ed. Addison-Wesley, 2003
[Singhal & Shivaratri 94]
M. Singhal, N. G. Shivaratri, Advanced
Concepts in Operating Systems,
McGraw-Hill, 1994
Datation, causalité, horloges
logiques, horloges vectorielles
Quelques références en ligne :
- ordre causal
- horloges logiques
*[Lamport 78]
L. Lamport, Time,
clocks, and the ordering of events in a distributed system, Communications
of the ACM, 21, 7, july 1978, pp. 125-133
*[Mattern 88, 92]
F. Mattern, Virtual time and global states
in distributed systems, International Conference on Parallel and
Distributed Algorithms, North Holland (1988), pp. 215-226 ; voir
version plus récente et plus complète : On the
Relativistic Structure of Logical Time in Distributed Systems, in Datation
et Contrôle des Exécutions Réparties, Bigre,
nr. 78, édité par l'IRISA, mars 1992
**[Babaoglu 93]
Ö. Babaoglu, K. Marzullo, Consistent
Global States of Distributed Systems: Fundamental Concepts and
Mechanisms, in [Mullender 93]
[Raynal 91b]
M. Raynal, A. Schiper, S. Toueg, The causal
ordering abstraction and a simple way to implement it, Information
Processing Letters, 39, 6, Sept. 1991, pp. 343-350
[Raynal 92]
M. Raynal, About Logical Clocks for
Distributed Systems, ACM Operating Systems Review, 26, 1,
Jan. 1992, pp. 41-48
État global, reprise
Quelques références en ligne :
état global
*[Chandy 85]
K.M. Chandy, L. Lamport, Distributed
snapshots: determining global states of distributed systems, ACM
Transactions on Computer Systems, 3, 1, Feb. 1985, pp. 63-75
[Koo 87]
R. Koo, S. Toueg, Checkpointing and
rollback-recovery in distributed systems, IEEE Trans. on Software
Engineering, SE-13, 1, jan. 1987, pp. 23-31
**Recovery, chap. 12 of [Singhal 94]
[Johnson 95]
D.B. Johnson S.W. Smith and J.D. Tygar, "Completely
asynchronous optimistic recovery with minimal rollbacks," In Proc.
of the 25th Int'l Symp. on Fault Tolerant Computing Systems, pp.
361-370, Jun. 1995.
*[Elnozahy et al. 96]
M. Elnozahy, L. Alvisi,
Y.-M. Wang, and D. B. Johnson. A Survey of
Rollback-Recovery Protocols in Message-Passing Systems, Technical
Report CMU-CS-96-181, Carnegie Mellon University, October 1996
Algorithmes répartis
*[Chang
79]
E.J.
Chang, R. Roberts, An improved algorithm for decentralized
extrema-finding in circular configurations of processors, Communications
of the ACM, 22, 5, May 1979, pp. 281-283
*[Dijkstra
80]
E.W.
Dijkstra, C. S. Scholten, Termination detection for diffusing
computations, Information Processing Letters, 11, 1, Aug. 1980,
pp. 1-4
*[Misra
83]
J.
Misra, Detecting termination of distributed computations using markers, Proc.
2nd ACM Symposium on Principles of Distributed Computing, Montreal,
Canada, Aug. 1983, pp. 290-294
*[Ricart
81]
G.
Ricart, A.K. Agrawala, An
optimal algorithm for mutual exclusion in computer networks, Communications
of the ACM, 24, 1, Jan. 1981, pp. 9-17
**Distributed
Mutual Exclusion, chap. 6 of [Singhal 94]
Un site (M.
Ben Ari) présentant des animations en ligne de divers algorithmes
répartis (dont Ricart-Agrawala, Chandy-Lamport,
Dijkstra-Scholten, Généraux byzantins).
Tolérance aux fautes et problèmes
connexes
Serveurs fiables, duplication
**[Guerraoui
97]
R.
Guerraoui, A. Schiper. Software-based Replication for Fault-Tolerant
Systems, IEEE Computer, vol. 30, 4, April 1997, pp. 68-74 - version
étendue
Validation atomique
**[Babaoglu
93]
Ö.
Babaoglu, S. Toueg, Non-blocking Atomic Commitment, in [Mullender 93]
[Bernstein et al 87]
Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman: Concurrency Control and Recovery in
Database Systems. (Chapter 7) Addison-Wesley 1987. Voir version
en ligne
[Guerraoui
95]
R.
Guerraoui, A. Schiper. The
decentralized non-blocking atomic commitment protocol,
IEEE Int. Symposium on Parallel and Distributed Processing, 1995.
*[Guerraoui
98]
R.
Guerraoui, M. Raynal, A. Schiper. Validation atomique et consensus :
une approche synthétique. Technique et Science Informatiques,
vol. 17, 3, 1998, pp. 279-298.
Diffusion fiable, groupes
[Birman
87]
K.P.
Birman, T.A. Joseph, Reliable communication in the presence of
failures, ACM Transactions on Computer Systems, 5,1, Feb. 1987,
pp. 47-7
*[Chokler 01]
Group
Communication Specification: A Comprehensive Study, Gregory Chockler,
Idit Keidar and Roman Vitenberg, In ACM Computing
Surveys 33(4), pages 1-43,
December 2001. Point récent sur les spécifications,
illustre la difficulté d'une spécification précise
et complète
[Cristian
85]
F.
Cristian, H. Aghili, R. Strong, D. Dolev, Atomic broadcast: from simple
message diffusion to byzantine agreement, Proc. 15th Conf. on
Fault-Tolerant Computing Systems (FTCS-15), Ann Arbor (1985)
[article historique]
**[Hadzilacos
93]
V.
Hadzilacos , S. Toueg, Fault-Tolerant Broadcast and Related Problems,
in [Mullender 93]
version
étendue sous le titre : "A
modular approach to the specification and implementation of
fault-tolerant broadcasts''. Technical Report TR94-1425.
Department of Computer Science, Cornell University, Ithaca NY. May 1994
*[Kaashoek
89]
M.F.
Kaashoek, A.S. Tanenbaum, S. Flynn Hummel, H.E. Bal, An efficient
reliable broadcast protocol, ACM Operating Systems Review, 23,4,
Oct. 1989, pp. 5-19
Consensus
Références générales
**[Turek 92]
J. Turek, D. Shasha, The many faces of
consensus in distributed systems, IEEE Computer, June 1992, pp.
8-17
maintenant un
peu ancien (compléter par Paxos et détecteurs de
pannes)
[Barborak 93]
M. Barborak, M. Malek, A. Dahbura, The
consensus problem in fault-tolerant computing, ACM Computing Surveys,
25, 2, June 1993, pp. 171-220
Agreement Protocols, chap. 8 of
[Singhal 94]
Théorie
[Fisher 85]
M. Fisher, N. Lynch, M. Patterson.
Impossibility of distributed consensus with one faulty processor, Journal
of the ACM, 32, 2 (1985), pp. 373-382. Article historique.
*[Fisher 86]
M. Fisher, N. Lynch, M. Merritt. Easy
impossibility proofs for distributed consensus problems, Distributed
Computing, vol. 1 (1986), pp. 26-39
Aspects pratiques
[Guerraoui 96
R. Guerraoui and A. Schiper. Consensus
Service: A Modular Approach for Building Fault-Tolerant Agreement
Protocols in Distributed Systems, IEEE International Symposium
on Faul-Tolerant Computing Systems (FTCS), Sendai (Japan), June 1996
*[Guerraoui 97]
Rachid Guerraoui and André
Schiper. The
generic consensus service. IEEE Transactions on Software
Engineering, 27(1):29-41, January 2001.
Algorithme de Paxos
*[Lampson 01]
[Lampson 96]
How
to Build a Highly Available System Using Consensus. in
Distributed Algorithms, ed. Babaoglu and Marzullo, Lecture Notes in
Computer Science, Springer, 1996, pp 1-17.
[Lamport 01]
Détecteurs
de pannes non fiables
*[Chandra 96]
Unreliable
failure detectors for reliable distributed systems, Journal of
the ACM, vol. 43, 2, March 1996.
[Chandra 96]
T. D. Chandra, V. Hadzilacos, S.Toueg, The
weakest failure detector for solving consensus, Journal of the
ACM, vol. 43, 4, July 1996, pp. 685-722
Consensus
byzantin
[Lamport 82]
L. Lamport, R. Shostak, M. Pease, The
byzantine generals problem, ACM TOPLAS, 4, july 1982, pp.
382-401 Article historique
*[Brès 91]
G. Brès, Panorama sur les
généraux byzantins, Technique et Science Informatiques,
10, 5, 1991, pp. 341-353.
voir aussi [Lampson 01] The
ABCDs of Paxos, mentionné ci-dessus
Détection et
réparation de pannes
Voir site du projet ROC
Gestion répartie de données
Désignation
*[Needham 93], Names, in [Mullender 93]
**chapitre "Naming" de
[Tanenbaum02]
[Lampson 86]
*[Saltzer 78]J. Saltzer. Naming
and Binding of Objects, in: Operating Systems: an Advanced Course,
LNCS 60, Springer Verlag, 1978, pp. 99-208.
Duplication, cohérence
**voir chapitre 6 de [Tanenbaum & van Steen 02]
Diffusion épidémique
[Demers 87]
A. Demers, et al. Epidemic Algorithms
for Replicated Database Maintenance, Proc. 6th ACM Symp. on Principles
of Distributed Computing, Vancouver, pp. 1-12, 1987
[Van Renesse et al. 03]
R. van Renesse, K. P. Birman, W.
Vogels. Astrolabe: A Robust and Scalable Technology for
Distributed Systems Monitoring, Management and Data Mining. ACM Trans.
on Computer Systems, 21(2), May 2003
*[Eugster 04]
Patrick Eugster, Rachid Guerraoui,
Anne-Marie Kermarrec, and Laurent Massoulié. From Epidemics
to Distributed Computing.
IEEE Computer, 37(5), May 2004.
voir
ici
Systèmes Pair à Pair
(P2P)
[S. Androusellis-Theotokis et al.]
Gnutella : http://www.gnutella.com
Freenet : http://www.freenetproject.org
Tables de hachage réparties
(DHT)
Chord : http://pdos.csail.mit.edu/chord/
Pastry : http://freepastry.org/
Gestion répartie d'objets, études de cas
**[Hagimont 96a]
*[Birrell 95]
Andrew Birrell, Greg Nelson, Susan
Owicki, and Edward Wobber. Network
Objects, Digital SRC Research Report 115, 1995. Version
étendue de : Andrew Birrell, Greg Nelson, Susan Owicki, and
Edward Wobber. Network objects. In Proceedings of Fourteenth Symposium
on Operating Systems Principles (SOSP), pages 217-230, 1993.
[Globe]
[Hagimont 98]
Mémoire virtuelle
répartie
*[Amza 96]
C. Amza, A.L. Cox,
S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel,
TreadMarks:
Shared Memory Computing on Networks of Workstations, IEEE Computer,
Vol. 29, No. 2, pp. 18-28, February 1996. Voir aussi page web du projet
Treadmarks.
[Bershad 93]