ISSN 0104-6500, Vol. 10, No. 3, pp. 44-56, Apr, 2005. [pdf]

This work introduces a new system-level diagnosis model and an algorithm based on this
model: *Hi-Comp* (Hierarchical Comparison-based Adaptive Distributed System-Level
Diagnosis algorithm). This algorithm allows the diagnosis of systems that can be represented by
a complete graph. *Hi-Comp* is the first diagnosis algorithm that is, at the same time,
hierarchical, distributed and comparison-based. The algorithm is not limited to crash fault
diagnosis, because its tests are based on comparisons. To perform a test, a processor sends a
task to two processors of the system that, after executing the task, send their outputs back to the
tester. The tester compares the two outputs; if the comparison produces a match, the tester
considers the tested processors fault-free; on the other hand, if the comparison produces a
mismatch, the tester considers that at least one of the two tested processors is faulty, but can not
determine which one. Considering a system of *N* nodes, it is proved that the algorithm’s
diagnosability is *(N–1)* and the latency is *log _{2}N* testing rounds. Furthermore, a formal proof of
the maximum number of tests required per testing round is presented, which can be

*O(N*. Simulation results are also presented.

^{3})[1] A. Subbiah, and D.M. Blough, “Distributed Diagnosis in Dynamic Fault Environments,” IEEE Transactions on Paralel and Distributed Systems, Vol. 15 No. 5, pp. 453-467, 2004.

[2] G. Masson, D. Blough, and G. Sullivan, “System Diagnosis,” Fault-Tolerant Computer System Design, ed. D.K. Pradhan, Prentice-Hall, 1996.

[3] F. Preparata, G. Metze, and R.T. Chien, “On The Connection Assignment Problem of Diagnosable Systems,” IEEE Transactions on Electronic Computers, Vol. 16, pp. 848-854, 1968.

[4] S.L. Hakimi, and A.T. Amin, “Characterization of Connection Assignments of Diagnosable Systems,” IEEE Transactions on Computers, Vol. 23, pp. 86-88, 1974.

[5] S.L. Hakimi, and K. Nakajima, “On Adaptive System Diagnosis,” IEEE Transactions on Computers, Vol. 33, pp. 234-240, 1984.

[6] S.H. Hosseini, J.G. Kuhl, and S.M. Reddy, “A Diagnosis Algorithm for Distributed Computing Systems with Failure and Repair,” IEEE Transactions on Computers, Vol. 33, pp. 223-233, 1984.

[7] E.P. Duarte Jr., and T. Nanya, “A Hierarchical Adaptive Distributed System-Level Diagnosis Algorithm,” IEEE Transactions on Computers, Vol.47, pp. 34-45, 1998.

[8] R.P. Bianchini, and R. Buskens, “Implementation of On-Line Distributed System-Level Diagnosis Theory,” IEEE Transactions on Computers, Vol. 41, pp. 616-626, 1992.

[9] A. Brawerman, and E.P. Duarte Jr., “A Synchronous Testing Strategy for Hierarchical Adaptive Distributed System-Level Diagnosis,” Journal of Electronic Testing Theory and Applications, Vol. 17, No. 2, pp. 185-195, 2001.

[10] E.P. Duarte Jr., A. Brawerman, and L.C.P. Albini, “An Algorithm for Distributed Hierarchical Diagnosis of Dynamic Fault and Repair Events,” Proc. IEEE ICPADS’00, pp. 299-306, 2000.

[11] S. Lee, and K.G. Shin, “Probabilistic Diagnosis of Multiprocessor Systems,” ACM Computing Surveys, Vol. 26, No. 1, pp. 121-139, 1994.

[12] M. Malek, “A Comparison Connection Assignment for Diagnosis of Multiprocessor Systems,” Proc. Seventh Int’l Symp. Computer Architecture, pp. 31-36, 1980.

[13] K.Y. Chwa, and S.L. Hakimi, “Schemes for Fault-Tolerant Computing: A Comparison of Modularly Redundant and t-Diagnosable Systems,” Information and Control, Vol. 49, pp. 212-238, 1981.

[14] J. Maeng, and M. Malek, “A Comparison Connection Assignment for Self-Diagnosis of Multiprocessor Systems,” Digest 11th Int’l Symp. Fault Tolerant Computing, pp. 173-175, 1981.

[15] A. Sengupta, and A.T. Dahbura, “On Self-Diagnosable Multiprocessor Systems: Diagnosis by Comparison Approach,” IEEE Transactions on Computers, Vol. 41, No. 11, pp. 1386-1396, 1992.

[16] D.M. Blough, and H.W. Brown, “The Broadcast Comparison Model for On-Line Fault Diagnosis in Multicomputer Systems: Theory and Implementation,” IEEE Transactions on Computers, Vol. 48, pp. 470-493, 1999.

[17] D. Wang, “Diagnosability of Hipercubes and Enhanced Hypercubes under the Comparison Diagnosis Model,” IEEE Transactions on Computers, Vol. 48, No. 12, pp. 1369-1374, 1999.

[18] G.S. Almasi, and A. Gottlieb, Highly Parallel Computing, The Benjamim/Commings Publishing Company Inc., 1994.

[19] C. Xavier, and S.S. Iyengar, Introduction to Parallel Algorithms, Wiley-Intersciense Publication, 1998.

[20] N.F. Tzeng, and S. Wei, “Enhanced Hypercubes,” IEEE Transactions on Computers, Vol. 40, No. 3, pp. 284-294, Mar. 1991.

[21] T. Araki, and Y. Shibata, “Diagnosability of Butterfly Networks under the Comparison Approach,” IEICE Trans. Fundamentals, Vol

[22] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes, Morgan Kaufmann, San Mateo, CA, 1992.

[23] J. Fan, “Diagnosability of Crossed Cubes,” IEEE Transactions on Computers, Vol. 13, No. 10, pp. 1099-1104, Out. 2002.

[24] F. Harary, Graph Theory, Addison-Wesley Publishing Company, 1971.

[25] S. Rangarajan, A.T. Dahbura, and E.A. Ziegler, “A Distributed System-Level Diagnosis for Arbitrary Network Topologies,” IEEE Transactions on Computers, Vol. 44, No. 2, pp. 312-333, 1995.

[26] M.H. MacDougall, Simulating Computer Systems: Techniques and Tools, The MIT Press, Cambridge, MA, 1987.