Using Distributed Diagnosis to Deploy
Highly-Available Web Servers

Roverli P. Ziwich, Egon Hilgenstieler, Emerson F. F. Carara, Elias P. Duarte Jr., Luis C. E. Bona
The 6th IEEE Latin American Web Congress (LAWeb'2008),
pp. 129-134, Vila Velha, ES, Brazil, Oct. 2008.


This work presents SAPOTI (Dependable TCP/IP Application Servers; in Portuguese, Servidores de APlicações cOnfiáveis Tcp/Ip), a distributed tool that guarantees the availability of TCP/IP application servers, in particular Web servers. The tool is based on the SNMP framework and runs on a group of Web servers whose hosts are monitored by a dependable distributed network management tool based on the hierarchical diagnosis algorithm Hi-ADSD with Timestamps. One server is elected to be responsible for the service. When this server becomes faulty and the event is diagnosed, the service is automatically recovered by electing another server among those that are fault-free. A priority scheme based on identifiers is defined. The service remains available even if only one host/server is fault-free. Experimental results are described, obtained from an implementation of SAPOTI on a LAN with real Apache servers. In experiments that injected 210 faults distributed among a group of six servers, the measured availability was at least 97.3%. In another experiment with five servers, in which 27 faults were injected, the availability was 99.5% over the whole experiment time.
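
The election step described in the abstract can be illustrated with a minimal sketch. It is not the SAPOTI implementation: the function name `elect_server` and the `status` map are hypothetical, and the rule that the lowest identifier has the highest priority is an assumption, since the abstract only states that the priority scheme is based on identifiers. In the real tool, the fault-free/faulty state of each host would come from the SNMP-based diagnosis tool rather than from a hand-written dictionary.

```python
from typing import Dict, Optional


def elect_server(status: Dict[int, bool]) -> Optional[int]:
    """Pick the server that should be responsible for the service.

    `status` maps a server identifier to True (fault-free) or False (faulty).
    Assumption: among fault-free servers, the lowest identifier wins.
    Returns None when no server is fault-free.
    """
    fault_free = [server_id for server_id, ok in status.items() if ok]
    return min(fault_free) if fault_free else None


if __name__ == "__main__":
    # Hypothetical group of three servers, all initially fault-free.
    status = {0: True, 1: True, 2: True}
    print("responsible server:", elect_server(status))

    # Server 0 is diagnosed as faulty, so another fault-free server is elected.
    status[0] = False
    print("responsible server after fault:", elect_server(status))

    # The service stays available while at least one server is fault-free.
    status[1] = False
    print("responsible server with one survivor:", elect_server(status))
```

Under this assumption a server is always chosen as long as one entry in `status` is fault-free, which mirrors the claim that the service is available even if only one host/server is fault-free.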

