Diagnosis of Distributed Systems Based on Abnormal Symptom Histories

 

Sunggu Lee: Department of Electronic and Electrical Engineering, Pohang University of Science and Technology (POSTECH), San 31 Hyoja Dong, Pohang 790-784, Korea.

(TEL) +82-54-279-2236 (FAX) +82-54-279-5940 (E-Mail) slee@postech.ac.kr

(URL) http://cal.postech.ac.kr/slee1/lsg1.htm

Seung Gu Kim: same affiliation as above.  (TEL) +82-54-279-5936 (E-Mail) kimsg@postech.ac.kr

 

Abstract

 

The main idea of this presentation is to use observations of abnormal node/program behaviors, together with their times of occurrence, to diagnose the state of the target system. Just as a variety of simple and complex symptoms, combined with information on their times of occurrence, are carefully analyzed to diagnose a human patient's current health, a variety of abnormal node/program behaviors (symptoms) can be monitored and summarized into state messages that are sent to neighboring nodes. Each node can then use the state messages received from its neighbors to maintain a state history for those nodes. This history can be used to diagnose and isolate faulty nodes.
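
The flow described above (symptoms summarized into timestamped state messages, per-neighbor state histories, and a diagnosis drawn from those histories) can be illustrated with a minimal sketch. It is not the authors' algorithm: the names StateMessage and NeighborHistory, the symptom labels, the bounded history length, and the "count symptoms within a recent time window and compare against a threshold" rule are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass(frozen=True)
class StateMessage:
    """Hypothetical summary of symptoms observed at one node."""
    sender: str
    timestamp: float            # time of occurrence (seconds; assumed unit)
    symptoms: Tuple[str, ...]   # e.g. ("timeout",); labels are illustrative


class NeighborHistory:
    """State history a node keeps for each of its neighbors (assumed design)."""

    def __init__(self, max_entries: int = 100) -> None:
        # One bounded queue of received messages per neighbor.
        self._history: Dict[str, Deque[StateMessage]] = defaultdict(
            lambda: deque(maxlen=max_entries)
        )

    def record(self, msg: StateMessage) -> None:
        """Store a state message received from a neighbor."""
        self._history[msg.sender].append(msg)

    def symptom_count(self, neighbor: str, now: float, window: float) -> int:
        """Count symptoms reported by `neighbor` in [now - window, now]."""
        return sum(
            len(m.symptoms)
            for m in self._history.get(neighbor, ())
            if now - window <= m.timestamp <= now
        )

    def suspected_faulty(self, neighbor: str, now: float,
                         window: float = 10.0, threshold: int = 3) -> bool:
        """Assumed rule: suspect a node once its recent symptom count
        reaches `threshold`. A real diagnosis rule may differ."""
        return self.symptom_count(neighbor, now, window) >= threshold


# Usage: node A records messages from neighbors B and C, then checks B.
history = NeighborHistory()
history.record(StateMessage("B", 1.0, ("timeout",)))
history.record(StateMessage("B", 2.0, ("timeout", "crc_error")))
history.record(StateMessage("C", 2.5, ()))
b_suspected = history.suspected_faulty("B", now=3.0)
```

Keeping the time of occurrence with each message is what lets the check distinguish a burst of recent symptoms from old, isolated ones. The window and threshold values here are placeholders.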

 

Short Biography

 

Sunggu Lee: Sunggu Lee is a Professor in the Department of Electronic and Electrical Engineering at Pohang University of Science and Technology. Prior to this appointment, he was an Assistant Professor in the Department of Electrical Engineering at the University of Delaware in Newark, Delaware, U.S.A. From June 1997 to June 1998, he was a Visiting Scientist at the IBM T. J. Watson Research Center. He received the B.S.E.E. degree with highest distinction from the University of Kansas, Lawrence, in 1985 and the M.S.E. and Ph.D. degrees from the University of Michigan, Ann Arbor, in 1987 and 1990, respectively. His current research interests are in mobile ad-hoc networks (routing, time synchronization, real-time communication), cluster and grid computing (middleware, task scheduling), and fault-tolerant computing (fault-tolerant communication, checkpoint/restart).

 

Seung Gu Kim: Seung Gu Kim is a second-year master's student in the Department of Electronic and Electrical Engineering at POSTECH. He received a B.S. from Sogang University in 2004. His research interests are in fault-tolerant computing, with an emphasis on system-level fault diagnosis.