A STUDY ON THE EVOLUTION OF THE WEB

 

Alexandros Ntoulas and Junghoo Cho: University of California, Los Angeles (UCLA), Los Angeles, CA 90095, USA

Hyun Kyu Cho, Hyeonsung Cho, and Young-Jo Cho: Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-Dong, Yuseong-Gu, Daejeon, 305-350, Republic of Korea

 

Abstract

 

We seek to gain insight into how Web search engines should cope with the evolving Web in order to provide users with the most up-to-date results possible. For this purpose, we collected weekly snapshots of some 150 Web sites over the course of one year and measured the evolution of their content and link structure. Our measurements focus on aspects of potential interest to search engine designers: the evolution of link structure over time and the rate of creation of new pages on the Web. Our findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them. We conclude the paper with a discussion of the potential implications of our results for the design of effective Web search engines.

 

Short Biography

 

Junghoo Cho is an assistant professor in the Department of Computer Science at the University of California, Los Angeles. He received a Ph.D. degree in Computer Science from Stanford University in 2002 and a B.S. degree in physics from Seoul National University in 1996. His main research interests are in the study of the evolution, management, retrieval, and mining of the World-Wide Web. He has published more than 30 research papers in international journals and major peer-reviewed conference proceedings. He is a recipient of the NSF CAREER Award and the IBM Faculty Award, and serves on the program committees of top international conferences such as SIGMOD, VLDB, WWW, and ICDE.