A STUDY ON THE EVOLUTION OF THE WEB
Alexandros Ntoulas, Junghoo Cho: University of California, Los Angeles (UCLA),
Los Angeles, CA 90095, USA
Hyun Kyu Cho, Hyeonsung Cho, and Young-Jo Cho: Electronics and
Telecommunications Research Institute (ETRI), 161 Gajeong-Dong, Yuseong-Gu,
Daejeon, 305-350, Republic of Korea
Abstract
We seek to gain improved insight into how Web search
engines should cope with the evolving Web, in an attempt to provide users with
the most up-to-date results possible. For this purpose, we collected weekly
snapshots of some 150 Web sites over the course of one year, and measured the evolution
of content and link structure. Our measurements focus on aspects of potential
interest to search engine designers: the evolution of link structure over time
and the rate of creation of new pages on the Web. Our findings indicate a rapid
turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even
higher rate of turnover in the hyperlinks that connect them. We conclude the paper
with a discussion of the potential implications of our results for the design
of effective Web search engines.
Short Biography
Junghoo Cho is an assistant professor in the Department of
Computer Science at the University of California, Los Angeles. He received a Ph.D.
degree in Computer Science from Stanford University in 2002 and a B.S. degree
in physics from Seoul National University in 1996. His main research interests
are in the study of the evolution, management, retrieval and mining of the
World-Wide Web. He has published more than 30 research papers in international
journals and major peer-reviewed conference proceedings. He is a recipient of
the NSF CAREER Award and the IBM Faculty Award, and serves on the program committees of
top international conferences such as SIGMOD, VLDB, WWW and ICDE.