A web crawler finds and indexes scientific documents, and citation counts are extracted by examining their contents. A scheduler then schedules updates to the citation counts of individual researchers based on their mean citation rates and, optionally, importance factors, subject to a total update rate. (Fig. 1 from the paper)
Google Scholar, a citation index, crawls the web to find and index items such as documents, images, and videos. Because it focuses on scientific documents, Google Scholar also examines the contents of these documents to extract citation counts for the indexed papers. It then updates the citation counts of a multitude of individual researchers.
Google Scholar is, of course, resource-constrained: it cannot update all researchers all the time. If Google Scholar can update only a fraction of all researchers, how should it prioritize the updating process? Should it update researchers with higher mean citation rates more often, since their citation counts change more per unit time? Or should it update researchers with lower mean citation rates more often, to capture rarer and more informative changes?
In a new paper, "Who Should Google Scholar Update More Often?", Professor Sennur Ulukus (ECE/ISR) and her graduate student Melih Bastopcu model the citation count of each individual researcher as a counting process with a fixed mean rate. They use a metric similar to the age of information: the long-term average difference between the actual citation counts and the citation counts recorded at the latest updates. Ulukus and Bastopcu show that, to minimize this difference metric, the updater should allocate its total update capacity to researchers in proportion to the square roots of their mean citation rates.
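To see where a square root can come from, here is a short derivation under a simplified model that is assumed for illustration and is not necessarily the paper's exact formulation: researcher i gains citations at mean rate λ_i, is refreshed periodically every 1/f_i time units, and the refresh rates share a total budget F. Right after a refresh the gap is zero; s time units later its mean is λ_i s. Averaging over one refresh interval and minimizing the sum with a Lagrange multiplier μ gives:

```latex
% Assumed model: mean citation rate \lambda_i, periodic refresh rate f_i,
% total update budget \sum_i f_i = F (illustrative, not the paper's exact setup).
\Delta_i = f_i \int_0^{1/f_i} \lambda_i s \, ds = \frac{\lambda_i}{2 f_i},
\qquad
\min_{f_1,\dots,f_n} \sum_{i=1}^{n} \frac{\lambda_i}{2 f_i}
\quad \text{s.t.} \quad \sum_{i=1}^{n} f_i = F

\frac{\partial}{\partial f_i}\Big[\sum_{j=1}^{n} \frac{\lambda_j}{2 f_j} + \mu\Big(\sum_{j=1}^{n} f_j - F\Big)\Big]
= -\frac{\lambda_i}{2 f_i^2} + \mu = 0
\;\Longrightarrow\;
f_i^\star = F \, \frac{\sqrt{\lambda_i}}{\sum_{j=1}^{n} \sqrt{\lambda_j}},
\qquad
\sum_{i=1}^{n} \frac{\lambda_i}{2 f_i^\star} = \frac{\big(\sum_{i=1}^{n} \sqrt{\lambda_i}\big)^2}{2F}
```

Each term λ_i/(2f_i) is convex in f_i > 0, so this stationary point is the minimum. With random (Poisson) refresh times instead of periodic ones, each term becomes λ_i/f_i, which changes the constant but not the square-root form. If the importance factors mentioned in Fig. 1 are modeled as multiplicative weights w_i on each term (again an assumption), the same steps give f_i proportional to the square root of w_i λ_i.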
That is, more prolific researchers should be updated more often, but with diminishing returns due to the concavity of the square root function. More generally, the paper addresses how a resource-constrained sampler should operate when it must track multiple independent counting processes and keep its view of all of them as up to date as possible.
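The following Python sketch compares three ways of splitting a fixed update budget under the same simplified periodic-refresh model, in which researcher i contributes an expected gap of λ_i/(2f_i). The function names, the citation rates, and the budget are all hypothetical, chosen only to illustrate the square-root rule; this is not code from the paper.

```python
import math
from typing import List, Sequence


def sqrt_allocation(rates: Sequence[float], total_rate: float) -> List[float]:
    """Split total_rate across researchers in proportion to sqrt(mean rate)."""
    roots = [math.sqrt(lam) for lam in rates]
    norm = sum(roots)
    return [total_rate * r / norm for r in roots]


def proportional_allocation(rates: Sequence[float], total_rate: float) -> List[float]:
    """Split total_rate in direct proportion to each mean rate."""
    total = sum(rates)
    return [total_rate * lam / total for lam in rates]


def uniform_allocation(rates: Sequence[float], total_rate: float) -> List[float]:
    """Give every researcher the same share of total_rate."""
    return [total_rate / len(rates)] * len(rates)


def expected_gap(rates: Sequence[float], freqs: Sequence[float]) -> float:
    """Long-run average total gap, assuming periodic refreshes: sum of lambda/(2f).

    Assumes every rate and every refresh frequency is strictly positive.
    """
    return sum(lam / (2.0 * f) for lam, f in zip(rates, freqs))


if __name__ == "__main__":
    rates = [1.0, 4.0, 25.0]  # hypothetical mean citations per unit time
    budget = 8.0              # hypothetical total updates per unit time
    for name, allocate in [
        ("uniform", uniform_allocation),
        ("proportional", proportional_allocation),
        ("sqrt", sqrt_allocation),
    ]:
        freqs = allocate(rates, budget)
        print(name, [round(f, 3) for f in freqs], round(expected_gap(rates, freqs), 3))
```

With these made-up numbers, the square-root split assigns refresh rates 1, 2, and 5, for a total expected gap of 4.0, while the uniform and proportional splits both give 5.625. Under this assumed model the two simpler rules always tie, since each yields n times the sum of the λ_i divided by 2F. The diminishing returns are also visible: the researcher with 25 times the citation rate of the first receives only 5 times as many updates.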
February 24, 2020