The web graph is the graph you get, if you take each web page as node and each hyperlink as edge of the graph. This link structure, as being manually generated by humans, carries a lot of information about the topic relationship between the web pages.
As the web graph is very large, with billions of nodes and edges, an mechanism must be found to extract information from such a huge amount of data. An obvious approach now is to sort the pages by topic, like the way i.e. yahoo does. To do this, large sub graphs of the web graph need to be retrieved, that are very strong related. This process is usually called clustering.
Unfortunately all clustering problems of interest are NP-complete. Therefore heuristics and approximation algorithms need to be used.
The goal of this part of the project is, to first find a good clustering problem for extracting topic relationships and to efficiently compute it on the web graph. If possible, special properties of the web graph (spareness, power law distributions) should be used.