Fachbereich Informatik

DFG-Project: WWW Analysis and Visualization

A research project of the Schwerpunkt Programm Nr. 1126: Algorithmik großer und komplexer Netzwerke

Sitemap Generation

As the work of a thesis, a prototype for the automated generation of sitemaps was implemented. The system supports structure and usage analysis of an site for its sitemap generation. More detailed information is available at the project's page.

 

Clustering of the Web Graph

The web graph is the graph you get, if you take each web page as node and each hyperlink as edge of the graph. This link structure, as being manually generated by humans, carries a lot of information about the topic relationship between the web pages.

As the web graph is very large, with billions of nodes and edges, an mechanism must be found to extract information from such a huge amount of data. An obvious approach now is to sort the pages by topic, like the way i.e. yahoo does. To do this, large sub graphs of the web graph need to be retrieved, that are very strong related. This process is usually called clustering.

Unfortunately all clustering problems of interest are NP-complete. Therefore heuristics and approximation algorithms need to be used. 

The goal of this part of the project is, to first find a good clustering problem for extracting topic relationships and to efficiently compute it on the web graph. If possible, special properties of the web graph (spareness, power law distributions) should be used.

People