The main goal of the project was to investigate the navigability of modern decentralized Web-based information networks, both from a static topological perspective and from a dynamic perspective that captures the complex interplay between network topology and user navigational behaviour.
The goal of this project (project leader: Denis Helic, project Co-PI Markus Strohmaier) was to study the navigability of large decentralized Web-based information networks such as Wikipedia or recommender systems. In other words, we were interested in determining how difficult it is to find relevant information by browsing and navigating such systems. The navigability of information networks is a complex property that depends on many factors, such as the connectivity of the network, the existence of shortcuts that connect thematically distant parts of the network, the strategies that users adopt when navigating the networks, and the design decisions that system operators make in their systems.
In our project we first gained new insights into how users navigate on the Web. This enabled us to develop a unified approach for formalizing various aspects of this behaviour as mathematical constructs such as transition matrices. This, in turn, allowed us to study the interplay between user navigational behaviour and the structure of the information networks that users navigate. With this approach we are able, for example, to identify potential problems in an information network that could lead to poor support for users browsing a given system. Moreover, we can analyze the consequences of possible modifications that system operators may adopt to remedy such problems. The methods developed in this project aim to discover ways to better support humans in their navigation on the Web through automatic modification of the underlying networks.
We can divide the main project goal into three sub-goals:
We found that links in the lead section of a Web page (i.e. the first section of a Wikipedia page or the infobox) are navigated substantially more often than the link counts in those sections would suggest. This indicates that these sections are considerably more important to users than other sections, e.g. the link section at the bottom of a Wikipedia page.
In the general setting of free-form navigation, generality fails to explain user click choices. Similarity to the next page, on the other hand, fits user navigation fairly well.
In aggregate, user navigation choices are best explained by the interplay between so-called exploitation and exploration phases. In the exploitation phase users tend to follow links greedily so as to maximize their information benefit, whereas in the exploration phase they investigate the information space in search of informative clusters that can satisfy their information need in a subsequent exploitation phase. Our results suggest that the ratio between the magnitude of exploitation and exploration is around 4.
First, we worked with models based on decentralized search. These algorithmic models are based on the assumption that users act greedily according to their background knowledge of the information space. We extended these models with sampling procedures that model various types of user knowledge, such as generalist and specialist knowledge, and with stochastic behaviour that accounts for the trade-off between exploitation and exploration phases.
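The idea of greedy decentralized search with occasional random steps can be illustrated with a minimal sketch. This is not the project's implementation: the function name, the toy graph, the distance function standing in for user background knowledge, and the parameter explore_prob are all assumptions made for illustration. The default of 0.2 is only one possible reading of an exploitation-to-exploration ratio of about 4 (0.8 / 0.2).

```python
import random


def decentralized_search(graph, distance, start, target,
                         explore_prob=0.2, max_steps=50, rng=None):
    """Navigate from start to target using only local knowledge.

    graph:    dict mapping each node to a list of its out-neighbours
    distance: function (node, target) -> estimated distance; a hypothetical
              stand-in for the user's background knowledge
    Returns the visited path, or None if the target was not reached.
    """
    rng = rng or random.Random(0)
    current = start
    path = [start]
    for _ in range(max_steps):
        if current == target:
            return path
        neighbours = graph[current]
        if not neighbours:
            return None  # dead end, no outgoing links
        if rng.random() < explore_prob:
            # Exploration: follow a random link.
            current = rng.choice(neighbours)
        else:
            # Exploitation: follow the link that looks closest to the target.
            current = min(neighbours, key=lambda n: distance(n, target))
        path.append(current)
    return path if current == target else None


# Toy network: pages 0..5 on a line, plus a shortcut between 0 and 3.
GRAPH = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 4, 0], 4: [3, 5], 5: [4]}


def line_distance(node, target):
    # Assumed background knowledge: position on the line.
    return abs(node - target)


# Purely greedy navigation uses the shortcut.
print(decentralized_search(GRAPH, line_distance, 0, 5, explore_prob=0.0))
# -> [0, 3, 4, 5]
```

Setting explore_prob above zero makes the path depend on the random seed, which is why the expected output is given only for the purely greedy case.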
Second, we developed models based on the adjacency matrix of the information network, where nodes represent Web pages and edges represent links between pages. In the next step, we modified the weights of links in the adjacency matrix to reflect topological properties of the information network, layout properties of individual pages, user background knowledge and navigational behaviour, or external factors such as search engines.
This matrix reflects both the static topological properties of information networks, including their projections given by page organization and user biases, and the dynamic properties of user navigation and its interplay with the structure.
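As a rough illustration of how a weighted adjacency matrix can be turned into a transition matrix, the sketch below multiplies each link by a bias value for its target page and then normalizes every row to sum to one. The function name, the per-page bias vector, and the uniform treatment of pages without outgoing links are assumptions for this example, not details taken from the project.

```python
def transition_matrix(adjacency, bias):
    """Build a row-stochastic matrix from link structure and page biases.

    adjacency[i][j] is 1 if page i links to page j, else 0.
    bias[j] is a hypothetical non-negative weight for clicking a link to
    page j (e.g. reflecting link position or similarity to the user's goal).
    """
    n = len(adjacency)
    matrix = []
    for i in range(n):
        weighted = [adjacency[i][j] * bias[j] for j in range(n)]
        total = sum(weighted)
        if total == 0:
            # Assumption: a page without usable links leads to any page
            # with equal probability.
            matrix.append([1.0 / n] * n)
        else:
            matrix.append([w / total for w in weighted])
    return matrix


# Three fully interlinked pages; links to page 2 are three times as attractive.
ADJACENCY = [[0, 1, 1],
             [1, 0, 1],
             [1, 1, 0]]
P = transition_matrix(ADJACENCY, bias=[1, 1, 3])
print(P)  # -> [[0.0, 0.25, 0.75], [0.25, 0.0, 0.75], [0.5, 0.5, 0.0]]
```

Each row of the result gives the probability of moving from one page to each of the others, so structure and behaviour end up encoded in a single object.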
By calculating standard quantities of the adjacency (transition) matrix we gain insight into the consequences of potential decisions made by system operators. For example, in our experiments with recommender networks we discovered that although such networks possess structural properties (such as connectedness and small-world properties) that support efficient navigation in theory, these networks fail to support users in their navigational endeavours as soon as the dynamics of the navigation process are taken into account. The reason for this poor support is insufficient linking between topical clusters.
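One standard quantity of a transition matrix is its stationary distribution, which can be read as the long-run share of visits each page receives. The text does not say which quantities were used, so the choice of this one, the power-iteration method, and the toy matrix below are assumptions for illustration only.

```python
def stationary_distribution(P, iterations=1000, tol=1e-12):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One navigation step: redistribute the visit probability along links.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical three-page system: page 0 links to 1, page 1 links to 0 and 2,
# page 2 links back to 0.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
pi = stationary_distribution(P)
print([round(x, 3) for x in pi])  # -> [0.4, 0.4, 0.2]
```

In this toy system page 2 receives half as many visits as the other two pages, even though all three pages are reachable from one another.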
Further, we investigated the effects of inducing navigational biases and of inserting new links into information networks. We found that both approaches have to deal with trade-offs between the desired effects and undesired side effects. For example, inducing bias increases the visibility of the biased nodes but can drain visibility from other pages. On the other hand, inserting new links towards less visible pages can improve their visibility but can also break the semantic coherence of the given Web pages, as links towards unrelated pages are inserted.
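The trade-off for induced bias can be made concrete with a small sketch that compares page visibility with and without a bias. Visibility is taken here to mean the long-run visit share of a page, which is an assumption, as are the function name, the bias values, and the toy network; every page is also assumed to have at least one outgoing link.

```python
def visibility(adjacency, bias, iterations=1000):
    """Long-run visit share of each page under click weights
    adjacency[i][j] * bias[j] (a hypothetical bias model)."""
    n = len(adjacency)
    rows = []
    for i in range(n):
        weighted = [adjacency[i][j] * bias[j] for j in range(n)]
        total = sum(weighted)  # assumed > 0: every page has an outgoing link
        rows.append([w / total for w in weighted])
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
    return pi


# Three fully interlinked pages.
ADJACENCY = [[0, 1, 1],
             [1, 0, 1],
             [1, 1, 0]]
unbiased = visibility(ADJACENCY, bias=[1, 1, 1])
biased = visibility(ADJACENCY, bias=[1, 1, 3])  # promote page 2

for page in range(3):
    print(page, round(unbiased[page], 3), round(biased[page], 3))
```

In this example the promoted page rises from a visit share of one third to three sevenths, while each of the other two pages drops from one third to two sevenths. The gain for one page is paid for by the others because the shares always sum to one.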