Navigability of Decentralized Information Networks

FWF funded research project (P24866-N15)

The main goal of the project was to investigate the navigability of modern decentralized Web-based information networks from topological static perspective as well as from a dynamic perspective of a complex interplay between network topology and user navigational behaviour.

The goal of this project (project leader: Denis Helic, project Co-PI Markus Strohmaier) was to study navigability of large decentralized Web-based information networks such as Wikipedia or recommender systems. In other words, we were interested in detecting how difficult is to find relevant information by browsing and navigating such systems. The navigability of information networks is a complex property that depends on many factors such as the connectivity of the network, the existence of shortcuts which connect thematically distant parts of the network, the strategies that users adopt when navigating the networks, as well as design decisions that system operators make in their systems.

In our project we were firstly able to gain new insights in how users navigate on the Web. This enabled us to develop a unified approach for formalizing various aspects of this behaviour in the form of mathematical constructs such as transition matrices. This, in turn allowed us to study the interplay between user navigational behaviour and the structure of the information networks that they navigate. With this approach we are able to, for example, indentify potential problems in information network which could lead to poor support of users when they browse a given system. Moreover, with our approach we can analyze the consequences of possible modifications that system operators may adopt to remedy the problems in their systems. The application of the methods developed in this project aim at the discovery of ways how to better support humans in their navigation on the Web by automatic modification of the underlying networks.

The main goal of the project was to investigate the navigability of modern decentralized Web-based information networks from topological static perspective as well as from a dynamic perspective of a complex interplay between network topology and user navigational behaviour. We can divide the main project goal into three sub-goals:

  1. Analysis of user navigational logs
  2. Modelling of user navigational behaviour
  3. Effects of structural changes on the navigational properties of information networks

Analysis

We found out that the links in the lead sections of a Web page (i.e. the first section of a Wikipedia page or the infobox) are navigated substantially more often than the link counts in those sections would suggest. This indicates that these sections are considerably more important for users than some other sections, e.g. link section on the bottom of a Wikipedia page.

In general settings of free-form navigation the generality fails to explain user click choices. On the other hand, similarity to the next page is a fairly good fit for user navigation.

On aggregate, user navigation choices are best explained by the interplay between so-called exploitation and exploration phases. In the exploitation phase users tend to follow links greedily as to maximize their information benefit, whereas in the exploration phase they investigate the information space in the search for informative clusters to satisfy their information need in yet another exploitation phase. Our results suggest that the ratio between the magnitude of exploitation and exploration is around 4.

Modelling

Firstly, we worked with the models based on decentralized search. These algorithmic models are based on the assumption that users act greedily according to their background knowledge of the information space. We extended this model by sampling procedures to model various types of user knowledge such as generalist and specialist knowledge and by including random stochastic behaviour to account for the trade-off between exploitation and exploration phases.

Secondly, we developed models based on the adjacency matrix of the information network, where nodes represent Web pages and edges represent links between pages. In the next step, we modify the weights of links in the adjacency matrix to reflect topological properties of the information network, layout properties of individual pages, user background knowledge and navigational behaviour, or external factors such as search engines.

This matrix reflects both: the static topological properties of information networks and their projections given by the page organization and the user biases, as well as the dynamics properties of the user navigation and its interplay with the structure.

Computational Consequences

By calculating the standard quantities of the adjacency (transition) matrix we gain insight into the consequences of potential decisions made by the system operators. For example, in our experiments with recommender networks we have discovered that although such networks possess structural properties (such as connectedness and small-world properties) that support efficient navigation in theory, as soon as the dynamics of the process is taken into account these networks fail to support users in their navigational endeavours. The reason for this poor support is an insufficient linking between various topical clusters.

Further, we have investigated the effects of inducing navigational biases and the effects of insertion of new links into information networks. We found that both approaches have to deal with trade-offs between the desired effects and undesired side effects. For example, inducing bias increases visibility of biased nodes but can drain the visibility from other pages. On the other hand, inserting new links towards less visible pages can improve their visibility but can also break the semantic coherence of the given Web pages as links towards unrelated pages are inserted.

  1. Daniel Lamprecht, Kristina Lerman, Denis Helic, Markus Strohmaier. How the structure of Wikipedia articles influences user navigation. New Review of Hypermedia and Multimedia, 23(1): 29-50 2016.
  2. Daniel Lamprecht, Markus Strohmaier, Denis Helic. A Method for Evaluating the Navigability of Recommendation Algorithms. 5th International Workshop on Complex Networks and their Applications, Complex Networks 2016, 247-259, Springer 2016.
  3. Florian Geigl, Simon Walk, Markus Strohmaier, Denis Helic. Steering the Random Surfer on Directed Webgraphs. Accepted at IEEE/WIC/ACM International Conference on Web Intelligence, 2016.
  4. Daniel Lamprecht, Dimitar Dimitrov, Denis Helic, Markus Strohmaier. Evaluating and Improving Navigability of Wikipedia: A Comparative Study of Eight Language Editions. 12th International Symposium on Open Collaboration, OpenSym 2016, 17:1-17:10, ACM, 2016.
  5. Stefan John, Michael Granitzer, Denis Helic. Creation of Navigation Hierarchies for Information Networks by Applying Genetic Algorithms. 12th International Conference on Web Information Systems and Technologies, WEBIST, pages 63-74, 2016. (Runner-Up Best Paper).
  6. Florian Geigl, Kristina Lerman, Simon Walk, Markus Strohmaier, Denis Helic. Assessing the Navigational Effects of Click Biases and Link Insertion on the Web. 27th ACM Conference on Hypertext and Social Media, HT 2016, pages 37-47, New York, NY, USA, 2016. ACM.
  7. Daniel Lamprecht, Markus Strohmaier, Denis Helic, Csongor Nyulas, Tania Tudorache, Natalya F. Noy and Mark A. Musen. Using ontologies to model human navigation behavior in information networks: A study based on Wikipedia. Semantic Web Journal., 6(4):403-422, 2015.
  8. Daniel Lamprecht, Denis Helic and Markus Strohmaier. Quo vadis? On the effects of Wikipedia?s policies on navigation. In Wikipedia, a Social Pedia: Research Challenges and Opportunities, co-located with the 9th The International AAAI Conference on Web and Social Media (ICWSM), pages 64-66, AAAI, 2015.
  9. Daniel Lamprecht, Florian Geigl, Tomas Karas, Simon Walk, Denis Helic, Markus Strohmaier. Improving recommender system navigability through diversification: a case study of IMDb. International Conference on Knowledge Management, pages 21:1-21:8, 2015.
  10. Florian Geigl, Daniel Lamprecht, Reiner Hofmann-Wellenhof, Simon Walk, Markus Strohmaier, Denis Helic. Random surfers on a web encyclopedia. International Conference on Knowledge Management, pages 5:1-5:8, 2015.
  11. Florian Geigl, Denis Helic. The Role of Homophily and Popularity in Informed Decentralized Search. Proceedings of the 2nd International Workshop on Dynamic Networks and Knowledge Discovery, co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2014), DyNaK 2014, CEUR Workshop Proceedings 1229, 2014.
  12. Denis Helic. Regular equivalence in informed network search. In 2014 Proceedings of the 37th International MIPRO Convention}, Mipro 2014, pages 1088-1093, IEEE, 2014.
  13. Denis Helic, Markus Strohmaier, Michael Granitzer, Reinhold Scherer: Models of human navigation in information networks based on decentralized search. HT '13 Proceedings of the 24th ACM Conference on Hypertext and Social Media, Page(s), 89-98, 2013.
  14. Denis Helic, Markus Strohmaier, Weronika Wojcik: Navigational evaluation of breadth first search spanning trees. Information and Communication Technology Electronics and Microelectronics (MIPRO) (2013), Page(s) 998-1003 International Convention on Information and Communication Technology, Electronics and Microelectronics, 2013.