Content:
- Paper
- Presentation
- Explore the results
- H2vis online demo
- Modified H2O source code
- All graphs and graph generation sources
- Webpage test corpus URLs
Test corpus URLs
Download the locally cached resources for these URLs (cached locally between March and June 2017). The sites presented in the publication are listed below (n = 40, chosen from the alexa.com and moz.com top 500 lists).
Low-weight pages (n = 10, <= 500 KB)
- facebook.com
- google.com
- w3.org
- wikipedia.org
- wordpress.com
- gnu.org
- apache.org
- opera.com
- gravatar.com
- phpbb.com
Medium-weight pages (n = 10, > 500 KB and <= 1000 KB)
- bit.ly
- dotdash.com
- gov.uk
- reddit.com
- statcounter.com
- ed.gov
- spotify.com
- columbia.edu
- nature.com
- sciencedirect.com
Heavy-weight pages (n = 20, > 1000 KB)
- msn.com
- canvas.be
- cnet.com
- demorgen.be
- etsy.com
- github.com
- harvard.edu
- imdb.com
- imgur.com
- nytimes.com
- telegraph.com
- vtm.be
- youtube.com
- pinterest.com
- joomla.com
- academia.edu
- sciencemag.org
- researchgate.net
- intel.com
- internal project webpage ("vodlib")
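
The three weight classes are defined purely by size thresholds, so the grouping can be sketched as a small helper. This is only an illustration, not part of the published tooling: the function name `weight_class` is hypothetical, and it assumes the total page size is already known and expressed in KB.

```python
def weight_class(size_kb: float) -> str:
    """Map a total page size in KB to the weight class used in the corpus.

    Thresholds follow the corpus listing: <= 500 KB is low-weight,
    > 500 KB and <= 1000 KB is medium-weight, > 1000 KB is heavy-weight.
    The function name and the KB input unit are assumptions for this sketch.
    """
    if size_kb < 0:
        raise ValueError("size_kb must be non-negative")
    if size_kb <= 500:
        return "low"
    if size_kb <= 1000:
        return "medium"
    return "heavy"
```

A page of exactly 500 KB falls in the low-weight class and a page of exactly 1000 KB in the medium-weight class, matching the inclusive upper bounds in the headings.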