ARN

Huge number of websites barely visited, report finds

The 'long tail' is cold and dark.

The Internet, famously, has a long tail, but a new analysis has revealed another characteristic of this vast slew of obscure websites. Huge numbers of them are never visited.

Analysing visits to several million websites during the last quarter of 2009 for its State of the Web report, cloud security startup Zscaler used real customer data to build a Hilbert-curve 'heatmap' of active and inactive IPv4 address space. As expected, the resulting grid showed clusters of active sites as white dots and a large volume of reserved or non-routed addresses in grey, but it was the sea of dark that loomed largest of all.
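The report does not describe exactly how Zscaler built its map. A common way to draw this kind of IPv4 map is to lay the 32-bit address space along a Hilbert curve, so that numerically adjacent address blocks land in neighbouring cells and busy ranges show up as compact clusters rather than scattered dots. The Python sketch below assumes one cell per /8 block, giving a 16×16 grid. The helper names `ipv4_to_cell` and `build_heatmap`, and the grid resolution, are hypothetical illustrative choices, not Zscaler's method.

```python
import ipaddress


def _rotate(s, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve filling an n x n grid (n a power of 2) to (x, y)."""
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ipv4_to_cell(ip, prefix_bits=8):
    """Hypothetical helper: grid cell of the /prefix_bits block containing ip."""
    if prefix_bits % 2 or not 2 <= prefix_bits <= 32:
        raise ValueError("prefix_bits must be an even number between 2 and 32")
    n = 1 << (prefix_bits // 2)  # side length of the square grid
    d = int(ipaddress.IPv4Address(ip)) >> (32 - prefix_bits)  # block index
    return hilbert_d2xy(n, d)


def build_heatmap(visited_ips, prefix_bits=8):
    """Hypothetical helper: count visits per block; cells left at 0 are the 'dark' space."""
    n = 1 << (prefix_bits // 2)
    grid = [[0] * n for _ in range(n)]
    for ip in visited_ips:
        x, y = ipv4_to_cell(ip, prefix_bits)
        grid[y][x] += 1
    return grid
```

With `prefix_bits=24` each cell becomes a /24 block and the grid grows to 4096×4096. A real map would also need to grey out reserved and non-routed ranges before treating zero-count cells as unvisited.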

In the three months of the analysis, vast numbers of sites were not visited at all. Assuming Zscaler's customers are typical of Internet users more generally, these form the Internet's lost continent: sites nobody ever visits, or visits so infrequently that the traffic doesn't register.

"It's a fascinating view which exposes just how vast the Internet truly is. Even when analyzing traffic from millions of users over the course of three months, it can be seen that much of the Internet remains untouched," say the authors.

Commentators often refer to the 'dark side of the web', meaning the criminal and unsavoury parts of the Internet that few people look at closely, but what Zscaler has turned up on its map is dark in a more literal sense. Nobody looks at these sites, or if anyone does, the visits are incredibly hard to detect from the US cloud.

Some of this 'unlit space' could, of course, be non-English-language domains beyond the ken of Zscaler's customer base, which raises the possibility that the Internet has several 'long dark tails', depending on where you measure the phenomenon from.

Part of the explanation for what goes unvisited in Zscaler's report may lie in what does get visited.

According to the company, as recently as half a decade ago the web was just that: a space defined by HTML files. Although many people still see the web this way, the types of file moving across its servers have changed markedly. More than half of those files are now JPEGs or GIFs, while HTML files account for only 0.57 percent.

Popular domains also dominate the Internet, hoovering up more and more of people's attention. LivePerson, Google, DoubleClick (the web ad distribution network), Yahoo, Facebook and a clutch of less well-known but structurally important domains took a large share of all web visits, a sign that web traffic is becoming concentrated in fewer locations. This is the part of the Internet that is growing.

Tellingly, malware hosts show a similar pattern of concentration, though with considerable fluctuation. Depending on the type of scam, huge numbers of malicious URLs emanate from a very small number of hosts. Whether the source is botnets, phishing websites or malware servers, there is usually a single mega-source, one or two large sources, and a large number of sources with extremely small shares.
