(Translated from the original Italian)
The Deep Web (or Invisible Web) is the set of information resources on the World Wide Web that are not indexed by ordinary search engines.
According to several researchers, the principal search engines index only a small portion of the overall web content; the remaining part is unknown to the majority of web users.
What would you think if you were told that under our feet there is a world larger than ours and much more crowded?
We would literally be shocked, and this is the reaction of those who come to understand the existence of the Deep Web, a network of interconnected systems estimated to be around 500 times larger than the visible web.
A very thorough definition was provided by Mike Bergman, the founder of BrightPlanet, who compared searching on the Internet today to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deeper and therefore missed.
To find content on the web, ordinary search engines use software called "crawlers". This technique is ineffective for finding the hidden resources of the Web, which can be classified into the following categories:
- Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
- Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
- Private Web: sites that require registration and login (password-protected resources).
- Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
- Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).
- Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
- Text content using the Gopher protocol and files hosted on FTP that are not indexed by most search engines. Engines such as Google do not index pages outside of HTTP or HTTPS.
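To make the limits of crawling more concrete, here is a minimal Python sketch of a crawler that discovers pages only by following links. The tiny in-memory "web" and its page names are invented for illustration; a real crawler fetches pages over HTTP and is far more elaborate.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home"],
    "orphan": ["home"],        # unlinked content: no page links here
    "search?q=tor": ["home"],  # dynamic content: only produced by a form query
}

def crawl(start, web):
    """Breadth-first crawl that finds pages only by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in web.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

indexed = crawl("home", TOY_WEB)
hidden = set(TOY_WEB) - indexed

print(sorted(indexed))  # ['about', 'catalog', 'home']
print(sorted(hidden))   # ['orphan', 'search?q=tor']
```

The two pages left in `hidden` exist on the toy web, but a link-following crawler starting from `home` never reaches them, which is the situation described for unlinked and dynamic content.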
A parallel web that holds a much larger amount of information represents an invaluable resource for private companies, governments, and especially cybercriminals. In the imagination of many people, the Deep Web is associated with the concept of anonymity, which suits criminal intent: criminals supposedly cannot be pursued because they are submerged in an inaccessible world.
As we will see, this interpretation of the Deep Web is quite wrong: we are faced with a network that is definitely different from the usual web, but that in many ways reproduces the same issues in a different form.
What is Tor? How is anonymity preserved?
Tor is the acronym of "The Onion Router", a system implemented to enable online anonymity. The Tor client software routes Internet traffic through a worldwide volunteer network of servers, hiding the user's information and eluding monitoring activities.
As often happens, the project was born in the military sector, sponsored by the US Naval Research Laboratory, and from 2004 to 2005 it was supported by the Electronic Frontier Foundation.
Currently the software is developed and maintained by the Tor Project. A user who navigates using Tor is difficult to trace, which protects their privacy, because the data is encrypted multiple times and passes through multiple nodes, or Tor relays, of the network.
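The idea of data being "encrypted multiple times" can be sketched in a few lines of Python. This is a toy model only: the XOR "cipher", the relay names and the `wrap`/`peel` helpers are assumptions made for illustration, are not secure, and are not how Tor is actually implemented.

```python
import hashlib
import os

def keystream(key: bytes, length: int) -> bytes:
    """Derive a toy keystream by hashing key + counter (NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_layer(data: bytes, key: bytes) -> bytes:
    """Apply or remove one toy layer (XOR with the same key undoes itself)."""
    stream = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(message: bytes, path: list, keys: dict, destination: str) -> bytes:
    """Wrap the message once per relay; the innermost layer is for the last relay."""
    next_hops = path[1:] + [destination]
    packet = message
    for relay, next_hop in reversed(list(zip(path, next_hops))):
        packet = xor_layer(next_hop.encode() + b"|" + packet, keys[relay])
    return packet

def peel(packet: bytes, key: bytes):
    """Remove one layer: reveals only the next hop and a still-wrapped payload."""
    plain = xor_layer(packet, key)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner

# Hypothetical relay names and random per-relay keys.
path = ["entry", "middle", "exit"]
keys = {name: os.urandom(16) for name in path}
packet = wrap(b"hello Bob", path, keys, "bob")

hops_seen = []
for relay in path:
    next_hop, packet = peel(packet, keys[relay])
    hops_seen.append(next_hop)

print(hops_seen)  # ['middle', 'exit', 'bob']
print(packet)     # b'hello Bob'
```

Each relay in this sketch learns only where to forward the packet next; only after the last layer is removed does the original message appear.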
Connecting to the Tor network
Imagine a typical scenario in which Alice wants to connect to Bob using the Tor network. Let's see step by step how this is possible:
Alice's client first makes an unencrypted connection to a centralized directory server containing the addresses of Tor nodes. After receiving the address list from the directory server, the Tor client software connects to a random node (the entry node) through an encrypted connection. The entry node makes an encrypted connection to a random second node, which in turn does the same to connect to a random third Tor node. The process goes on until it reaches a node (the exit node) connected to the destination.
Consider that during Tor routing the nodes for each connection are chosen at random, and the same node cannot be used twice in the same path.
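The rule that nodes are picked at random and never repeated within a path can be sketched as follows. The relay names and the `build_path` helper are hypothetical, and the uniform random choice is a simplifying assumption: real clients apply additional selection rules that are not modeled here.

```python
import random

def build_path(relays, length=3):
    """Pick `length` distinct relays at random: first is the entry, last is the exit."""
    if length > len(relays):
        raise ValueError("not enough relays for the requested path length")
    # random.sample never returns the same element twice.
    return random.sample(relays, length)

# Hypothetical address list, as if received from the directory server.
directory = ["relay-a", "relay-b", "relay-c", "relay-d", "relay-e"]

path = build_path(directory)
entry_node, exit_node = path[0], path[-1]
print(path, entry_node, exit_node)
```

Because the selection is random, the printed path differs from run to run, but it always contains three different relays.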
To ensure anonymity, the connections have a fixed duration: every ten minutes, to avoid statistical analysis that could compromise the user's privacy, the client software changes the entry node.
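A minimal sketch of this time-based rotation, assuming for simplicity that the whole path is rebuilt once the ten minutes have elapsed. The `PathManager` class and the relay names are invented for illustration, and the clock value is passed in explicitly so the behavior is easy to follow.

```python
import random

ROTATION_SECONDS = 600  # ten minutes, as stated in the text

class PathManager:
    """Hypothetical helper that rebuilds a random path once it is too old."""

    def __init__(self, relays, length=3):
        self.relays = relays
        self.length = length
        self.path = None
        self.created_at = None

    def current_path(self, now):
        """Return the active path, building a new one if none exists or it expired."""
        expired = (
            self.created_at is None
            or now - self.created_at >= ROTATION_SECONDS
        )
        if expired:
            self.path = random.sample(self.relays, self.length)
            self.created_at = now
        return self.path

manager = PathManager(["relay-a", "relay-b", "relay-c", "relay-d", "relay-e"])
first = manager.current_path(now=0)
same = manager.current_path(now=599)   # still within ten minutes: path is reused
fresh = manager.current_path(now=600)  # ten minutes elapsed: a new path is built
```

In this sketch `same` is the very same path object as `first`, while `fresh` is newly drawn (by chance it may contain the same relays, since the choice is random).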
Up to now we have considered an ideal situation in which a user accesses the network only to connect to another user. To further complicate the discussion, in a real scenario the node Alice uses could in turn act as a routing node for connections established between other users.
A malicious third party would not be able to tell which connections a machine initiates as a user and which it relays as a node, making it impossible to monitor the communications.
After this necessary digression on Tor network routing, we are ready to enter the Deep Web simply by using the Tor software available from the official web site of the project.
Tor works on all major platforms, and many add-ons make it simple to integrate into existing applications, including web browsers. Although the network has been designed to protect the user's privacy, to be really anonymous it is suggested to go through a VPN.
A better way to navigate inside the Deep Web is to use the Tails OS distribution, which is bootable from any machine and won't leave a trace on the host. The Tor Bundle, once installed, comes with its own portable version of Firefox, ideal for anonymous navigation thanks to appropriate control over installed plugins; in the ordinary version of the browser, common plugins could in fact expose your identity.
Once inside the network, where is it possible to go and what is it possible to find?
Once inside the Deep Web, we must understand that navigation is quite different from the ordinary web: every search is more complex due to the absence of content indexing.
A user who starts navigating the Deep Web has to know that a common way to list content is through collections of wikis and BBS-like sites whose main purpose is to aggregate links and categorize them into suitable groups.
Another difference that the user has to take into consideration is that, instead of classic extensions (e.g. .com, .gov), domains in the Deep Web generally end with the .onion suffix.
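As a small illustration of the suffix difference, the following Python sketch checks whether a URL points to a .onion host. The `is_onion_address` helper and the example addresses are placeholders invented for this example, not real sites.

```python
from urllib.parse import urlsplit

def is_onion_address(url: str) -> bool:
    """Return True if the URL's host name ends with the .onion suffix."""
    host = urlsplit(url).hostname  # None when the URL has no scheme/host part
    return host is not None and host.endswith(".onion")

# Placeholder addresses, used only for illustration.
print(is_onion_address("http://examplehiddenwiki.onion/index.html"))  # True
print(is_onion_address("https://www.example.com/"))                   # False
```

Note that the check relies on the URL including a scheme such as `http://`; a bare host name without a scheme is not recognized by this sketch.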
A short list of links that have made the Deep Web famous has been published on Pastebin.
The Cleaned Hidden Wiki should also be a good starting point for first navigations.
Be careful: some content is labeled with commonly used tags such as CP (child pxxn) and PD (pxxophile), so stay far away from it.
The Deep Web is considered the place where everything is possible: you can find every kind of material and service for sale, but most of them are illegal. The hidden web offers cybercrime great business opportunities: hacking services, malware, stolen credit cards, weapons, and so on.
We all know the potential of e-commerce on the ordinary web and its impressive growth in the last couple of years. Now imagine the Deep Web market, which is more than 500 times bigger and where there are no legal limits on the goods sold. We are faced with an amazing business controlled by cybercriminal organizations.
Speaking of the black market, we cannot avoid mentioning the Silk Road web site, an online marketplace located in the Deep Web where the majority of products derive from illegal activities. Of course it is not the only one; many other markets are run to sell specific products and, believe me, many of them are terrifying.
Most transactions on the Deep Web accept the Bitcoin system for payments, allowing the purchase of products while preserving the anonymity of the transaction and encouraging the growth of trade in illegal activities. We are faced with an autonomous system that works to the advantage of criminal activities, ensuring the anonymity of transactions and the inability to track down the criminals.
But is it really all anonymous? Is it possible to be traced in the Deep Web? What is the position of the governments towards the Deep Web?
I will provide more information on the topic in the next articles. In the meantime, let me thank a great expert of the Deep Web, "The gAtOmAlO", with whom I collaborated on a project that we will present to you soon.
Cross-posted from Security Affairs