Website Cloaking: What The Search Engines Allow and Disallow
According to Google, cloaking is when "a website returns altered web pages to search engines crawling the site. In other words, a human reading the site would see some different content or information than the Googlebot or other search engine robot reading the site". People implement cloaking to improve their search engine ranking by misleading the search engine robot into thinking the content on the page is different from what it really is. Cloaking does not necessarily turn visitors away from your website, but it can be more damaging than that, because search engines take a very dim view of it: it amounts to lying to the search engines and giving them a misleading representation of the page.
How Search Engines Respond to Cloaking
Most search engines impose a very heavy penalty on any website found to be cloaking. They will often delete the site from their index immediately and sometimes blacklist it. They do this because cloaking is usually intended to fool or manipulate the algorithms and programming that determine whether a site ranks high or low in that engine.
If the page that the customer sees is different from the page that the search engine bot sees, then the search engine cannot do its job. So they ban sites that use cloaking strategies in an attempt to gain ranking.
Personalization and Cloaking
The key question here is whether website personalization amounts to cloaking.
Newer, more advanced websites display specialized content that depends on various factors, some of them determined by the customers themselves. For example, on About, if you haven't visited the site in several months, you might see different content in the navigation menus than you would if you visited regularly. Other sites use a technique called "Geo-IP", which determines your location from the IP address you are using to access the site (or that you logged in from) and displays ads or weather information relevant to your part of the world or country.
Some argue that this practice counts as cloaking, because personalization delivers different content to a customer than to the search engine robot. But the robot receives the same type of content as the customer, just personalized (if you will) to that robot's locale or profile on the system.
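To make the argument concrete, here is a minimal Python sketch of locale-based personalization that stays on the right side of the line. It assumes the country code has already been obtained from a Geo-IP lookup (not shown); the content table and the function name `personalized_sidebar` are hypothetical. The design choice that matters is that the function never branches on the User-Agent, so a crawler that appears to come from the UK receives exactly what a human visitor from the UK receives.

```python
# Minimal sketch: locale-based personalization applied identically to
# human visitors and crawlers. The country code is assumed to come from a
# Geo-IP lookup that is not shown here; the content table is hypothetical.

WEATHER_BLURBS = {
    "US": "Weather for New York",
    "GB": "Weather for London",
}
DEFAULT_BLURB = "Weather for your region"


def personalized_sidebar(country_code: str, user_agent: str) -> str:
    """Choose sidebar content from the visitor's location only.

    user_agent is accepted (a real request handler would have it) but is
    deliberately never consulted, so Googlebot and a browser in the same
    country always get the same content.
    """
    del user_agent  # never branch on who is asking
    return WEATHER_BLURBS.get(country_code.upper(), DEFAULT_BLURB)


browser = personalized_sidebar("gb", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
crawler = personalized_sidebar("GB", "Mozilla/5.0 (compatible; Googlebot/2.1)")
print(browser == crawler)  # True
```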
In this video, Google's Matt Cutts speaks about cloaking.
Common Forms of Cloaking
- User-Agent cloaking
- IP-based cloaking
- HTTP_REFERER cloaking
- HTTP Accept-Language header cloaking
- User-Agent cloaking: A user agent is a program (a software agent) that acts on behalf of a user. For example, a web browser is the user agent that fetches web pages for the person using it. With every request, the browser sends a User-Agent header that identifies the software making the request. If the server identifies the user agent as a search engine crawler, it serves the cloaked content.
- IP-based cloaking: Every visitor to a website has an IP address determined by their location and internet service provider. In this method, the server decides what to show based on that address: crawler IP addresses see a page with good SERP ranking and high traffic volume, while human visitors arriving on that page are redirected to the page the webmaster actually wants them to see. Reverse DNS records (available in the cPanel of your hosting company) are used to find the hostname behind an IP address, and .htaccess rules perform the redirect. This is the most preferred method of cloaking.
- HTTP_REFERER cloaking: In this method, the server checks the HTTP Referer header of the request (the page the visitor came from) and, based on it, serves either a cloaked or an uncloaked version of the website.
- HTTP Accept-Language header cloaking: This technique checks the HTTP Accept-Language header sent by the visitor and, depending on its value, presents a specific version of the website. In simple terms, if the Accept-Language header matches what a search engine crawler sends, a cloaked version of the website is served.
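All four forms key off something in the request: a header or the client's IP address. One way to check your own site for accidental cloaking is to call your page handler with a browser-like request, then switch one signal at a time to a crawler-like value and see whether the page changes. The Python sketch below assumes a hypothetical handler signature (headers plus client IP in, HTML out); the crawler-like values, including the documentation-range IP addresses, are illustrative stand-ins rather than an official list.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical handler signature: (request headers, client IP) -> HTML body.
Handler = Callable[[Dict[str, str], str], str]

BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://www.example.com/",
    "Accept-Language": "en-US,en;q=0.9",
}
BROWSER_IP = "203.0.113.10"   # documentation-range address (RFC 5737)
CRAWLER_IP = "198.51.100.20"  # also documentation range; stands in for a crawler

# Crawler-like value for each header-based signal; None means "header absent".
CRAWLER_HEADERS: Dict[str, Tuple[str, Optional[str]]] = {
    "User-Agent": ("User-Agent",
                   "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    "HTTP_REFERER": ("Referer", None),
    "Accept-Language": ("Accept-Language", None),
}


def cloaking_audit(handler: Handler) -> List[str]:
    """Return the signals whose crawler-like value changes the page."""
    baseline = handler(dict(BROWSER_HEADERS), BROWSER_IP)
    changed = []
    for signal, (name, value) in CRAWLER_HEADERS.items():
        headers = dict(BROWSER_HEADERS)  # vary one header, keep the rest
        if value is None:
            headers.pop(name, None)
        else:
            headers[name] = value
        if handler(headers, BROWSER_IP) != baseline:
            changed.append(signal)
    if handler(dict(BROWSER_HEADERS), CRAWLER_IP) != baseline:
        changed.append("IP")
    return changed


# Two toy handlers to exercise the audit.
def cloaked_page(headers: Dict[str, str], ip: str) -> str:
    if "Googlebot" in headers.get("User-Agent", ""):
        return "<p>cheap shoes cheap shoes cheap shoes</p>"
    return "<p>Welcome!</p>"


def honest_page(headers: Dict[str, str], ip: str) -> str:
    return "<p>Welcome!</p>"


print(cloaking_audit(cloaked_page))  # ['User-Agent']
print(cloaking_audit(honest_page))   # []
```

A flagged signal is not automatically cloaking: a page that legitimately adapts to Accept-Language will be flagged too, so the question to ask is whether a crawler gets what a human with the same settings would get.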
What are some common Cloaking Practices?
Webmasters use a combination of techniques to present different information to users and search engine crawlers. We will look at these practices.
1. Invisible or Hidden text
Using this approach, webmasters stuff keywords into a page by writing content that is invisible or hidden to users. In most cases, they use text color to hide the content from users, while search engines still crawl and detect it. For example, if text is written in white on a white background, human eyes cannot see it but search engines can. Such text can be used to stuff keywords into web page content without human visitors knowing.
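Because this trick relies on the text color matching the background, a simple self-check is to look for elements whose inline `color` and `background-color` are the same. This is a minimal sketch using Python's standard `html.parser`; it only inspects inline `style` attributes, knows just two color names, and ignores external stylesheets and inherited backgrounds, so treat it as a starting point rather than a complete detector. The names `HiddenTextFinder`, `parse_style` and `normalize_color` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Deliberately tiny table of named colors; real CSS has many more.
NAMED_COLORS = {"white": "#ffffff", "black": "#000000"}


def normalize_color(value: str) -> str:
    """Lower-case a CSS color, map a few names, expand #abc to #aabbcc."""
    v = value.strip().lower()
    v = NAMED_COLORS.get(v, v)
    if len(v) == 4 and v.startswith("#"):
        v = "#" + "".join(ch * 2 for ch in v[1:])
    return v


def parse_style(style: str) -> Dict[str, str]:
    """Split an inline style attribute into {property: value}."""
    decls = {}
    for part in style.split(";"):
        prop, sep, val = part.partition(":")
        if sep:
            decls[prop.strip().lower()] = val.strip()
    return decls


class HiddenTextFinder(HTMLParser):
    """Records tags whose inline text color equals their inline background color."""

    def __init__(self) -> None:
        super().__init__()
        self.flagged: List[str] = []

    def handle_starttag(self, tag, attrs):
        decls = parse_style(dict(attrs).get("style") or "")
        fg, bg = decls.get("color"), decls.get("background-color")
        if fg and bg and normalize_color(fg) == normalize_color(bg):
            self.flagged.append(tag)


finder = HiddenTextFinder()
finder.feed('<p>Visible text</p>'
            '<div style="color:#FFF; background-color: white">cheap shoes</div>')
print(finder.flagged)  # ['div']
```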
2. Flash Based Websites
As you may know, Flash is no longer a recommended way to display content as far as SEO is concerned. Nevertheless, many websites are still built on Flash. Some of these sites cannot easily be converted to something better, so, to feed search engines with content, their owners write content-rich web pages and serve those to search engine crawlers while serving the Flash pages to visitors.
3. E-mail Cloaking
In e-mail distribution, cloaking is the act of masking the name and address of the sender so that the recipient does not know who sent the e-mail.
4. HTML Rich Websites
Good SEO practice recommends keeping the text-to-HTML ratio as high as possible. In other words, your web page should have more text (content) than HTML markup. But if you write short articles or posts, your text-to-HTML ratio will be very low. To avoid redesigning their website, some webmasters choose cloaking to meet SEO guidelines.
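There is no single standard formula for this ratio, and SEO tools differ. The Python sketch below assumes one common definition: the number of visible text characters divided by the total number of characters in the HTML, with `<script>` and `<style>` contents excluded from the text and runs of whitespace collapsed. The function name `text_to_html_ratio` is hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def text_to_html_ratio(html: str) -> float:
    """Visible text characters / total HTML characters (one common definition)."""
    if not html:
        return 0.0
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(" ".join(collector.parts).split())  # collapse whitespace
    return len(text) / len(html)


page = "<html><body><p>Hello world</p></body></html>"
print(text_to_html_ratio(page))  # 0.25
```

Here 11 visible characters ("Hello world") out of 44 characters of HTML give a ratio of 0.25; a short post wrapped in a heavy template scores much lower, which is the situation described above.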
5. Image Gallery Websites
Because web crawlers cannot read the content of images the way they read text, image gallery websites, which have more images than text on their pages, sometimes use cloaking to get top placement for relevant keywords.
Can There Be Acceptable Cloaking Practices?
Generally speaking, this is something to be very careful about. Google's Matt Cutts warns that:
“White hat cloaking is a contradiction in terms of Google. We’ve never had to make an exception for “white hat” cloaking. If someone tells you that — that’s dangerous.”
He further says that if a site includes code that treats Googlebot differently based on its user agent or IP address, Google considers it cloaking and may take action against the site.
However, there are some exceptions that are not considered harmful: some practices are allowed even though they present different content to search engines and users. Content delivery methods now acceptable to Google that might once have been considered cloaking include:
- Geo-IP location: In Geo-IP location cloaking, content is presented to different users based on their location. Google itself does this: if you search for SEO from your country, you will get different results than if you run the same search from another country. This is not considered bad cloaking. It is done to give users a better experience, so it is perfectly fine for any webmaster to do on their website.
- First Click Free: Users clicking from Google to a listed page can read the page without having to pay or register with the hosting site. You let Googlebot through as if it were a registered member, and you also let anyone coming from Google's search listings through. (Google retired First Click Free in 2017 and replaced it with Flexible Sampling.)
- URL rewriting: This practice removes unnecessary parameters and other clutter from URLs (sometimes called URL cloaking). Presenting a different URL for a particular piece of content to make it more search engine friendly is perfectly acceptable.
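Legitimate URL rewriting can be as simple as stripping parameters that do not change the page. The Python sketch below uses the standard `urllib.parse` module; the parameter list (`utm_*`, `sessionid`, `sid`, `ref`) is a hypothetical example and should be tailored to your own site, and `clean_url` is a hypothetical name. The fragment is dropped because browsers never send it to the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change the page content.
DROP_EXACT = {"sessionid", "sid", "ref"}
DROP_PREFIXES = ("utm_",)


def clean_url(url: str) -> str:
    """Strip tracking/session parameters and lower-case the host name."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DROP_EXACT and not key.lower().startswith(DROP_PREFIXES)
    ]
    # The fragment (#...) is dropped: browsers never send it to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/",
                       urlencode(kept), ""))


print(clean_url("https://Example.com/shoes?color=red&utm_source=news&sessionid=abc123"))
# https://example.com/shoes?color=red
```

To stay within the guidelines, the cleaned URL should be served to everyone, for example through a 301 redirect that applies equally to visitors and crawlers, rather than shown only to search engines.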
What is Right and Legitimate?
The difficulty is that the judgement is made by machines, not humans. Still, intention should guide our actions. If your intention is to help users engage with your site and not to mislead the search engines, you can consider your actions legitimate. Generally, we recommend that you avoid all the practices we have described as bad above, because Google considers all forms of cloaking unacceptable.