
How to Stop Search Engines from Crawling Your Website


This blog post came to mind when I had to remove a few pages from Google that I never intended to make publicly accessible. I figured there are other people out there like me who don't want Google to index some of their pages.

Reasons to Block Google from Indexing a Page

  1. Prevent your Thank You page from being indexed
  2. Reduce the number of indexed pages that are thin or have low-quality content
  3. Remove duplicate content

How to restrict Google from indexing your pages

Here are a few general methods you can use to prevent Google from indexing your site content:

  • Using the noindex tag

The noindex tag, which tells search engines not to index a page, goes in the page's <head> section and looks like this:

<meta name="robots" content="noindex">

If you’re only worried about preventing Google from indexing a page, you can use the following code:

<meta name="googlebot" content="noindex">

WordPress users can set this through a plugin such as Yoast or Rank Math instead of editing the page by hand.
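
To confirm that a page really carries the tag, you could scan its HTML for it. Here is a minimal Python sketch using only the standard library; the names RobotsMetaParser and has_noindex are made up for this example, and the sample page is a placeholder rather than a real site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def has_noindex(html, bot="robots"):
    """Return True if the page asks all robots, or the named bot, not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives.get("robots", set()) | parser.directives.get(bot, set())
    return "noindex" in found


# Placeholder page standing in for a real Thank You page.
page = '<html><head><meta name="robots" content="noindex"></head><body>Thanks!</body></html>'
print(has_noindex(page))
```

Pass bot="googlebot" if you used the Google-only version of the tag.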

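  • Disallow Bots in Your Robots.txt

If you want well-behaved bots like Googlebot and Bingbot to stay away from your pages entirely, you can add directives to your robots.txt file. Keep in mind that robots.txt blocks crawling, not indexing: a disallowed URL can still show up in search results if other sites link to it, and a bot that is not allowed to crawl a page will never see a noindex tag on it.

This is how you can find your website's robots.txt file.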

On your website:

http://example.com/robots.txt

On your server:

/home/userna5/public_html/robots.txt
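
Since robots.txt always sits at the root of the host, its web address can be worked out from any page URL on the same site. A small Python sketch of that idea (robots_txt_url is a name invented for this example):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://example.com/blog/some-post?ref=home"))
# prints http://example.com/robots.txt
```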

  • Adding an X-Robots-Tag HTTP header

If your site is running on Apache and mod_headers is enabled (it usually is), you can add the following single line to your .htaccess file.

It sets an HTTP header called X-Robots-Tag, whose value works the same way as the content value of the meta robots tag.

Header set X-Robots-Tag "noindex, nofollow"

The effect is that the entire site can still be crawled, but its pages will not be shown in the search results.
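
Once the header is in place, you may want to check that responses actually carry it. The Python sketch below only interprets header values; the function names are invented for this example, and it deliberately ignores the bot-specific form of the header (for example "googlebot: noindex").

```python
def parse_x_robots_tag(value):
    """Split an X-Robots-Tag header value into a set of lowercase directives."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def blocks_indexing(headers):
    """Return True if any X-Robots-Tag header in `headers` forbids indexing.

    `headers` is any mapping of header names to values.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = parse_x_robots_tag(value)
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                return True
    return False


# Hand-written headers standing in for a real HTTP response.
sample_headers = {"Content-Type": "text/html", "X-Robots-Tag": "noindex, nofollow"}
print(blocks_indexing(sample_headers))
```

For a live check, the headers object returned by urllib.request.urlopen(url) also has an items() method, so it should be usable in place of the sample dictionary.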

Stop Bots from Crawling Your Site with .htaccess

The code to block Googlebot only would look like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot).*$ [NC]

RewriteRule .* - [F,L]

If you want to block several bots at a time, you can set your code up like this:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Bingbot|Baiduspider).*$ [NC]

RewriteRule .* - [F,L]
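
Before deploying a rule like this, it can help to try the pattern against a few user-agent strings. The Python sketch below reuses the same regular expression with case-insensitive matching to mimic the [NC] flag. Apache uses a different regex engine, so treat this as an approximation that should hold for a pattern this simple. The would_be_blocked name and the sample strings are just for illustration.

```python
import re

# Same pattern as in the RewriteCond; re.IGNORECASE stands in for [NC].
BLOCKED_BOTS = re.compile(r"^.*(Googlebot|Bingbot|Baiduspider).*$", re.IGNORECASE)


def would_be_blocked(user_agent):
    """Return True if the user-agent string matches the blocking pattern."""
    return BLOCKED_BOTS.search(user_agent) is not None


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
]
for ua in samples:
    print(would_be_blocked(ua), ua)
```

The first two strings match and the ordinary browser string does not, which is the behaviour you want from the rule.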

Set a crawl delay for all search engines:

To avoid high system resource usage on sites with a large number of pages, you can set a crawl delay in your robots.txt file. Not every crawler honors this directive; Googlebot, for example, ignores it. This is how you can set it:

User-agent: * 
Crawl-delay: 30
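
To see how a parser reads this rule, you could feed it to Python's built-in urllib.robotparser module. This is only a sketch of one parser's interpretation; real crawlers may treat the directive differently or ignore it.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as above, held in a string instead of a file.
rules = """\
User-agent: *
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the delay in seconds that applies to the given bot.
delay = parser.crawl_delay("*")
print(delay)
```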

Disallow one particular search engine from crawling your website:

You can disallow just one specific search engine from crawling your website, with these rules:

User-agent: Baiduspider
Disallow: /

Disallow all search engines from particular folders:

If we had a few directories like /cgi-bin/, /private/, and /tmp/ that we didn't want bots to crawl, we could use this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
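
Rules like these are easy to get subtly wrong, so it is worth testing them before uploading. Here is a minimal Python sketch using the standard urllib.robotparser module. The allowed_paths helper and the test paths are invented for this example, and the rules are written without blank lines inside the group because this parser treats a blank line as the end of a group.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
"""


def allowed_paths(robots_txt, user_agent, paths):
    """Map each path to True/False depending on whether user_agent may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


result = allowed_paths(rules, "Googlebot", ["/private/report.html", "/tmp/", "/blog/post.html"])
for path, ok in result.items():
    print(path, "allowed" if ok else "blocked")
```

The same helper works for the file-level rules: swap in your own rules string and the paths you care about.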


Disallow all search engines from particular files:

If we had files like contactus.htm, index.htm, and store.htm that we didn't want bots to crawl, we could use this:

User-agent: *
Disallow: /contactus.htm
Disallow: /index.htm
Disallow: /store.htm


Disallow all search engines but one:

If we only wanted to allow Googlebot access to our /private/ directory and disallow all other bots, we could use:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:
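
The empty Disallow line is what lets Googlebot in: a bot follows the group that names it and falls back to the * group only when no named group matches. A quick Python sketch with the standard urllib.robotparser module shows the difference; "SomeOtherBot" is a made-up bot name, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# One blank line separates the two groups; none appear inside a group.
rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches its own group, where the empty Disallow allows everything.
googlebot_ok = parser.can_fetch("Googlebot", "/private/page.html")
# Any other bot falls back to the * group and is kept out of /private/.
other_ok = parser.can_fetch("SomeOtherBot", "/private/page.html")

print(googlebot_ok, other_ok)
```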

About the author

Rahul Setia

Rahul Setia was born and raised in Kaithal, Haryana. He has worked at brands like Jabong and ProProfs, and he was included in the list of Top 100 Social Media Influencers 2019 by Status Brew. He lives in Delhi/NCR and is the founder of the websites TechBlogCorner.com, ViralMasalla.com, and DealorCoupons.com.
Follow me on: LinkedIn, @rahulsetia007 and Facebook.
