How do I set up Clean-Param for a robots.txt file on Blogger?

 

On one of my blogs on the Blogger platform I ran into duplicate pages with GET parameters. This happened after I replaced the default template with a third-party one. In Yandex Webmaster, under:

Diagnostics -> Site Diagnostics

messages about these pages appeared:


Yandex recommends using the Clean-param directive in robots.txt so that the robot ignores insignificant GET parameters and consolidates all signals from the duplicate pages onto the main page.
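For reference, the directive Yandex has in mind looks roughly like this in robots.txt (a minimal sketch; the parameter name m is only an example of an insignificant GET parameter):

User-agent: Yandex
Clean-param: m /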

If you don't have time to figure it out on your own, or you realize you can't do it yourself, find a specialist who can help you resolve the issue.

How do I set up my own robots.txt file on Blogspot?

Select your blog in your Blogspot account (1.) and go to the "Settings" section (2.). Scroll down the Settings page to the "Crawlers and indexing" block. Turn on the toggle (3.) next to the "Use your own robots.txt file" option and click the field (4.) under the "Custom robots.txt file" option.


After that, a small window opens where we enter our rules (1.) for the robots.txt file on Blogspot. After entering them, do not forget to save (2.).

How do I set up Clean-Param for a robots.txt file on Blogger?

And here is the problem. When I tried to specify Clean-param in the robots.txt file for Blogger and save the changes, I received an error message:

The content of the robots.txt file does not follow the formatting rules.

This is most likely because Google does not recognize Clean-param. For this task, Google provides the "URL Parameters" tool.

However, without a clear understanding of what you will be doing, I strongly advise against using this tool.

In our case, the problem that Yandex recommends solving with Clean-param can instead be solved by closing URLs with parameters from indexing. To do this, add the following rules to your robots.txt file:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /*?
Allow: /

Sitemap: https://your_blog_name.blogspot.com/sitemap.xml
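To illustrate how these rules match URLs, here is a minimal Python sketch. The helper and the blog URLs are hypothetical: it only mirrors the two Disallow rules for User-agent: * and is not a full robots.txt parser.

from urllib.parse import urlparse

def blocked_by_rules(url):
    # Hypothetical helper mirroring only: Disallow: /search and Disallow: /*?
    parsed = urlparse(url)
    if parsed.path.startswith("/search"):
        return True   # matches Disallow: /search
    if parsed.query:
        return True   # matches Disallow: /*? (any URL with a query string)
    return False

print(blocked_by_rules("https://your_blog_name.blogspot.com/2021/10/post.html?m=1"))  # True, blocked
print(blocked_by_rules("https://your_blog_name.blogspot.com/search/label/news"))      # True, blocked
print(blocked_by_rules("https://your_blog_name.blogspot.com/2021/10/post.html"))      # False, allowed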

You can check your robots.txt file in Google Search Console. Add your robots.txt and enter a URL with a parameter. Check the result; the URL should be blocked from indexing:

Now check that your URL without parameters is allowed for indexing:

Similarly, we check the robots.txt file for Blogger in Yandex Webmaster. To do this, in the Webmaster panel, select your site and go to the "Tools" section (1.), then to "Robots.txt analysis" (2.). Click "Check" (3.) to have the file rules parsed.

After that, scroll down the page, enter our duplicate pages with GET parameters, and click "Check" (1.). All links with GET parameters should be shown as closed from indexing in robots.txt (2.):

Similarly, we check pages without GET parameters. These pages should be open for indexing:

After these changes to the robots.txt file for Blogger, go to "Site problems" (via "Diagnostics" -> "Site diagnostics") and click "Check" next to the reported problem.

After a while, we should get a positive result. In the "Site Diagnostics" section, the critical error message should disappear:

Drawbacks.

I noticed that after creating this robots.txt file, Google Search Console reports in the "Coverage" section that some pages have been indexed despite being blocked in robots.txt:


When checking any link marked "Warning" with the type "Indexed despite being blocked in the robots.txt file", we will see a message that the page is in the index even though the URL is blocked in the robots.txt file:

Upon closer inspection, you can see that indexing problems arise where the page was indexed by a mobile spider (Googlebot-Mobile):


This page is marked as blocked by robots.txt for crawling. However, it is still allowed for indexing.

This is most likely because the mobile crawler, Googlebot-Mobile, followed the link with the GET parameter ?m=1.
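For clarity, Blogger serves the same post at two addresses (the blog name below is hypothetical):

https://your_blog_name.blogspot.com/2021/10/post.html (desktop version)
https://your_blog_name.blogspot.com/2021/10/post.html?m=1 (mobile version, matched by Disallow: /*?)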

Since we blocked all such GET parameters in our robots.txt, we get this error. It is also noteworthy that if we check the same URL in the robots.txt checker for its availability to Googlebot-Mobile, we will not receive an error:


In that check, though, our link has no GET parameters. As soon as we check the link with the GET parameter ?m=1, we see that it is blocked from indexing. That is correct: this is exactly how we wanted to avoid duplicate pages. Most likely, Googlebot-Mobile crawls the site the way mobile devices do, via the ?m=1 URLs.

Conclusion.

Although the Blogger platform does not allow the Clean-param directive in its robots.txt, we can use a workaround and fix the problem of duplicate pages with GET parameters on a Blogger blog.

The method works and is universal, since it excludes duplicate pages through robots.txt settings for both the Yandex and Google search engines.

At the same time, there are problems with Googlebot-Mobile crawling the pages. This leads to warnings in the Google Search Console "Coverage" section that some pages have been indexed despite being blocked in robots.txt. Unfortunately, I am not aware of a solution to this problem. If you find one for Blogger, share it in the comments.

Another issue: if you ignore the GET parameter errors in Yandex Webmaster, you risk losing Site Quality Index (SQI) points. My site was downgraded by 10 points. As soon as the problem with the GET parameters was fixed, the SQI came back.

Whether or not to use this solution on your Blogger blog depends on the audience your site is intended for. If your traffic comes mainly from Google, it may not be worth using this solution. If your main traffic is from Yandex, you cannot ignore the GET parameter errors, as they will negatively affect your site's rankings. In any case, use this option ONLY if you actually have problems with GET parameters. If you do not (no errors are shown in Yandex Webmaster), leave the default robots.txt file and do not change anything!

And remember that if your budget allows, it is better to find a specialist.

Added on 07.11.

In Google search, pages from my blog started appearing without a description, with a note pointing to Google Help.


After checking Yandex Webmaster and making sure the duplicate pages were gone, I decided to disable the robots.txt above, because of the problems with the Google mobile bot indexing the site. It also turned out that the blog template did not specify canonical links on the pages at all; I don't even know how that happened. So I added the following code to the template:

<link expr:href='data:blog.url.canonical' rel='canonical'/>

To add the above code, go to "Theme" (1.) and click the drop-down list (2.) next to "CUSTOMIZE":

In the drop-down list, select "Edit HTML":

And paste the above code (2.) right after <head> (1.):


This allows us to tell the search engines which URL (data:blog.url.canonical) is the canonical one for each page.
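For example, on a duplicate page such as .../post.html?m=1, the tag should render roughly like this (hypothetical blog name), pointing back to the clean address:

<link href='https://your_blog_name.blogspot.com/2021/10/post.html' rel='canonical'/>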

After that, we need to find every

data:blog.url

in the code and append

.canonical

to it. To do this, press CTRL + F in the code editor and, in the search field that appears (1.), enter data:blog.url. Press Enter and, in each of the results found, add .canonical.
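For example, a third-party template often contains a tag like this hypothetical snippet; after the edit, the data:blog.url expression in it becomes data:blog.url.canonical:

Before:
<meta expr:content='data:blog.url' property='og:url'/>

After:
<meta expr:content='data:blog.url.canonical' property='og:url'/>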

After making the changes, do not forget to save the edits:

 
As a result, our pages should show the canonical address without GET parameters (?m=1, ?m=0, ?comments_89917, etc.):

I also discovered that in the standard new Blogger themes, the canonical address is already specified in the template.

  
So, as I pointed out at the beginning of the post, if you use third-party themes, always try to check them.

For now, I am watching the situation. If duplicates with GET parameters no longer appear in Yandex or Google, I will report back in this post. If there is new information, I will definitely add it.

If you have your own solution to this problem, please leave comments. Your experience and information can help other users. Good luck.

Added on 04.12.

In the Yandex Webmaster panel, I can see that pages with GET parameters are automatically excluded from the index and no longer count as duplicates, since they are non-canonical pages.

Duplicates eliminated thanks to the canonical link

The situation is similar in Google Search Console. So, once again, I recommend that owners of sites on the Blogger platform not use the robots.txt file given at the beginning of this note, but instead specify the canonical URL for the pages. This keeps the blog indexed normally by the Google mobile robot. If you have clarifications or additions from your own experience, please share them in the comments; they will be useful to others. Thank you in advance, and good luck promoting your blogs on Blogspot.

Original article with comments: https://pc103help.blogspot.com/2021/10/kak-nastroit-clean-param-dlja-fajla-robots-txt-na-blogger.html
