Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Role of Robots.txt and Search Console parameters settings
-
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
If you disallow in robots, you are asking the search engine to not even crawl that page. Whereas if you NOINDEX in the page head, then the search engine may still crawl the page but should not index it.
There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
It's also common to use robots.txt to disallow some files where you do not have control over the response. Non-html files, where you might not be able to easily administer noindex directives. Or dynamic pages your web application may serve but not allow you to administer head tags for.
All of that said, robots.txt files have been shrinking ever since the search engines began to render javascript, since now they need access to a lot of resource files which they previously did not. Much of the old advice of disallowing scripts and admin folder paths may be obsolete now, if those files are needed to properly render pages.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt
I think I understand that url parameters are best handled in the search console parameters tool, and if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for dealing with URLs having multiple query string parameters. robots.txt processes the statements in order, so you often have to set up a broad disallow, followed by more specific allows, to achieve the same result which can be more easily managed in GSC.
Also, GSC is useful for the "representative URL" setting, if your pages don't necessarily get crawled without the parameter present at all, but you only want one version of the page indexed if the crawler encounters multiple versions. So, this is a little like a dynamic canonical, except you are not specifying which version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Console Errors 400 and 405
Hi, Does anyone know if search console errors showing as follows are damaging to serps: /xmlrpc.php is returning 405 error /wp-admin/admin-ajax.php is returning 400 error These errors seem to of coincided almost to the day that there was a ranking drop for the primary keyword from mid page 1 to bottom of page 2. No matter what I do I cannot seem to correct these errors. Any advice would be greatly appreciated. Thanks
Technical SEO | | DaleZon0 -
Desktop & Mobile XML Sitemap Submitted But Only Desktop Sitemap Indexed On Google Search Console
Hi! The Problem We have submitted to GSC a sitemap index. Within that index there are 4 XML Sitemaps. Including one for the desktop site and one for the mobile site. The desktop sitemap has 3300 URLs, of which Google has indexed (according to GSC) 3,000 (approx). The mobile sitemap has 1,000 URLs of which Google has indexed 74 of them. The pages are crawlable, the site structure is logical. And performing a Landing Page URL search (showing only Google/Organic source/medium) on Google Analytics I can see that hundreds of those mobile URLs are being landed on. A search on mobile for a longtail keyword from a (randomly selected) page shows a result in the SERPs for the mobile page that judging by GSC has not been indexed. Could this be because we have recently added rel=alternate tags on our desktop pages (and of course corresponding canonical ones on mobile). Would Google then 'not index' rel=alternate page versions? Thanks for any input on this one. PmHmG
Technical SEO | | AlisonMills0 -
Why has my search traffic suddenly tanked?
On 6 June, Google search traffic to my Wordpress travel blog http://www.travelnasia.com tanked completely. There are no warnings or indicators in Webmaster Tools that suggest why this happened. Traffic from search has remained at zero since 6 June and shows no sign of recovering. Two things happened on or around 6 June. (1) I dropped my premium theme which was proving to be not mobile friendly and replaced it with the ColorMag theme which is responsive. (2) I relocated off my previous hosting service which was showing long server lag times to a faster host. Both of these should have improved my search performance, not tanked it. There were some problems with the relocation to the new web host which resulted in a lot of "out of memory" errors on the website for 3-4 days. The allowed memory was simply not enough for the complexity of the site and the volume of traffic. After a few days of trying to resolve these problems, I moved the site to another web host which allows more PHP memory and the site now appears reliably accessible for both desktop and mobile. But my search traffic has not recovered. I am wondering if in all of this I've done something that Google considers to be a cardinal sin and I can't see it. The clues I'm seeing include: Moz Pro was unable to crawl my site last Friday. It seems like every URL it tried to crawl was of the form http://www.travelnasia.com/wp-login.php?action=jetpack-sso&redirect_to=http://www.travelnasia.com/blog/bangkok-skytrain-bts-mrt-lines which resulted in a 500 status error. I don't know why this happened but I have disabled the Jetpack login function completely, just in case it's the problem. GWT tells me that some of my resource files are not accessible by GoogleBot due to my robots.txt file denying access to /wp-content/plugins/. I have removed this restriction after reading the latest advice from Yoast but I still can't get GWT to fetch and render my posts without some resource errors. On 6 June I see in Structured Data of GWT that "items" went from 319 to 1478 and "items with errors" went from 5 to 214. There seems to be a problem with both hatom and hcard microformats but when I look at the source code they seem to be OK. What I can see in GWT is that each hcard has a node called "n [n]" which is empty and Google is generating a warning about this. I see that this is because the author vcard URL class now says "url fn n" but I don't see why it says this or how to fix it. I also don't see that this would cause my search traffic to tank completely. I wonder if anyone can see something I'm missing on the site. Why would Google completely deny search traffic to my site all of a sudden without notifying any kind of penalty? Note that I have NOT changed the content of the site in any significant way. And even if I did, it's unlikely to result in a complete denial of traffic without some kind of warning.
Technical SEO | | Gavin.Atkinson1 -
Oh no googlebot can not access my robots.txt file
I just receive a n error message from google webmaster Wonder it was something to do with Yoast plugin. Could somebody help me with troubleshooting this? Here's original message Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%. Recommended action If the site error rate is 100%: Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot. If your robots.txt is a static page, verify that your web service has proper permissions to access the file. If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure. If the site error rate is less than 100%: Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors. The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website. After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | | BistosAmerica0 -
Does Bing ignore robots txt files?
Bonjour from "Its a miracle is not raining" Wetherby Uk 🙂 Ok here goes... Why despite a robots text file excluding indexing to site http://lewispr.netconstruct-preview.co.uk/ is the site url being indexed in Bing bit not Google? Does bing ignore robots text files or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt I need to add to stop bing indexing a preview site as illustrated below. http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg Any insights welcome 🙂
Technical SEO | | Nightwing0 -
How to handle (internal) search result pages?
Hi Mozers, I'm not quite sure what the best way is to handle internal search pages. In this case it's for an ecommerce website with about 8.000+ products and search pages currently look like: example.com/search.php?search=QUERY+HERE. I'm leaning towards making them follow, noindex. Since pages like this can be easily abused for duplicate content and because I'd rather have the category pages ranked. How would you handle this?
Technical SEO | | Qon0 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0 -
Site disappearing from search for a certain keyword
I was wondering if someone has encountered the same problem as me. I was doing some changes on the frontpage of one of my clients' website, especially some redirections, and my site has disappeared from Google for the main keyword on the page. So, if I look for my page on Google, instead of seeing my page first, I no longer see my page, at all. All I've done was a 301 redirection from index.html to the domain name. Now, I changed everything back to how it was before. More precisely, I've done that 2 weeks ago. But, no change in Google. I checked Bing and Yahoo, my site appears first when I search for that specific keyword. Any ideas how long will it take for Google to see that I am not doing anything wrong with redirections? Or any idea at all?
Technical SEO | | webmasterles0