Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, every product link includes a "?cgid" parameter in the URL. But "?cgid" is also blocked in the robots.txt file for some reason, so I'm thinking it's stopping all my products from getting crawled by Google.
Am I right here? Is there any reason why a website would want to block so many URLs? I've only been here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I'd be concerned about my internal linking because of this problem. I've always tried to keep important pages within 3 clicks of the homepage. My worry here is that while a user can reach these products within 3 clicks of the homepage, they're blocked to Googlebot.
So the product URLs are only getting discovered via the sitemap, which would be hugely inefficient? I think I have to decide whether opening up these pages would improve my linking structure enough to help Google crawl the product pages, or whether that's outweighed by increasing the number of pages it has to crawl and wasting crawl budget.
-
Hello,
The canonical product URLs will be getting crawled just fine, as they are not blocked in the robots.txt. Without understanding your setup completely, I think the people before you were trying to stop all the duplicate parameterised URLs from being crawled, leaving Google to crawl just the canonicals - which is what you want.
If you remove the parameter rule from robots.txt then Google will crawl everything, including the parameter URLs, which will waste crawl budget. So it's better that Google is only crawling the canonicals.
Regarding the sitemap: being in the sitemap helps Googlebot decide what to prioritise crawling, but it won't stop it finding other URLs if there is good internal linking.
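If you want to sanity-check which URLs a rule like this actually affects before changing anything, a quick script can split a crawl export into canonicals and blocked parameter URLs. This is a minimal sketch: the URLs and the `cgid` prefix are illustrative, and the check only approximates a Google-style wildcard rule such as `Disallow: /*?cgid`.

```python
from urllib.parse import urlparse, parse_qs

# Illustrative crawl export: canonical category URLs plus the
# parameterised duplicates that robots.txt is blocking.
urls = [
    "https://www.example.com/product-category/ladies-shoes",
    "https://www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product",
    "https://www.example.com/product-category/mens-boots?cgid-product=42",
]

def is_blocked(url, param_prefix="cgid"):
    """Approximate a wildcard rule like 'Disallow: /*?cgid':
    True if any query-parameter name starts with the blocked prefix."""
    return any(name.startswith(param_prefix)
               for name in parse_qs(urlparse(url).query))

canonicals = [u for u in urls if not is_blocked(u)]
blocked = [u for u in urls if is_blocked(u)]
print(len(canonicals), len(blocked))  # 1 canonical, 2 blocked
```

Running something like this over the full Screaming Frog export would tell you exactly how much of the crawl the rule is gating.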
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here and there, and some are ranking, which is great. But I think the only place they can get discovered is the sitemap, which has over 28,000 URLs on one page (another thing I need to fix)!
So if Googlebot reaches the parameter URLs through category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees they're blocked, I'm guessing it can't see that they're important to us (from the website hierarchy) or read the canonical tag, so I'm presuming this is seriously damaging our power to get products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 are URLs containing "?cgid", which I don't think is too many for Googlebot to crawl; the website has pretty good authority, so I think we get a pretty deep crawl.
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, it'll really help.
-
It looks like a ?cgid parameter is being appended to the product URLs - and there may be other stuff attached to the end of each URL, like this:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc
but the canonical URL is www.example.com/product-category/ladies-shoes
These products may have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
Except... Google still has to crawl each of these parameter URLs to find the canonical. On a huge website, this means crawl budget is being consumed by unnecessary crawling of parameterised URLs.
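For reference, the set-up being described would look something like this on each parameter variant (URLs illustrative):

```html
<!-- Served on every parameterised variant, e.g.
     /product-category/ladies-shoes?cgid-product=19&controller=product -->
<link rel="canonical"
      href="https://www.example.com/product-category/ladies-shoes" />
```

It's this tag that lets the duplicates consolidate to the base URL, but Google only sees it after spending a crawl on the variant.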
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure the parameters don't change how the page appears in search.
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical URLs will be getting crawled just fine, since they don't have any parameters and hence aren't being blocked.
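The robots.txt rule doing the blocking is probably something along these lines (the exact pattern depends on the site, so treat this as a sketch; Googlebot does support the `*` wildcard in robots.txt paths):

```
User-agent: *
# Block any URL whose query string starts with the cgid parameter
Disallow: /*?cgid
```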
Hope that makes sense?
-
Yes, it's in the robots.txt, and that's the problem. Someone must have put it in there deliberately, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether a plugin is creating this problem.