Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, all the products are linked to with the code “?cgid” in the URL. But “?cgid” is also blocked in the robots.txt file for some reason. So I'm thinking it's stopping all my products getting crawled by Google.
Am I right here? Is there any reason why a website would want to limit so many URLs? I've only been here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the Homepage. My worry here is that while these products can get clicked by a user within 3 clicks of the Homepage, they're blocked to Googlebot.
So the product URLs are only getting crawled via the sitemap, which would be hugely inefficient? I think I have to decide whether opening up these pages will improve my linking structure for Google to crawl the product pages. But is that more important than the risk of increasing the number of pages it's able to crawl and wasting crawl budget?
-
Hello,
The canonical product URLs will be getting crawled just fine, as they are not blocked in the robots.txt. Without understanding your problem completely, I think the guys before you were trying to stop all the duplicate URLs with parameters being crawled and leave Google to crawl just the canonicals - which is what you want.
If you remove the parameter from robots.txt, Google will crawl everything, including the parameter URLs. This will waste crawl budget, so it's better that Google only crawls the canonicals.
Regarding the sitemap, being listed there helps Googlebot decide what to prioritise crawling, but it won't stop Googlebot finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here and there, and some are ranking, which is great. But I think the only place they can get crawled is the sitemap, which has over 28,000 URLs on one page (another thing I need to fix)!
So if Googlebot gets to the parameter URL through category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see that the page is important to us (from the website hierarchy), and it can't see the canonical tag either. So I'm presuming it's seriously damaging our power to get products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 are URLs which contain "?cgid", which I don't think is too many for Googlebot to crawl. The website has pretty good authority, so I think we get a pretty deep crawl.
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, I think it'll really help.
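To put the Screaming Frog figures above in proportion, here is a rough back-of-the-envelope calculation in Python. It uses only the three numbers quoted in this post and assumes that every "?cgid" URL is among the blocked ones; the variable names are just for illustration.

```python
# Figures quoted in the post above
crawled = 112_000        # URLs found by the Screaming Frog crawl
blocked_share = 0.68     # share of those reported as blocked by robots.txt
cgid_urls = 17_000       # URLs containing "?cgid"

# Absolute number of blocked URLs
blocked = round(crawled * blocked_share)

# Assumption: all "?cgid" URLs are blocked, so they are a subset of `blocked`
cgid_share_of_crawl = cgid_urls / crawled
cgid_share_of_blocked = cgid_urls / blocked

print(f"Blocked URLs: {blocked}")
print(f"?cgid share of crawl: {cgid_share_of_crawl:.1%}")
print(f"?cgid share of blocked URLs: {cgid_share_of_blocked:.1%}")
```

If those figures hold, "?cgid" accounts for roughly a sixth of the crawl and under a quarter of the blocked URLs, so other robots.txt rules would be responsible for most of the blocking.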
-
This looks like the products are being appended by a parameter ?cgid - there may be other stuff attached to the end of each URL like this below:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc
but canonical URL is www.example.com/product-category/ladies-shoes
These products may have had a canonical to the base URL which means that there won't be any problem with duplicates being indexed. So all well and good.
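As a minimal sketch of that relationship, the Python below maps parameter URLs back to their base URL by dropping the query string. It assumes the canonical is simply the URL without its parameters, as in the example above; `strip_query`, the `https://` scheme and the second `cgid-product=20` variant are hypothetical and only there for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical parameter variants of the same category page
variants = [
    "https://www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product",
    "https://www.example.com/product-category/ladies-shoes?cgid-product=20&controller=product",
]

# Both variants collapse to a single base URL
canonicals = {strip_query(u) for u in variants}
print(canonicals)
```

On the real site the canonical is whatever the page's own canonical tag declares, so this only shows the pattern, not what your pages actually output.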
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website this means that crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure that the parameters do not change how the page looks in search.
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical versions of the URLs will be getting crawled just fine, since they don't have any parameters and hence are not blocked.
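To illustrate why the base URL stays crawlable while the parameter URL does not, here is a small Python sketch of wildcard matching for Disallow patterns, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The rule `/*?cgid` is an assumption, since the actual line in the robots.txt isn't quoted in this thread. The helper names and the `https://` scheme are also made up for the example, and the sketch ignores Allow rules and rule precedence.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url: str, disallow_patterns: list[str]) -> bool:
    """Check the URL's path plus query string against each Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule_to_regex(p).match(target) for p in disallow_patterns)


# Assumed rule; the real robots.txt line is not shown in the thread
DISALLOW = ["/*?cgid"]

canonical = "https://www.example.com/product-category/ladies-shoes"
parameterised = (
    "https://www.example.com/product-category/ladies-shoes"
    "?cgid-product=19&controller=product"
)

print(is_blocked(canonical, DISALLOW))      # False
print(is_blocked(parameterised, DISALLOW))  # True
```

The base URL has no "?cgid" in it, so the pattern never matches and it remains open to crawling.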
Hope that makes sense?
-
Yes, it's in the robots.txt, that's the problem. Someone had to physically put it in there, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether a plugin is creating this problem.