Massive URL blockage by robots.txt
-
Hello people,
In May there has been a dramatic increase in blocked URLs by robots.txt, even though we don't have so many URLs or crawl errors. You can view the attachment to see how it went up. The thing is the company hasn't touched the text file since 2012. What might be causing the problem? Can this result any penalties? Can indexation be lowered because of this?
-
Even though there are less pages indexed compared to those that are blocked, you still have a significant increase in indexed pages as well. That is a good thing! You technically have more pages that are indexed than before. It looks like you possibly relaunched the site or something? More pages blocked could be an indexing problem, or it might be a good thing - it all depends on what pages are being blocked.
If you relaunched the site and used this great new whiz-bang CMS that created an online catalog that gave your users 54 ways to sort your product catalog, then the number of "pages" could increase with each sort. Just imagine, sort your widgets by color, or by size or by price, or by price and size, or by size and color, or by color and price - you get the idea. Very quickly you have a bunch of duplicate pages of a single page. If your SEO was on his or her toes, they would account for this using a canonical approach or possibly a meta noindex or changing the robots.txt etc. That would be good as you are not going to confuse Google with all the different versions of the same page.
Ultimately, Shailendra has the approach that you need to take. Look in robots.txt, look at the code on your pages. What happened around 5/26/2013? All those things need to be looked at to try and answer your question.
-
Le Fras,
You don't only have to change the robots.txt file for Google to indicate that more URLs are being blocked by it. The robots.txt file tells the search engines not to crawl given URLs, but that they may keep them in the index and display the URLs in the search results.
So the search engines do know of the URLs that are being blocked and they are able to indicate that more are being blocked as you add pages to your site that are restricted by the robots.txt file.
-
Check you robots file. Are there entries to block the crawling? If you can give the url then it would be helpful/
Regards
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Index an URL without directly linking it?
Hi everyone, Here's a duplicate content challenge I'm facing: Let's assume that we sell brown, blue, white and black 'Nike Shoes model 2017'. Because of technical reasons, we really need four urls to properly show these variations on our website. We find substantial search volume on 'Nike Shoes model 2017', but none on any of the color variants. Would it be theoretically possible to show page A, B, C and D on the website and: Give each page a canonical to page X, which is the 'default' page that we want to rank in Google (a product page that has a color selector) but is not directly linked from the site Mention page X in the sitemap.xml. (And not A, B, C or D). So the 'clean' urls get indexed and the color variations do not? In other words: Is it possible to rank a page that is only discovered via sitemap and canonicals?
Intermediate & Advanced SEO | | Adriaan.Multiply0 -
How much does URLs with CAPS and URLs with non-CAPS existing on an IIS site matter nowadays?
I work on a couple ecommerce sites that are on IIS. Both sites have return a 200 header status for the CAPS and non CAPS version of the URLs. While I suppose it would be ok if the canonicals pointed to the same version of the page, in some cases it doesn't (ie; /Home-Office canonicalizes to itself and /home-office canonicalizes to itself). I came across this article (http://www.searchdiscovery.com/blog/case-sensitive-urls-and-seo-case-matters/) that is a few years old and I'm wondering how much of an issue it is and how I would determine if it is/isn't?
Intermediate & Advanced SEO | | OfficeFurn0 -
WordPress Duplicate URLs?
On my site, there are two different category bases leading to the exact same page. My developer claims that this is a common — and natural — occurrence when using WordPress, and that there's not a duplicate content issue to worry about. Is this true? Here's an example of the correct url. and... Here's an example of the same exact content, but using a different url. Notice that one is coming from /topics and the other is coming from /authors base. My understanding is that this is bad. Am I wrong?
Intermediate & Advanced SEO | | JasonMOZ1 -
Does having a ? on the end of your URL affect your SEO?
I have some redirects that were done with at "?" at the end of the URL to include google coding (i.e. you click on an adwords link and the google coding follows the redirected link). When there is not coding to follow the link just appears as "filename.html?". Will that affect us negatively SEO-wise? Thank you.
Intermediate & Advanced SEO | | RoxBrock1 -
"Category" word in URLs of blog is it SEO Friendly URL ??
Hello respected community members, I saw many times that "Category" word comes in URL of blog. So my que is that is this negative for SEO or Positive. & if we don't wanna to come CATEGORY in URL how can we remove while URL Optimization ?
Intermediate & Advanced SEO | | sourabhrana390 -
Local language for folders in URLs?
Hi, We're working on a e-commerce project that will be launched in several countries. My question is this: Are there any advantages to name the URL-folders in the local language? Ie. International site: www.domain.com/product/adidas-model-x www.domain.com/category/adidas Norwegian site: www.domain.no/produkt/adidas-model-x www.domain.no/kategori/adidas As i like things tidy, I guess that would also mean we would have to rename the cart URLs and so on. ie. International site: www.domain.com/checkout Norwegian site: www.domain.no/kasse
Intermediate & Advanced SEO | | rtora0 -
Can Anybody Link to my URL to Hurt SEO? Weird URL pointing at my Domaine!
Our ranking has drop since a few weeks. I did not do any major change in my site. Surfing WebMaster Tool, I found lots of new URL linking at our site: url.org linkarena.com seoprofiler.com folkd.com digitalhome.ca bustingprice.com surepurchase.com lowpricetoday.com oyax.com couponfollow.com aspringcleaning.com pamabuy.com etzone.ca How do I find if those was done intentionelly to hurt SEO? Could it be possible? Thank you, BigBlaze
Intermediate & Advanced SEO | | BigBlaze2050 -
Subdomains - duplicate content - robots.txt
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in a round robin fashion. However we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but upon loading it is immediately directed to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (by parsing the current host name). In order to avoid being penalized for duplicate content, any page that is viewed on one of the agent subdomains always has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address where applicable. Two questions: Can/should we use robots.txt or robot meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain? If question 1 is yes, would it be better for SEO to do that, or leave it how it is?
Intermediate & Advanced SEO | | EasyStreet0