Massive URL blockage by robots.txt
-
Hello people,
In May there has been a dramatic increase in blocked URLs by robots.txt, even though we don't have so many URLs or crawl errors. You can view the attachment to see how it went up. The thing is the company hasn't touched the text file since 2012. What might be causing the problem? Can this result any penalties? Can indexation be lowered because of this?
-
Even though there are less pages indexed compared to those that are blocked, you still have a significant increase in indexed pages as well. That is a good thing! You technically have more pages that are indexed than before. It looks like you possibly relaunched the site or something? More pages blocked could be an indexing problem, or it might be a good thing - it all depends on what pages are being blocked.
If you relaunched the site and used this great new whiz-bang CMS that created an online catalog that gave your users 54 ways to sort your product catalog, then the number of "pages" could increase with each sort. Just imagine, sort your widgets by color, or by size or by price, or by price and size, or by size and color, or by color and price - you get the idea. Very quickly you have a bunch of duplicate pages of a single page. If your SEO was on his or her toes, they would account for this using a canonical approach or possibly a meta noindex or changing the robots.txt etc. That would be good as you are not going to confuse Google with all the different versions of the same page.
Ultimately, Shailendra has the approach that you need to take. Look in robots.txt, look at the code on your pages. What happened around 5/26/2013? All those things need to be looked at to try and answer your question.
-
Le Fras,
You don't only have to change the robots.txt file for Google to indicate that more URLs are being blocked by it. The robots.txt file tells the search engines not to crawl given URLs, but that they may keep them in the index and display the URLs in the search results.
So the search engines do know of the URLs that are being blocked and they are able to indicate that more are being blocked as you add pages to your site that are restricted by the robots.txt file.
-
Check you robots file. Are there entries to block the crawling? If you can give the url then it would be helpful/
Regards
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Parameters
Hi Moz Community, I'm working on a website that has URL parameters. After crawling the site, I've implemented canonical tags to all these URLs to prevent them from getting indexed by Google. However, today I've found out that Google has indexed plenty of URL parameters.. 1-Some of these URLs has canonical tags yet they are still indexed and live. 2- Some can't be discovered through site crawling and they are result in 5xx server error. Is there anything else that I can do (other than adding canonical tags) + how can I discover URL parameters indexed but not visible through site crawling? Thanks in advance!
Intermediate & Advanced SEO | | bbop330 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
Switching Url
I started working with a Roofer/Contractor about a year ago. His website is http://www.lancasterparoofing.com/. The name of his business is Spicher Home Improvements. He used to have spicherhomeimprovements.com, well he still does. He was focusing on Roofing and Siding but now would like to branch to other areas like Interior remodeling. So adding interior work under LancasterPaRoofing.com is not applicable. I do not think starting another domain and having two is the best option. I think he should go back to using SpicherHomeImprovements.com and I assume he would take a small hit but in time he should be better off. Plus the url is more applicable to the real name of his business. Thanks for any feedback I receive. Chad
Intermediate & Advanced SEO | | ChadEisenhart0 -
Are these URLs too Keyword-packed?
Hi guys, Here is the URL: http://www.consumerbase.com/mailing-lists/dog-stores-mailing-list.html The target keywords are "Dog stores mailing list" and "Dog stores mailing lists" Does having "mailing-list" and "mailing-lists" in my URL hurt me?
Intermediate & Advanced SEO | | Travis-W0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
What should I block with a robots.txt file?
Hi Mozzers, We're having a hard time getting our site indexed, and I have a feeling my dev team may be blocking too much of our site via our robots.txt file. They say they have disallowed php and smarty files. Is there any harm in allowing these pages? Thanks!
Intermediate & Advanced SEO | | Travis-W1 -
Should I Use City Name in URL?
Having a website designed for a car dealership and deciding what attributes to use in the URL. Should I include the city name in the URL? Or does that help for SEO purposes? Other ideas of what to research or try are appreciated too. Thanks 🙂
Intermediate & Advanced SEO | | kylesuss0 -
Redirecting my new Website URL to my old Website URL
Hi! OK, I am semi - new to SEO Moz but have been self-teaching for 3 years. However I am stuck.. I have been operating my e-commerce site from www.shopadornonline.com for the past 3 years. I just purchased www.shopadorn.com Right now Shopadorn.com re-directs to www.shopadornonline.com because all my products and links go to shopadornonline.com/productblahblahblah I guess I am stuck. Not sure what to tell my web designer to do? Do I give up on having shopadorn.com OR do I start re-directing customers and doing 301 re-directs? I think from what i have read that it is bad to have traffic going to both shopadorn and shopadornonline as they compete for rankings? Where should I start?
Intermediate & Advanced SEO | | Shopadorn0