Struggling with Google Bot Blocks - Please help!
-
I own a site called www.wheretobuybeauty.com.au
After months and months, we still have a serious issue: Google Webmaster Tools reports blocked URLs across all of our pages.
The 404 errors are returning a 200 header code according to the email below. Do you agree that the 404.php code should be changed? Can you do that, please?
The current state:
Google webmaster tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert: he provided a new robots.txt file and advised that we amend sitemap.xml, among other things. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 for a few days in May and June, but the problem has now rapidly returned.
The number of pages displayed for a search on www.google.com.au with the query ‘site:wheretobuybeauty.com.au’ is 37,000.
This new site has been re-crawled over the last 4 weeks.
About the site
This is a Linux PHP site and has the following:
55,000 URLs in sitemap.xml submitted successfully to webmaster tools
robots.txt file has been modified several times:
Firstly we had none
Then we created one but were advised that it needed to have this current content:
User-agent: *
Disallow:
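For anyone who wants to sanity-check a file like this offline, here is a minimal sketch. It assumes Python's standard-library `urllib.robotparser` as a stand-in; that is not Googlebot's own parser, so treat the result as indicative only. The helper name `is_allowed` is made up for illustration, and the test URL is one of the product URLs quoted elsewhere in this thread.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above.
ROBOTS_TXT = "User-agent: *\nDisallow:\n"

# For contrast only: a file that blocks the whole site (not what this site uses).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A product URL taken from this thread.
PRODUCT_URL = (
    "http://www.wheretobuybeauty.com.au/floris/"
    "royal-arms-diamond-edition-eau-de-parfum-spray-100ml-34oz"
)


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under this parser, `is_allowed(ROBOTS_TXT, "Googlebot", PRODUCT_URL)` reports the URL as allowed, because an empty `Disallow:` blocks nothing, while the `BLOCK_ALL` variant reports it as blocked.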
-
No problem, my friend. You are most welcome. Here at Moz, you will not only get almost all of your SEO-related queries addressed and solved, you will also learn a great deal about digital marketing. I highly recommend that every aspiring digital marketer be active in a community like Moz; I bet they will save a great deal of time and money as well. Wish you all the very best.
Regards,
Devanur Rafi.
-
Thanks Devanur - trying out everything you have suggested.
-
Hi Alex,
Sorry if I was not clear in my previous post. I meant that, in general, pages with cleaner code will have an edge over similar pages with bad code when it comes to SEO.
Just an example: Page A has cleaner code than page B, with all other SEO factors being equal. In a scenario like this, page B might not be favored by Google because of issues arising from bad code, such as poor page loading performance or poor rendering in browsers.
The issue at hand might not be caused by your pages failing W3C validation, but it's not a bad idea to have cleaner code on your website.
Best regards,
Devanur Rafi.
-
Hi Devanur
My understanding is that Google does not have a problem with invalid XHTML or pages that are not W3C accessible. Please see a comment on this at SEOMOZ:
-
Hi Alex,
I did a code validation check for the following URL:
It gave 238 Errors and 538 Warnings!!
Search engines like Google favor pages with cleaner code, so I strongly recommend having the code on the website cleaned up.
Here you go for validation check:
Best regards,
Devanur Rafi.
-
Hi Alex,
If the underscores make up only 4% of the total URLs, then this can safely be set aside as far as the current issue is concerned.
The same goes for the keyword repetition in the page titles and URLs. However, if it is possible for you to revisit your URL structure and make it look like the following, you should go for it:
www.wheretobuybeauty.com.au/<brand name>/<product name>, e.g.
http://www.wheretobuybeauty.com.au/floris/royal-arms-diamond-edition-eau-de-parfum-spray-100ml-34oz
The same applies to the page titles.
Now we are left with two things, the page performance and URL canonicalization. Please have them fixed as early as possible.
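The canonicalization fix comes down to one rule: any request for the bare domain gets a 301 to the same path on the www host. Below is a minimal sketch of that mapping, in Python purely for illustration (the site itself runs PHP, and the rule would more likely live in the web server configuration). `redirect_target` is a hypothetical helper, and the sketch assumes URLs carry no explicit port.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.wheretobuybeauty.com.au"
BARE_HOST = "wheretobuybeauty.com.au"


def redirect_target(url):
    """Return the URL a 301 should point to, or None if no redirect is needed.

    Only the bare (non-www) host is redirected; path, query string and
    fragment are carried over unchanged. Assumes no explicit port in the URL.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != BARE_HOST:
        return None
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
```

Keeping the path and query string intact matters here: redirecting every non-www URL to the home page would fix the duplicate hosts but lose the page-to-page mapping.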
Also, I checked your IP address and see that you are on shared hosting. This is not at all recommended if you are a serious online business owner. Your IP, 103.9.170.75, is shared by at least 250 other domains, including some bad websites.
Though there are different views about bad IP neighborhoods and their impact on SEO, I have always been an advocate of a clean IP and have recommended it to all my clients. You can go for a dedicated IP, which is very cheap these days, or better yet, a VPS.
For more about this, please check out the "Oops, your IP is either dirty or virtual" section on the following page:
http://www.bruceclay.com/in/seo-tech-tips/techtips.htm
And also, this section, "A Strong Foundation for Your Site to Operate On" on the following page:
http://www.bruceclay.com/blog/2011/04/the-seo-bucket-list-3-things-to-do-before-your-site-dies/
Lastly, I checked your domain's DNS health; here are the results:
http://intodns.com/wheretobuybeauty.com.au
Though these might not be causing the current issue, it's good to sort everything out, as we should leave no stone unturned in making the website better.
Best regards,
Devanur Rafi.
-
Hey Devanur,
Please see our responses below:
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you should come out of the issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website URLs return an HTTP status code 200. Ideally, all the non-www URLs should redirect to their respective www versions via a 301 permanent redirect. Please do this immediately.
Response: We are asking the developer to correct this.
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators, alongside the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
Response: This problem only occurs in a few pages where special characters have been replaced with underscores, probably in 4% of product pages. I can’t see that this has an impact on the SEO?
3. Google PageSpeed check: When I ran the Google PageSpeed test on some URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues, especially ones like "Reduce blocking resources". For more: https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources
I suggest you run the Google PageSpeed check for some of the URLs.
Note: URLs from your website that are already in Google's index may show similar issues when run through the PageSpeed test. That should not stop you from addressing these issues.
Response: We will ask the developers to improve performance specifically with the highest value things that are showing up in Google PageSpeed check.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a real problem: it not only affects how search engines see your website but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
Response: We will ask the developers to improve performance specifically with the highest value things that are showing up in gtmetrix.com suggestions.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
Response: This occurs with approx 300 product pages out of 40,000, so a very small percentage. We will clean this up when we update our data. I can’t see that this has any impact on SEO considering the small number? Note, however, that every product page is constructed as follows:
http://www.wheretobuybeauty.com.au/floris/floris-royal-arms-diamond-edition-eau-de-parfum-spray-100ml_34oz
Is there some risk that this will look like over optimisation?
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible, and you should be out of the woods soon after that.
I wish you good luck Alex.
Best regards,
Devanur Rafi.
-
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you should come out of the issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website URLs return an HTTP status code 200. Ideally, all the non-www URLs should redirect to their respective www versions via a 301 permanent redirect. Please do this immediately.
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators, alongside the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
3. Google PageSpeed check: When I ran the Google PageSpeed test on some URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues, especially ones like "Reduce blocking resources". For more: https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources
I suggest you run the Google PageSpeed check for some of the URLs.
Note: URLs from your website that are already in Google's index may show similar issues when run through the PageSpeed test. That should not stop you from addressing these issues.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a real problem: it not only affects how search engines see your website but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
It could have been like: www.wheretobuybeauty.com.au/acqua-di-parma/collezione-barbiere-shaving-cream-75ml-25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
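Points 2 and 5 can be handled in one pass when the product slugs are regenerated. Here is a rough sketch in Python, for illustration only: `clean_slug` is a hypothetical helper, and it assumes the brand slug and the product slug are available as separate strings, which may not match how the site stores its data.

```python
def clean_slug(brand_slug, product_slug):
    """Normalise a product slug for use under /<brand>/<product>.

    - underscores become hyphens, so only one word separator is used
    - leading repeats of the brand are dropped, since the brand is
      already present as the first path segment
    """
    slug = product_slug.replace("_", "-")
    prefix = brand_slug + "-"
    while slug.startswith(prefix):
        slug = slug[len(prefix):]
    return slug


def clean_path(brand_slug, product_slug):
    """Build the site-relative path for a product page."""
    return "/" + brand_slug + "/" + clean_slug(brand_slug, product_slug)
```

For the example above, `clean_path("acqua-di-parma", "acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz")` gives `/acqua-di-parma/collezione-barbiere-shaving-cream-75ml-25oz`. Any URL that changes this way should also get a 301 redirect from its old address, otherwise the old URLs simply turn into new 404s.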
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible, and you should be out of the woods soon after that.
I wish you good luck Alex.
Best regards,
Devanur Rafi.
-
Thanks Devanur
I put this to my partner and he said he is addressing it, but that the main issue still remains.
This is the critical issue: only a few pages are visible in Google search, as almost all are reported as blocked for Googlebot. I am re-stating the problem in this email for you.
Can you please take a look at the whole problem and see if you can see what is causing this.
Is robots.txt causing this? It is the only change that we have made and at one point the problem was corrected but has now returned. I have read everything that I can about robots.txt on the google site and in forums.
Here are two examples (out of 44,000) that are blocked. It is easy to find other examples – simply test any of the product pages – only 200 out of 44,000 return any result.
Try searching using www.google.com.au and using the search query
Abercrombie & Fitch 1892 Cobalt Eau De Cologne Spray 50ml/1.7oz site:wheretobuybeauty.com.au
Second example:
Try searching using:
Acqua Di Parma Collezione Barbiere Shaving Cream 75ml/2.5oz site:wheretobuybeauty.com.au
The current state:
Google webmaster tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert, Harmeen: he provided a new robots.txt file and advised that we amend sitemap.xml, among other things. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 for a few days in May and June, but the problem has now rapidly returned.
This new site has been re-crawled over the last 4 weeks.
About the site
55,000 URLs in sitemap.xml submitted successfully to webmaster tools
robots.txt file has been modified several times:
Firstly we had none, then we created one but were advised that it needed to have this current content:
“User-agent: *
Disallow:
Sitemap: http://www.wheretobuybeauty.com.au/sitemap.xml”
I put this into robots.txt, but was then advised yesterday that there should be no blank lines between these lines, so I removed them.
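On the blank-line question, one way to see whether it changes the outcome is to parse both variants and compare. This is only a sketch using Python's standard-library `urllib.robotparser`, which is not Googlebot's parser and may treat edge cases differently; the "spaced" variant is my reconstruction of a file with a blank line after every directive, not a copy of what was actually deployed.

```python
from urllib.robotparser import RobotFileParser

URL = (
    "http://www.wheretobuybeauty.com.au/floris/"
    "floris-royal-arms-diamond-edition-eau-de-parfum-spray-100ml_34oz"
)

# The file as quoted above, with no blank lines.
COMPACT = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.wheretobuybeauty.com.au/sitemap.xml\n"
)

# Assumed earlier variant: the same directives with a blank line after each.
SPACED = COMPACT.replace("\n", "\n\n")


def googlebot_allowed(robots_txt, url=URL):
    """Return True if this parser lets Googlebot fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


def compare_variants():
    """Evaluate both variants against the same product URL."""
    return {
        "compact": googlebot_allowed(COMPACT),
        "spaced": googlebot_allowed(SPACED),
    }
```

With this parser, both variants allow the URL: the blank line detaches the empty `Disallow:` from its `User-agent` line, but since that rule blocked nothing in the first place, the result is the same. That at least suggests the blank lines are unlikely to explain 44,000 blocked pages, though only Google's own robots.txt tester can confirm how Googlebot reads the live file.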
-
Hi Alex,
Before diving into the increased number of 404 errors reported by your Webmaster Tools account, let us first look at the core issue: 404 pages (non-existent resources) that return an HTTP status code 200. These are called 'soft 404 errors'. Ideally, every non-existent resource on the website should return an HTTP status code 404 or 410, as the situation requires, and not a 200, which confuses search engines and is bad practice. This should be fixed immediately. Please have all such pages return 404 and not 200 as soon as possible.
More about soft 404 errors:
https://support.google.com/webmasters/answer/181708?hl=en
And more about when to return a 404 status code:
https://support.google.com/webmasters/answer/2409439?hl=en
https://support.google.com/webmasters/answer/2409439?hl=en
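A simple way to confirm the behaviour before and after the fix is to request a URL that cannot exist and look at the status code that comes back. Below is a small sketch in Python using only the standard library; the helper names and the `no-such-page-` path are made up for illustration, and the three-way classification is my own simplification rather than how Google defines a soft 404.

```python
import uuid
from urllib.error import HTTPError
from urllib.request import urlopen


def probe_url(base_url):
    """Build a URL that should not exist on the site (random, made-up path)."""
    return base_url.rstrip("/") + "/no-such-page-" + uuid.uuid4().hex


def fetch_status(url):
    """Return the final HTTP status for url. Note that urlopen follows redirects."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status is on the exception.
        return err.code


def classify(status):
    """Interpret the status returned for a page that should not exist."""
    if status in (404, 410):
        return "ok"
    if 200 <= status < 300:
        return "soft 404"
    return "check manually"


# Example (performs a real HTTP request when run):
#   classify(fetch_status(probe_url("http://www.wheretobuybeauty.com.au")))
```

Because `urlopen` follows redirects, a missing page that redirects to the home page will also show up as "soft 404" here, which is the same symptom from a crawler's point of view.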
Best regards,
Devanur Rafi.