Google crawl index issue with our website...
-
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we're seeing lots of 404 Errors on pages that don't exist on our site or seemingly on the remote server.
In the search results, Google is showing nonsensical folders off the root domain and then the actual page is within that non-existent folder.
An example:
Google shows this in its index of the site (as a 404 Error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp
The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp
Google is showing the folder MQnjO that doesn't exist anywhere on the remote. Other pages they are showing have different folder names that are just as wacky.
We called our hosting company who said the problem isn't coming from them...
Has anyone had something like this happen to them?
Thanks so much for your insight!
Megan -
Hi Keri. Thanks for following up. This turned out to be an issue with an auto-generated breadcrumbs script. I don't know what the intricacies of that were but we were able to remove it and get this issue straightened out.
Thanks again!
Megan
-
Hi Megan,
I'm following up on older questions that are marked unanswered. Did you ever get this figured out?
-
Megan ,
Please check with your hosting company,
about this code to be included in htaccess
ErrorDocument 404 /404.shtml
/404.shtml its your 404 page
-
Thanks for your help on this Wissam. Is this something that we need to have the hosting company set-up on the server to ensure that these pages get returned as 404s?
-
Megan,
See here
http://markup.io/v/fyd9w4w9wmjr
Googlebot when It crawls this page, you remote server is telling Google Bot that its a Live page and this page Exists
The solution to the upper problem, might help you in fixing the actual problem.
If the Pages with the mystery folder Does not Exist .. your remote server should show google bot a 404 not found (http header).
-
Are we talking about one problem or two?
http://www.burlingtonmortgage.biz/contact.htm does not exist on the remote server (as it was removed over a year ago). I see that there are similar errors for other old pages which were also previously removed. Should we have redirected those to the 404 page since there are not related pages on the existing site?
I am not sure if the two problems have anything to do with one another. The pages with the "mystery folders" are existing pages. They just exist in the root. Why would google be looking at them as if they are inside sub folder?
-
Megan,
noticed something also for example this page http://www.burlingtonmortgage.biz/contact.htm . its showing a 404 error from title and content ... but the HTTP header is showing 200 ok. u need to fix that.
and would assume maybe thats why google started indexing weird URLs generating from your site... and if its true is a 404 page ..google is not picking it up because its showing its a Live page (200ok)
-
We use Dreamweaver.
-
Which CMS are you using?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Recover google INdexing issue after fixing malware attack.
Dear My Niche site attacked by malware on 1 st march 2018. Hacker inject a php file on my blogpage. Injected link like: mydomain.com/blog/dmy4xa.php? Then I scan My site by wordfence. Identifying all malware code.Then manually clean whole site with database. My site is completely free from malware. and remove all malware link from webmaster tools. Even Block my blog page by robots.txt . But new malware link index every week. So i need to remove those link every week. So this issue I decided to rebuild my site. Finally I rebuild my site another server. Then I flash my current server and migrate my site from those server on 10th january 2019 . I wait 1 month to deindex malware link. But new link are indexing every week. I discourage site for over 1 week and even delete site from google webmaster tools with all properties as well as verification file from server. Over 1 week , Link are showing. I feel boar to delete malware link every week. I need permanent solution. Please give me a perfect solution for this malware link index. Google index about 100 url .After that I clean my site with some tools. My site was free from malware. But Ne
Technical SEO | | Gfound1230 -
I am Using Wix website creator. Will google be able to read the javascript?
I tried using some of the moz tools like the "on page grader" and it was not able to read any of the writing on my webpage because wix uses javascript. Will this impact my rankings on google compared to my competitors? The New Wix websites allow you to build a website in HTML. Should I switch to this? Thanks, Jonathan
Technical SEO | | H1_Marketing_Solutions0 -
Why seomoz.org still in Google index?
I searched in Google, the number of URLs indexed left in the seomoz.org domain since it changed to moz.comI am surprised that after all this time more than 15,000 URLs indexed:https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site%3Aseomoz.org%20inurl%3Aseomoz.org If I clicked on any of the results it will be redirect (301) to the new domain, so it is working, but Google still keep these URLs in the index.
Technical SEO | | Yosef
What could be the reason?Will not cause duplicated content issue on moz.com?0 -
Google has deindexed 40% of my site because it's having problems crawling it
Hi Last week i got my fifth email saying 'Google can't access your site'. The first one i got in early November. Since then my site has gone from almost 80k pages indexed to less than 45k pages and the number is lowering even though we post daily about 100 new articles (it's a online newspaper). The site i'm talking about is http://www.gazetaexpress.com/ We have to deal with DDoS attacks most of the time, so our server guy has implemented a firewall to protect the site from these attacks. We suspect that it's the firewall that is blocking google bots to crawl and index our site. But then things get more interesting, some parts of the site are being crawled regularly and some others not at all. If the firewall was to stop google bots from crawling the site, why some parts of the site are being crawled with no problems and others aren't? In the screenshot attached to this post you will see how Google Webmasters is reporting these errors. In this link, it says that if 'Error' status happens again you should contact Google Webmaster support because something is preventing Google to fetch the site. I used the Feedback form in Google Webmasters to report this error about two months ago but haven't heard from them. Did i use the wrong form to contact them, if yes how can i reach them and tell about my problem? If you need more details feel free to ask. I will appreciate any help. Thank you in advance C43svbv.png?1
Technical SEO | | Bajram.Kurtishaj1 -
Dev Site Was Indexed By Google
Two of our dev sites(subdomains) were indexed by Google. They have since been made private once we found the problem. Should we take another step to remove the subdomain through robots.txt or just let it ride out? From what I understand, to remove the subdomain from Google we would verify the subdomain on GWT, then give the subdomain it's own robots.txt and disallow everything. Any advice is welcome, I just wanted to discuss this before making a decision.
Technical SEO | | ntsupply0 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
How long will Google take to stop crawling an old URL once it has been 301 redirected
I need to do a clean-up old urls that have been redirected in sitemap and was wondering about this.
Technical SEO | | Ant-8080 -
Struggling to get my lyrics website fully indexed
Hey guys, been a longtime SEOmoz user, only just getting heavily into SEO now and this is my first query, apologies if it's simple to answer but I have been doing my research! My website is http://www.lyricstatus.com - basically it's a lyrics website. Rightly or wrongly, I'm using Google Custom Search Engine on my website for search, as well as jQuery auto-suggest - please ignore the latter for now. My problem is that when I launched the site I had a complex AJAX Browse page, so Google couldn't see static links to all my pages, thus it only indexed certain pages that did have static links. This led to my searches on my site using the Google CSE being useless as very few pages were indexed. I've since dropped the complex AJAX links and replaced it with easy static links. However, this was a few weeks ago now and still Google won't fully index my site. Try doing a search for "Justin Timberlake" (don't use the auto-suggest, just click the "Search" button) and it's clear that the site still hasn't been fully indexed! I'm really not too sure what else to do, other than wait and hope, which doesn't seem like a very proactive thing to do! My only other suspicion is that Google sees my site as more duplicate content, but surely it must be ok with indexing multiple lyrics sites since there are plenty of different ones ranking in Google. Any help or advice greatly appreciated guys!
Technical SEO | | SEOed0