Google crawl index issue with our website...
-
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we're seeing lots of 404 Errors on pages that don't exist on our site or seemingly on the remote server.
In the search results, Google is showing nonsensical folders off the root domain and then the actual page is within that non-existent folder.
An example:
Google shows this in its index of the site (as a 404 Error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp
The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp
Google is showing the folder MQnjO that doesn't exist anywhere on the remote. Other pages they are showing have different folder names that are just as wacky.
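To list which indexed URLs follow this pattern, something like the sketch below might help. It assumes you have copied the URLs out of the "site:" results by hand and that you know the real root-level paths on the site; the function name `find_phantom_folder_urls` is made up for illustration.

```python
from urllib.parse import urlsplit


def find_phantom_folder_urls(indexed_urls, real_paths):
    """Return (indexed_url, real_path) pairs where the indexed URL is a real
    root-level page wrapped in one extra, non-existent leading folder."""
    matches = []
    for url in indexed_urls:
        # Add a scheme if it is missing so urlsplit parses the host correctly.
        parts = urlsplit(url if "://" in url else "http://" + url)
        segments = [segment for segment in parts.path.split("/") if segment]
        if len(segments) != 2:
            continue  # only look at <folder>/<page> shaped paths
        candidate = "/" + segments[1]
        if parts.path not in real_paths and candidate in real_paths:
            matches.append((url, candidate))
    return matches


# Example using the URLs quoted in this post.
real = {"/idaho-mortgage-rates.asp"}
indexed = [
    "www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp",
    "www.burlingtonmortgage.biz/idaho-mortgage-rates.asp",
]
phantoms = find_phantom_folder_urls(indexed, real)
```

This only spots the pattern; it does not explain where the extra folder comes from.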
We called our hosting company who said the problem isn't coming from them...
Has anyone had something like this happen to them?
Thanks so much for your insight!
Megan -
Hi Keri. Thanks for following up. This turned out to be an issue with an auto-generated breadcrumbs script. I don't know what the intricacies of that were but we were able to remove it and get this issue straightened out.
Thanks again!
Megan
-
Hi Megan,
I'm following up on older questions that are marked unanswered. Did you ever get this figured out?
-
Megan,
Please check with your hosting company
about adding this line to your .htaccess file:
ErrorDocument 404 /404.shtml
Here /404.shtml is the path to your 404 page.
-
Thanks for your help on this, Wissam. Is this something that we need to have the hosting company set up on the server to ensure that these pages get returned as 404s?
-
Megan,
See here
http://markup.io/v/fyd9w4w9wmjr
When Googlebot crawls this page, your remote server tells it that the page is live and exists.
Solving the problem above might help you fix the actual problem.
If the pages with the mystery folder do not exist, your remote server should return a 404 Not Found HTTP status to Googlebot.
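One way to see what the server actually reports is to request the URL and read the status code. Below is a minimal sketch using only the Python standard library; the names `fetch_status` and `describe_status` are illustrative, and it assumes the server answers HEAD requests the same way it answers GET, which is not guaranteed.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code the server sends for `url`.

    urlopen follows redirects, so this is the status of the final response.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the code is still the server's answer.
        return error.code


def describe_status(status):
    """Summarise how a status code presents the page to a crawler."""
    if status in (404, 410):
        return "reported as missing"
    if 200 <= status < 300:
        return "reported as live"
    return "other"


# Usage (needs network access, so it is left commented out):
# print(describe_status(fetch_status("http://www.burlingtonmortgage.biz/contact.htm")))
```

A removed page that comes back as "reported as live" is the situation described above.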
-
Are we talking about one problem or two?
http://www.burlingtonmortgage.biz/contact.htm does not exist on the remote server (it was removed over a year ago). I see that there are similar errors for other old pages which were also previously removed. Should we have redirected those to the 404 page, since there are no related pages on the existing site?
I am not sure if the two problems have anything to do with one another. The pages with the "mystery folders" are existing pages. They just exist in the root. Why would Google be looking at them as if they are inside a subfolder?
-
Megan,
I also noticed something. For example, take this page: http://www.burlingtonmortgage.biz/contact.htm. Its title and content show a 404 error, but the HTTP header returns 200 OK. You need to fix that.
I would guess that may be why Google started indexing weird URLs generated from your site. And if it truly is a 404 page, Google is not treating it as one, because the server reports it as a live page (200 OK).
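The mismatch described here (error text in the page, 200 OK in the header) can be flagged with a small check. This is a rough sketch under stated assumptions: the hint phrases in `NOT_FOUND_HINTS` and the function names are invented for illustration, and the check only looks at the page title, so it will miss error pages that word things differently.

```python
import re

# Phrases that suggest an error page; an assumption, adjust for your own site.
NOT_FOUND_HINTS = ("404", "not found", "page cannot be found")


def extract_title(html):
    """Return the text of the first <title> element, or "" if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def looks_like_soft_404(status, html):
    """True when the server says 200 but the title reads like an error page."""
    if status != 200:
        return False
    title = extract_title(html).lower()
    return any(hint in title for hint in NOT_FOUND_HINTS)


# Made-up page source resembling the case described above.
error_page = "<html><head><title>404 Error - Page Not Found</title></head></html>"
flagged = looks_like_soft_404(200, error_page)
```

The same page served with a real 404 status would not be flagged, which is the outcome you want.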
-
We use Dreamweaver.
-
Which CMS are you using?