Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to Stop Google from Indexing Old Pages
-
We moved from a .php site to a Java site on April 10th. It's almost 2 months later and Google continues to crawl old pages that no longer exist (225,430 Not Found errors, to be exact).
These pages no longer exist on the site and there are no internal or external links pointing to these pages.
Google has crawled the site since the go live, but continues to try and crawl these pages.
What are my next steps?
-
All my clients are impatient with Google's crawl. I think the speed of life on the web has spoiled them. Assuming your site isn't a huge e-commerce or subject-matter site...you will get crawled but not right away. Smaller, newer sites take time.
Take any concern and put it towards link building to the new site so Google's crawlers find it faster (via their seed list). Get it up on DMOZ, get that Twitter account going, post videos to YouTube, etc. Get some juicy high-PR inbound links and that could help speed up the indexing. Good luck!
-
Like Mike said above, there still isn't enough info provided for us to give you a very clear response, but I think he is right to point out that you shouldn't really care about the extinct pages in Google's index. They should, at some point, expire.
You can specify particular URLs to remove in GWT, or your robots.txt file, but that doesn't seem the best option for you. My recommendation is to just prepare the new site in the new location, upload a good clean sitemap.xml to GWT, and let them adjust. If you have much of the same content as well, Google will know due to the page creation date which is the newer and more appropriate site. Hate to say "trust the engines" but in this case, you should.
You may also consider a rel="author" tag in your new site to help Google prioritize the new site. But really the best thing is a new site on a new domain, a nice sitemap.xml, and patience.
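For anyone wondering what the "good clean sitemap.xml" might look like in practice, here is a minimal sketch that builds one from a list of live URLs. The `build_sitemap` helper and the example.com URLs are placeholders I made up for illustration; the only fixed part is the sitemaps.org namespace and the `urlset`/`url`/`loc` structure.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs on the new site; replace with the real, live pages only.
new_urls = [
    "https://www.example.com/",
    "https://www.example.com/products/widgets",
]
sitemap_xml = build_sitemap(new_urls)
print(sitemap_xml)
```

The point of generating it from the list of pages that actually exist is that none of the old .php URLs end up in the file you submit.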
-
To further clear things up...
I can 301 every page from the old .php site to our new homepage (However, I'm concerned about Google's impression of our overall user experience).
Or
I can 410 every page from the old .php site (Wouldn't this tell Google to stop trying to crawl these pages? Although these pages technically still exist, they just have a different URL and directory structure. Too many to set up individual 301s, though).
Or
I can do nothing and wait for these pages to drop off of Google's radar
What is the best option?
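A possible middle ground between the options above, sketched under some assumptions: if the old URLs follow a handful of patterns, a few pattern rules can 301 the pages that still have an equivalent, and everything else ending in .php can return 410. The URL patterns, the `REDIRECT_RULES` table, and the `resolve_old_url` function below are all hypothetical; the real rules depend on how the old and new directory structures line up, and the same logic could live in the web server config or a servlet filter instead.

```python
import re

# Hypothetical rules: each maps an old .php URL pattern to a new path template.
REDIRECT_RULES = [
    (re.compile(r"^/products/view\.php\?id=(\d+)$"), r"/catalog/item/\1"),
    (re.compile(r"^/articles/([\w-]+)\.php$"), r"/news/\1"),
]


def resolve_old_url(path):
    """Return (status, location) for a request path, including any query string."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            # A known old page with a new home: permanent redirect.
            return 301, match.expand(template)
    if path.split("?", 1)[0].endswith(".php"):
        # An old .php URL with no equivalent: tell crawlers it is gone for good.
        return 410, None
    # Anything else is an ordinary "not found".
    return 404, None


print(resolve_old_url("/products/view.php?id=42"))
print(resolve_old_url("/old/unknown.php?x=1"))
```

This avoids sending every old page to the homepage while still not requiring 200,000 hand-written redirects.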
-
After reading the further responses here I'm wondering something...
You switched to a new site, can't 301 the old pages, and have no control over the old domain... So why are you worried about pages 404ing on an unused site you don't control anymore?
Maybe I'm missing something here or not reading it right. Who does control the old domain then? Is the old domain just completely gone? Because if so, why would it matter that Google is crawling non-existent pages on a dead site and returning 404s and 500s? Why would that necessarily affect the new site?
Or is it the same site but you switched to Java from PHP? If so, wouldn't your CMS have a way of redirecting the old pages that are technically still part of your site to the newer relevant pages on the site?
I feel like I'm missing pertinent info that might make this easier to digest and offer up help.
-
Sean,
Many thanks for your response. We have submitted a new, fresh site map to Google, but it seems like it's taking them forever to digest the changes.
We've been keeping track of rankings, and they've been going down, but there are so many changes going on at once with the new site, it's hard to tell what is the primary factor for the decline.
Is there a way to send Google all of the pages that don't exist and tell them to stop looking for them?
Thanks again for your help!
-
You would need access to the domain to set up the 301. If you no longer can edit files on the old domain, then your best bet is to update Webmaster Tools with the new site info and a sitemap.xml and wait for their caches to expire and update.
Somebody can correct me on this if I'm wrong, but getting so many 404s and 500s has probably already hurt your rankings so significantly that you may be best served by approaching the whole effort as a new site. Again, without more data, I'm left making educated guesses here. And if you aren't tracking your rankings (you asked how much this is impacting them, which tracking would show you), then I would let go of the old site completely and build search traffic fresh on the new domain. You'd probably generate better results in the long term by jettisoning a defunct site with so many errors.
I confess, without being able to dig into the site analytics and traffic data, I can't give direct tactical advice. However, the above is what I would certainly do. Resubmitting a fresh sitemap.xml to GWT and deleting all the info for the old site in there is probably your best option. I defer to anyone with better advice. What a tough position you are in!
-
Thanks all for the feedback.
We no longer have access to the old domain. How do we institute a 301 if we can no longer access the page?
We have over 200,000 pages throwing 404s and over 70,000 pages throwing 500 errors.
This probably doesn't look good to Google. How much is this impacting our rankings?
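One way to put your own numbers on this, alongside what Webmaster Tools reports, is to tally the status codes your server returns to Googlebot. This is only a rough sketch: it assumes access logs in the common/combined log format, the sample lines are invented, and matching on the "Googlebot" user-agent string does not prove the request really came from Google.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined log format line,
# e.g. "GET /old/page.php HTTP/1.1" 404
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')


def googlebot_status_counts(log_lines):
    """Count HTTP status codes for requests whose line mentions Googlebot."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Invented sample lines; in practice, iterate over the real access log file.
sample = [
    '66.249.66.1 - - [10/Jun/2012:10:00:00 +0000] "GET /old/a.php HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jun/2012:10:00:05 +0000] "GET /old/b.php HTTP/1.1" 500 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Jun/2012:10:00:09 +0000] "GET /old/c.php HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jun/2012:10:00:12 +0000] "GET /index HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
counts = googlebot_status_counts(sample)
print(counts)
```

Run over a few weeks of logs, this would show whether the 404 and 500 counts are shrinking as Google works through the old URLs.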
-
Like others have said, a 301 redirect and updating Webmaster Tools should be most of what you need to do. You didn't say if you still have access to the old domain (where the pages are still being crawled) or if you get a 404, 503, or some other error when navigating to those pages. What are you seeing or can you provide a sample URL? That may help eliminate some possibilities.
-
You should implement 301 redirects from your old pages to their new locations. It sounds like you have a fairly large site, which means Google has tons of your old pages in its index that it is going to continue to crawl for some time. It's probably not going to impact you negatively, but if you want to get rid of the errors sooner I would throw in some 301s.
With the 301s you'll also get any link value that the old pages may be getting from external links (I know you said there are none, but with 200K+ pages it's likely that at least one of the pages is being linked to from somewhere).
-
Have you submitted a new sitemap to Webmaster Tools? Also, you could consider 301 redirecting the pages to relevant new pages to capitalize on any link equity or ranking power they may have had before. Otherwise Google should eventually stop crawling them because they are 404. I've had a touch of success getting them to stop crawling quicker (or at least it seems quicker) by changing some 404s to 410s.