How can I tell Google that a page has not changed?
-
Hello,
We have a website with many thousands of pages. Some of them change frequently, some never. Our problem is that Googlebot generates far too much traffic: half of our page views come from Googlebot.
We would like to tell Googlebot to stop crawling pages that never change. This one, for instance:
http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html
As you can see, there is almost no content on the page, and the picture will never change. So I am wondering whether it makes sense to tell Google that there is no need to come back.
The following header fields might be relevant. Currently our webserver answers with the following headers:
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, public
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Does Google honor these fields? Should we remove no-cache, must-revalidate, and Pragma: no-cache, and set Expires to, e.g., 30 days in the future?
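For reference, the header change proposed here could look roughly like the sketch below. The helper name `static_page_headers` and its parameters are hypothetical and only for illustration. It shows which headers a never-changing page would send; it does not show that Google acts on them.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def static_page_headers(last_modified, max_age_days=30, now=None):
    """Build cache-friendly response headers for a page that never changes.

    Hypothetical helper: the name and parameters are illustrative only.
    Both datetimes must be timezone-aware and in UTC.
    """
    now = now or datetime.now(timezone.utc)
    max_age = timedelta(days=max_age_days)
    return {
        # Replaces "no-cache, must-revalidate, post-check=0, pre-check=0".
        "Cache-Control": f"public, max-age={int(max_age.total_seconds())}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(now + max_age, usegmt=True),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # "Pragma: no-cache" is simply no longer sent.
    }
```

Sending `Last-Modified` alongside `Expires` also gives a crawler a date it can send back in a later conditional request.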
I also read that a web page that has not changed should answer with 304 instead of 200. Does it make sense to implement that? Unfortunately, that would be quite hard for us.
Maybe Google would then also spend more time on pages that actually changed, instead of wasting it on unchanged pages.
Do you have any other suggestions for how we can reduce Googlebot traffic on irrelevant pages?
Thanks for your help
Cord
-
Unfortunately, I don't think there are many reliable options, in the sense that Google will always honor them. I don't think they gauge crawl frequency by the "expires" field - or, at least, it carries very little weight. As John and Rob mentioned, you can set the "changefreq" in the XML sitemap, but again, that's just a hint to Google. They seem to frequently ignore it.
If it's really critical, a 304 probably is a stronger signal, but I suspect even that's hit or miss. I've never seen a site implement it on a large scale (100s or 1000s of pages), so I can't speak to that.
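As a rough illustration of what answering with a 304 involves, here is a minimal sketch. It assumes the crawler sends an `If-Modified-Since` header and that the site knows when each page last changed. The function name `conditional_get` and the plain-dict header lookup are hypothetical, not part of any particular web framework.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified):
    """Return (status, response_headers) for a page last changed at `last_modified`.

    Hypothetical helper; `last_modified` must be a timezone-aware UTC datetime,
    and `request_headers` is a plain dict keyed by the exact header name.
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    response_headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    raw = request_headers.get("If-Modified-Since")
    if raw:
        try:
            since = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            since = None  # Unparseable date: fall through to a full response.
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, response_headers  # Not Modified: send no body.
    return 200, response_headers
```

The hard part on a large site is usually not this comparison but tracking a trustworthy last-modified time per page. For pages that truly never change, the creation date would do.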
Three broader questions/comments:
(1) If you currently list all of these pages in your XML sitemap, consider taking them out. The XML sitemap doesn't have to contain every page on your site, and in many cases, I think it shouldn't. If you list these pages, you're basically telling Google to re-crawl them (regardless of the changefreq setting).
(2) You may have overly complex crawl paths. In other words, it may not be the quantity of pages that's at issue, but how Google accesses those pages. They could be getting stuck in a loop, etc. It's going to take some research on a large site, but it'd be worth running a desktop crawler like Xenu or Screaming Frog. This could represent a site architecture problem (from an SEO standpoint).
(3) Should all of these pages even be indexed at all, especially as time passes? Increasingly (especially post-Panda), having more pages indexed is often worse. If Googlebot is really hitting you that hard, it might be time to canonicalize some older content or 301-redirect it to newer, more relevant content. If it's not active at all, you could even NOINDEX or 404 it.
-
Thanks for the answers so far. The tips don't really solve my problem yet, though: I don't want to lower the general crawl rate in Webmaster Tools, because pages that change frequently should also be crawled frequently. We do have XML sitemaps, although we did not include picture pages like the one in our example. There are tens, maybe hundreds, of thousands of these pages. If everyone agrees on this, we can of course include these pages in our XML sitemaps. Using "meta refresh" to indicate that the page never changes seems a bit odd to me, but I'll look into it.
But what about the HTTP headers I asked about? Does anyone have any ideas on that?
-
Your best bet is to build an Excel report using a crawl tool (like Xenu, Frog, Moz, etc), and export that data. Then look to map out the pages you want to log and mark as 'not changing'.
Make sure to build (or already have) a functioning XML sitemap file for the site and, as John said, state which URLs NEVER change. Over time, this will tell Googlebot that it isn't necessary to crawl those page URLs, as they never change.
You could also place a META REFRESH tag on those individual pages, and set that to never as well.
Hope some of this helps! Cheers
-
If you have Google Webmaster Tools set up, go to Site configuration > Settings, and you can set a custom crawl rate for your site. That will change it site-wide, so if you have other pages that change frequently, that might not be so great for you.
Another thing you could try is to generate a sitemap and set a change frequency of never (or yearly) for all of the pages you don't expect to change. That also might slow down Google's crawl rate of those pages.
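A sitemap with per-URL change frequencies could be generated along these lines. This is a minimal sketch using Python's standard library. The function name `build_sitemap` is hypothetical, and as noted elsewhere in this thread, Google treats `changefreq` as a hint at best.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# The change frequencies defined by the sitemap protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        if changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html", "never"),
]))
```

With hundreds of thousands of picture pages, the output would need to be split across several files, since the sitemap protocol limits a single sitemap file to 50,000 URLs.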