How does Google index pagination variables in Ajax snapshots? We're seeing random huge variables.
-
We're using the Google snapshot method to index dynamic Ajax content. Some of this content is from tables using pagination. The pagination is tracked with a var in the hash, something like:
#!home/?view_3_page=1
We're seeing all sorts of calls from Google now with huge numbers for these URL variables that we are not generating with our snapshots. Like this:
#!home/?view_3_page=10099089
These aren't trivial since each snapshot represents a server load, so we'd like these vars to only represent what's returned by the snapshots.
Is Google generating random numbers going fishing for content? If so, is this something we can control or minimize?
-
Thanks for the great replies all. Just to clarify, this is the page we're referencing:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=
You can see the one pagination var "next" that points here:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=home/?view_3_page=2
As you can see this is pretty simple. There's only one potential variable (the "prev" and "next" links) for introducing these huge numbers and that's pretty limited. We tested the Google URLs up and down the app and haven't seen anything that would send it fishing for larger numbers. But Google keeps hammering us with:
GET /business-directory-user-demo/?escaped_fragment=home/?view_3_page=1000251
For now we're trying to respond to those with 404s and hope they eventually die.
Unfortunately we can't avoid hashbangs.
-
This seems to do this only for parameters that it has decided "changes, re-orders, or narrows content." They may also crawl things that look like URLs in Javascript even when it's part of a function, but it doesn't seem like that's what's happening in this case.
Depending on the setup of the site, you can either manually configure the variable in WMT (don't do this if the parameter is material), write a clever robots.txt rule (e.g. to block anything after a number of digits after the parameter), or (the best solution) re-work the system to generate URLs that don't rely on parameters.
I'm not sure I understand why the server is rendering a page if the URL isn't supposed to exist. Depending on your server config, you may also be able to return a 404 and make a rule for which (valid) pages to render. From there you can just ignore the 404 errors until Google figures it out.
I think that's the best I can do without seeing the site.
-
I agree with Federico. I've seen Google go fishing with URL parameters (?param=xyz) and I've seen it with AJAX and hashbangs as well. How far they take this and when they choose to apply it doesn't seem to follow a consistent pattern . You can see some folks on StackExchange discussing this, too: http://webmasters.stackexchange.com/questions/25560/does-the-google-crawler-really-guess-url-patterns-and-index-pages-that-were-neve
-
Awesome, thanks for looking into it. We've gotten nowhere with any kind of answer.
-
Hi There
I'm an associate here at Moz, and have asked the other associates if they might know the answer, as this one's a little outside of my experience. Please follow up and let us know if you don't hear from anyone.
Thanks!
-Dan
-
We also noticed some weird crawls last year using random numbers at the end of the URL, checking in google webmaster tools we saw that most of those urls were reported as not found, checking from where the link came from google listed some of our URLs, but didn't had any link to those URLs google was trying to fetch. After 2 or 3 months those crawls stopped. We never knew from where Google got those URLs...
-
Hi Federico, thanks for the response.
Unfortunately this is an SEO solution for a third-party JavaScript product, so removing the hash isn't an option.
I'm still interested in knowing if this is a formal Google practice and if there's some way to control or mitigate this.
-
I think you are right. Google is fishing for content. I would find a solution to make those URL friendly by removing the hash and using some URL rewrite and pushState to paginate that content instead.
Here's a previous question that may help: http://moz.com/community/q/best-way-to-break-down-paginated-content
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why isn't our complete meta title showing up in the Google SERPS? (cut off half way)
We carry a product line, cutless bearings (for use on boats). For instance, we have one, called the Able, that has the following meta title (and searched by View Page Source to confirm): BOOT 1-3/8" x 2-3/8" x 5-1/2" Johnson Cutless Bearing | BOOT Cutlass However, if I search for it on on Google by part number or name (boot cutless bearing, boot cutlass bearing), the meta title comes back with whole first part chopped off, only showing this : "x 5-1/2" Johnson Cutless Bearing | BOOT Cutlass - Citimarine ..." Any idea why? Here's the url if it will hopefully help: https://citimarinestore.com/en/metallic-inches/156-boot-johnson-cutless-bearing-870352103.html All the products in the category are doing the same. Thanks!
Intermediate & Advanced SEO | | Citimarine0 -
Can 'follow' rather than 'nofollow' links be damaging partner's SEO
Hey guys and happy Monday! We run a content rich website, 12+ years old, focused on travel in a specific region, and advertisers pay for banners/content etc alongside editorial. We have never used 'nofollow' website links as they're no explicitly paid for by clients, but a partner has asked us to make all links to them 'nofollow' as they have stated the way we currently link is damaging their SEO. Could this be true in any way? I'm only assuming it would adversely affect them if our website was peanalized by Google for 'selling links', which we're not. Perhaps they're just keen to follow best practice for fear of being seen to be buying links. FYI we now plan to change to more full use of 'nofollow', but I'm trying to work out what the client is refering to without seeming ill-informed on the subject! Thank you for any advice 🙂
Intermediate & Advanced SEO | | SEO_Jim0 -
How long after https migration that google shows in search console new sitemap being indexed?
We migrated 4 days ago to https and followed best practices..
Intermediate & Advanced SEO | | lcourse
In search console now still 80% of our sitemaps appear as "pending" and among those sitemaps that were processed only less than 1% of submitted pages appear as indexed? Is this normal ?
How long does it take for google to index pages from sitemap?
Before https migration nearly all our pages were indexed and I see in the crawler stats that google has crawled a number of pages each day after migration that corresponds to number of submitted pages in sitemap. Sitemap and crawler stats show no errors.0 -
Apps content Google indexation ?
I read some months back that Google was indexing the apps content to display it into its SERP. Does anyone got any update on this recently ? I'll be very interesting to know more on it 🙂
Intermediate & Advanced SEO | | JoomGeek0 -
Google is indexing the wrong page
Hello, I have a site I am optimizing and I cant seem to get a particular listing onto the first page due to the fact google is indexing the wrong page. I have the following scenario. I have a client with multiple locations. To target the locations I set them up with URLs like this /<cityname>-wedding-planner.</cityname> The home page / is optimized for their port saint lucie location. the page /palm-city-wedding-planner is optimized for the palm city location. the page /stuart-wedding-planner is optimized for the stuart location. Google picks up the first two and indexes them properly, BUT the stuart location page doesnt get picked up at all, instead google lists / which is not optimized at all for stuart. How do I "let google know" to index the stuart landing page for the "stuart wedding planner" term? MOZ also shows the / page as being indexed for the stuart wedding planner term as well but I assume this is just a result of what its finding when it performs its searches.
Intermediate & Advanced SEO | | mediagiant0 -
Google Site Extended Listing Not Indexed
I am trying to get the new Site map to be picked up by Google for the extended listing as its pulling from the old links and returning 404 errors. How can I get the site listing indexed quickly and have the extended listing get updated to point to the right places. This is the site - http://epaperflip.com/Default.aspx This is the search with the extended listing and some 404's - Broad Match search for "epaperflip"
Intermediate & Advanced SEO | | Intergen0 -
How is Google's algorithm evolving in terms of DA vs PA value?
how is Google evolving in terms of value for DA vs PA? Is having a link from a DA 75 + PA 25 better than having a link from a DA 50 + PA 50, assuming such 2 websites are otherwise identical? I have a couple of .EDU backlinks where DA is around 80, though PA 1. Would be DA 40 with a PA 40 be more valuable? I hear Google is placing increasing value on the domain and less on the page authority.
Intermediate & Advanced SEO | | knielsen
Any insight appreciated thank you0 -
Is there a FastTrack to re-index? a site?
Hello... i just started with a new client this week, before working with us his last domain-hosting-webdev provider cancel their account and took off the entire site and left them with a nice "under construction page" (NOT) and added the noindex, nofollow tags. 4 weeks after that, we come into the scene and of course our client it's expecting us to reinsert at least for branded terms the site, and he wants it done on a matter of hours... I tried my best to explain that it's not possible and we are doing everything we can't.... now i ask you guys.. I already created de GWT account, Created a well structured Sitemap and submitted it to google and bing, did the onpage optimizitation at least the basics... there is a way to speed up the process? kind of like "hey you! google bot, forget about the noindex nonsense a come crawl again?" Any help would be great Daniel
Intermediate & Advanced SEO | | daniel.alvarez0