How does Google index pagination variables in Ajax snapshots? We're seeing random, huge variable values.
-
We're using Google's snapshot method to get dynamic Ajax content indexed. Some of this content comes from paginated tables, and the pagination is tracked with a variable in the hash, something like:
#!home/?view_3_page=1
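For context, under Google's AJAX crawling scheme Googlebot doesn't fetch the `#!` URL itself. It moves everything after `#!` into an `_escaped_fragment_` query parameter, percent-escaping a small set of characters (space and control characters, `#`, `%`, `&`, `+`, and non-ASCII bytes). A minimal Python sketch of that mapping follows; `escaped_fragment_url` is a hypothetical helper written from my reading of the scheme, not Google's code, so treat the exact escape set as an assumption to verify.

```python
from urllib.parse import quote

# Printable ASCII the scheme leaves unescaped. quote() never escapes letters,
# digits, or "_.-~", so this string lists the remaining punctuation, i.e. every
# printable character except "#", "%", "&", and "+".
UNESCAPED = "!\"$'()*,/:;<=>?@[\\]^`{|}"


def escaped_fragment_url(pretty_url: str) -> str:
    """Map a "#!" URL to the URL Googlebot requests under the AJAX crawling scheme."""
    base, marker, fragment = pretty_url.partition("#!")
    if not marker:
        return pretty_url  # no hashbang, so the URL is crawled as-is
    # Append to an existing query string if the base URL already has one.
    joiner = "&" if "?" in base else "?"
    # quote() percent-encodes everything not listed as safe: space and control
    # characters, "#", "%", "&", "+", and (after UTF-8 encoding) non-ASCII bytes.
    return base + joiner + "_escaped_fragment_=" + quote(fragment, safe=UNESCAPED)


print(escaped_fragment_url(
    "http://www.knackhq.com/business-directory-user-demo/#!home/?view_3_page=2"
))
# http://www.knackhq.com/business-directory-user-demo/?_escaped_fragment_=home/?view_3_page=2
```

Because of this mapping, every value Googlebot tries for `view_3_page` becomes a separate snapshot request to the server.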
We're now seeing all sorts of requests from Google with huge numbers for these URL variables, values our snapshots never generate. For example:
#!home/?view_3_page=10099089
These requests aren't trivial, since each snapshot adds server load, so we'd like Google to request only the values our snapshots actually link to.
Is Google generating random numbers to go fishing for content? If so, is this something we can control or minimize?
-
Thanks for the great replies, all. Just to clarify, this is the page we're referring to:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=
You can see the single pagination link, "next", which points here:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=home/?view_3_page=2
As you can see, this is pretty simple. The only place these huge numbers could come from is the one pagination variable (via the "prev" and "next" links), and those links are very limited. We tested the Google URLs up and down the app and haven't found anything that would send it fishing for larger numbers. But Google keeps hammering us with:
GET /business-directory-user-demo/?escaped_fragment=home/?view_3_page=1000251
For now, we're responding to those with 404s and hoping they eventually die off.
Unfortunately we can't avoid hashbangs.
-
Google seems to do this only for parameters it has decided "changes, re-orders, or narrows content." It may also crawl strings that look like URLs in JavaScript, even when they're part of a function, but that doesn't seem to be what's happening in this case.
Depending on how the site is set up, you can manually configure the parameter in Webmaster Tools (WMT) (don't do this if the parameter is material), write a clever robots.txt rule (e.g. to block anything beyond a certain number of digits after the parameter), or, best of all, rework the system to generate URLs that don't rely on parameters.
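One way to approximate the robots.txt idea, sketched in Python: Google's robots.txt matching only supports the `*` and `$` wildcards, so a rule can't literally say "more than N digits". Instead, this disallows the parameter entirely and allows the known-good page numbers back in. It assumes Googlebot evaluates robots.txt against the escaped-fragment URL it actually requests, that the pagination variable is the last thing in that URL, and that the page count is small enough to list; `robots_rules` and `total_pages` are hypothetical names. One trade-off: URLs blocked this way are never fetched, so Google never sees a 404 telling it they don't exist.

```python
PAGE_PARAM = "view_3_page"  # the pagination variable from the question


def robots_rules(total_pages: int, param: str = PAGE_PARAM) -> str:
    """Build Googlebot rules that block every value of `param` except 1..total_pages."""
    lines = ["User-agent: Googlebot", f"Disallow: /*{param}="]
    # Google applies the most specific (longest) matching rule, so each Allow line
    # below overrides the shorter Disallow; "$" anchors the pattern to the end of
    # the URL, so allowing "=2" does not also allow "=20" or "=2000".
    lines += [f"Allow: /*{param}={page}$" for page in range(1, total_pages + 1)]
    return "\n".join(lines) + "\n"


print(robots_rules(3))
```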
I'm not sure why the server renders a page at all if the URL isn't supposed to exist. Depending on your server config, you may be able to define a rule for which (valid) pages to render and return a 404 for everything else. From there, you can ignore the 404 errors until Google figures it out.
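A minimal Python sketch of that rule: check the pagination value before doing any rendering, and return 404 for anything outside the known page range. `snapshot_status`, `PAGE_PARAM`, and `total_pages` are hypothetical names, and it assumes the handler can look up how many pages the table actually has. It uses the AJAX crawling scheme's `_escaped_fragment_` parameter name; the URLs quoted in the question show it as `escaped_fragment`, so match whatever name your server actually receives.

```python
import re
from urllib.parse import parse_qs, urlsplit

PAGE_PARAM = "view_3_page"  # the pagination variable from the question


def snapshot_status(url: str, total_pages: int) -> int:
    """Return 200 to render the snapshot, or 404 to skip rendering it."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    fragments = query.get("_escaped_fragment_")
    if fragments is None:
        return 200  # not a snapshot request; serve the regular page
    # parse_qs has already decoded the value, e.g. "home/?view_3_page=2";
    # keep only the part after its own "?".
    _, _, fragment_query = fragments[0].partition("?")
    pages = parse_qs(fragment_query, keep_blank_values=True).get(PAGE_PARAM)
    if pages is None:
        return 200  # no pagination variable, e.g. the unpaginated home view
    value = pages[0]
    # Accept only plain ASCII digits (an empty value fails too), then check the range.
    if not re.fullmatch(r"[0-9]+", value):
        return 404
    return 200 if 1 <= int(value) <= total_pages else 404
```

Calling this first in the request handler means a request like `?_escaped_fragment_=home/?view_3_page=1000251` costs a cheap 404 instead of a full snapshot render.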
I think that's the best I can do without seeing the site.
-
I agree with Federico. I've seen Google go fishing with URL parameters (?param=xyz), and I've seen it with AJAX and hashbangs as well. How far Google takes this, and when it chooses to apply it, doesn't seem to follow a consistent pattern. You can see some folks on StackExchange discussing this, too: http://webmasters.stackexchange.com/questions/25560/does-the-google-crawler-really-guess-url-patterns-and-index-pages-that-were-neve
-
Awesome, thanks for looking into it. We've gotten nowhere with any kind of answer.
-
Hi there,
I'm an associate here at Moz, and have asked the other associates if they might know the answer, as this one's a little outside of my experience. Please follow up and let us know if you don't hear from anyone.
Thanks!
-Dan
-
We also noticed some weird crawls last year that used random numbers at the end of the URL. In Google Webmaster Tools, most of those URLs were reported as not found. When we checked where the links came from, Google listed some of our URLs, but none of them linked to the URLs Google was trying to fetch. After two or three months, those crawls stopped. We never found out where Google got those URLs.
-
Hi Federico, thanks for the response.
Unfortunately this is an SEO solution for a third-party JavaScript product, so removing the hash isn't an option.
I'm still interested in knowing if this is a formal Google practice and if there's some way to control or mitigate this.
-
I think you are right: Google is fishing for content. I would look for a way to make those URLs friendly by removing the hash and using URL rewriting and pushState to paginate that content instead.
Here's a previous question that may help: http://moz.com/community/q/best-way-to-break-down-paginated-content