How to detect where Google finds indexed URLs
-
Google is somehow indexing links that create duplicate content. We don't understand how these links are created, so we would like to detect where Google's robots find them.
We tried:
- Moz Crawl Diagnostics, but it shows an Internal Link Count of 0 for these kinds of links.
- Looking in Google Analytics (Site Content - All Content) for a trace from the visitor side. There wasn't one.
- Looking in Webmaster Tools under Internal Links and HTML Improvements, but we didn't find any trace.
- Some search commands. Is there a good one to use for this?
- Searching for the URLs in source code with https://search.nerdydata.com.
-
It really isn't possible for an outsider to know why your website is generating those URLs in error; you would have to talk to your developer about that.
As for canonicals: if your problem is that page.com is being duplicated by added parameters (page.com/?id=1, page.com/?id=2, page.com/?id=3, etc.), then as long as you have the canonical on page.com, all of the parameter pages will have the correct canonical on them as well. (But you are right, you should track down the source; your developer will know.)
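To illustrate the canonical idea, here is a minimal Python sketch, not taken from your site. It assumes the duplicates differ from the real page only by their query string; the helper names `canonical_url` and `canonical_tag` and the `http://` scheme are made up for the example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # An empty path is normalized to "/" so both spellings map to one URL.
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the given URL."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


if __name__ == "__main__":
    for duplicate in ("http://page.com/?id=1", "http://page.com/?id=2"):
        print(duplicate, "->", canonical_tag(duplicate))
```

Every parameter variant ends up with the same tag, which is why one canonical in the shared page template covers all of them. If some parameters do change the content, this blanket stripping would be wrong for those pages.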
-
Thank you for your answer, but yes, I know that these are generated by our site. The problem is that I can use a canonical tag for the ones that are indexed right now, but new ones will somehow be created later. The root of the problem isn't that we don't know how to use canonical; it's finding out where Google finds, detects, and indexes these URLs.
These kinds of URLs have been there for months, so we can't just hope that they will somehow be dropped. We need to find a solution and detect the real problem.
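One way to hunt for the source is to scan the HTML of our own pages for any link, script, or form target that carries these parameters. Below is a rough Python sketch of that idea. It assumes the page HTML is already available as text (saved or fetched separately), that the parameter names of interest are `id` and `productId`, and the names `ParamLinkFinder` and `find_param_links` are only illustrative. It would not catch URLs that are built by JavaScript at runtime or that are linked only from other sites.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

# Assumed parameter names, taken from the duplicate URLs seen in the index.
SUSPECT_PARAMS = {"id", "productId"}


class ParamLinkFinder(HTMLParser):
    """Collect (tag, url) pairs whose URL carries a suspect parameter."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Check the attributes that usually hold URLs.
            if name in ("href", "src", "action") and value:
                query = urlsplit(value).query
                params = parse_qs(query, keep_blank_values=True)
                if SUSPECT_PARAMS & set(params):
                    self.hits.append((tag, value))


def find_param_links(html_text: str):
    """Return every (tag, url) in the HTML that uses a suspect parameter."""
    finder = ParamLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits


if __name__ == "__main__":
    sample = '<a href="/et?id=12">x</a><a href="/about">y</a>'
    print(find_param_links(sample))
```

Running this over every template, sitemap-linked page, and email or feed template would at least show whether the links come from our own markup.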
-
If you found those URLs by doing a site: search, then those parameters are being generated by your site. (I am surprised that Google is even indexing them; I assume that pretty soon all but one will be dropped.) Here is an article that explains more about those types of duplicate pages: http://moz.com/blog/which-page-is-canonical
You can fix this by adding a canonical tag to your homepage that points to the version without the parameter.
-
Our front page has almost 50 duplicate versions. They show up when we do site:oursite.com: there are /et?id=xx, /et?productId=xx, etc., where xx stands for different numbers.
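To get an overview of which parameters are involved, the indexed URLs can be grouped by path. This is only a sketch in Python: the list of URLs would have to be copied by hand from the site: results or exported from some tool, the `http://` scheme is assumed, and the function names `group_duplicates` and `param_names` are invented for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def group_duplicates(urls):
    """Group URLs by (host, path) and keep only groups with several variants."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        groups[(parts.netloc, parts.path)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


def param_names(urls):
    """Return the sorted set of query parameter names used across the URLs."""
    names = set()
    for url in urls:
        names.update(parse_qs(urlsplit(url).query, keep_blank_values=True))
    return sorted(names)


if __name__ == "__main__":
    indexed = [
        "http://oursite.com/et?id=1",
        "http://oursite.com/et?productId=2",
        "http://oursite.com/about",
    ]
    for key, variants in group_duplicates(indexed).items():
        print(key, param_names(variants), len(variants))
```

Knowing the exact parameter names makes it easier to search the code base and templates for whatever generates them.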
-
Where are you seeing these duplicate content links? Does Webmaster Tools say that they are duplicate content? Or does this show up in your Moz crawl? What do these URLs look like?