How to detect where Google finds indexed URLs

raido

Google is somehow indexing links that create duplicate content. We don't understand how these links are created, so we would like to find out where Google's robots discover them.

We tried:

Moz Crawl Diagnostics, but it shows an Internal Link Count of 0 for these kinds of links.
Google Analytics (Site Content - All Content), in case there was a trace from the visitors' side. There wasn't.
Webmaster Tools, under Internal Links and HTML Improvements, but we didn't find any trace.
Some search commands. Is there a good one to search with?
Searching for the URLs in source code with https://search.nerdydata.com.

Linda-Vassily

It really isn't possible for an outsider to know why your website is generating those URLs in error; you would have to talk to your developer about that.

As far as canonicals go, if your problem is that page.com is being duplicated by added parameters (page.com/?id=1, page.com/?id=2, page.com/?id=3, etc.), then as long as you have the canonical on page.com, all of the parameter pages will have the correct canonical on them as well. (But you are right, you should track down the source; your developer will know.)
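To make the parameter case concrete, here is a minimal Python sketch of how every parameter variant maps back to one canonical address. It assumes the duplicates differ only by their query string; `canonical_url` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# The example variants from the post above all collapse to the same address.
for variant in ["http://page.com/?id=1", "http://page.com/?id=2", "http://page.com/?id=3"]:
    print(variant, "->", canonical_url(variant))
```

If some parameters actually change the page content, they would need to be kept, so a blanket strip like this is only appropriate when the query string is irrelevant to what the page shows.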

raido

Thank you for your answer, but yes, I know that these are generated by our site. The problem is that I can use the canonical tag for the ones that are indexed right now, but new ones will somehow be created later. The root of the problem isn't that we don't know how to use canonical; it's how to find out where Google finds/detects/indexes these URLs.

These kinds of URLs have been there for months, so we can't just hope that they will somehow be dropped. We need to find a solution and detect the real problem.

Linda-Vassily

If you found those URLs by doing a site: search, then those parameters are being generated by your site. (I am surprised that Google is even indexing them; I assume that pretty soon all but one will be dropped.) Here is an article that explains more about those types of duplicate pages: http://moz.com/blog/which-page-is-canonical

You can fix this by using a canonical tag on your homepage that points to the version without the parameter.

raido

Our front page has almost 50 duplicate versions. These show up when we do site:oursite.com: there are /et?id=xx, /et?productId=xx, etc., where xx stands for different numbers.
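One way to hunt for the source of such links is to scan the HTML of your own pages for anchors that carry the suspect parameters. The following is only a rough sketch, assuming the links exist as plain `<a href>` tags in the served HTML (links built by JavaScript would not be seen). `LinkCollector` and `find_parameter_links` are hypothetical names, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit

# Parameter names taken from the URL patterns mentioned above.
SUSPECT_PARAMS = {"id", "productId"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def find_parameter_links(html, base_url, params=SUSPECT_PARAMS):
    """Return links in `html` whose query string uses any suspect parameter."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [
        link
        for link in collector.links
        if params & set(parse_qs(urlsplit(link).query, keep_blank_values=True))
    ]


# Illustrative markup only; real input would be the fetched HTML of each page.
sample = '<a href="/et?id=12">Home</a> <a href="/et/contact">Contact</a>'
print(find_parameter_links(sample, "http://oursite.com/et"))
```

Running this over every page of the site and recording which page each hit came from would show which templates emit the parameterized links, if any do.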

Linda-Vassily

Where are you seeing these duplicate content links? Does Webmaster Tools say that they are duplicate content? Or does this show up in your Moz crawl? What do these URLs look like?
