URLs appear in Google Webmaster Tools that I can't find on my own site?!?
-
Hi,
I have a Magento e-commerce site (clothing), and when I looked through some of the sections in Google Webmaster Tools I found URLs that I can't find on my site.
For example, a product URL may be http://www.example.co.uk/product-url/, which is fine. That product may come in three sizes (Small, Medium, Large), and for some reason Googlebot is sometimes finding a URL like:
http://www.example.co.uk/product-url/1202/. When clicked, it is a live URL (status code: 200) which is one of the sizes (Medium). However, I have run a site crawl in Screaming Frog and other crawl tests, and can't find where Googlebot is discovering these URLs.
I think I need to:
1. Find out how Googlebot is finding these URLs.
2. Find out how to keep them out of the index (e.g. robots.txt, canonical, etc.).
Any help would be much appreciated, and I'm happy to share the site URL with members if they think they can have a look and help with this problem. I can also share specific URLs, which might make the issue clearer. Just let me know.
Thanks,
Darrell
-
No problem, glad that resolved it.
There are a number of possibilities. Googlebot probably found the URLs through one of the following:
- XML sitemap
- Faceted navigation
- Magento pinged Google when the page was created
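
To rule the XML sitemap in or out, a short script can scan it for URLs shaped like the one in the question. This is only a rough sketch: the `VARIANT_URL` pattern (a product path followed by one numeric segment) is an assumption based on the example URL above, `find_variant_urls` is a made-up helper name, and the sample sitemap is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed pattern: one path segment followed by a purely numeric segment,
# e.g. /product-url/1202/. Adjust to match your own URL structure.
VARIANT_URL = re.compile(r"^https?://[^/]+/[^/]+/\d+/?$")


def find_variant_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> in the sitemap that matches VARIANT_URL."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if VARIANT_URL.match(u)]


# Invented sample; in practice, paste in the contents of the submitted sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.co.uk/product-url/</loc></url>
  <url><loc>http://www.example.co.uk/product-url/1202/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in find_variant_urls(SAMPLE):
        print(url)
```

If this prints nothing for the real sitemap, the sitemap is probably not the source, and faceted navigation or the ping on page creation become the more likely candidates.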
-
Cheers John, sorted the issue! Appreciate your expertise.
-
Thanks John, your reply was really helpful. I've now done that for the 4000 simple products, and those URLs are now returning 404 pages, which is great. I'm going to see if I can find a mass-import 301 redirect extension for Magento, so I can 301 redirect these URLs to the homepage rather than leave them as 404 pages.
How do you think Googlebot found those pages, as there are no links to them? Maybe through a link when the simple products were added to the cart?
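
For the bulk redirect, the list of old URLs could be turned into a CSV of redirect rows before importing. This is a sketch only: the column names (`request_path`, `target_path`, `redirect_type`) and both function names are hypothetical, not the format of any particular Magento extension, so they would need to be adapted to whatever the chosen import tool expects.

```python
import csv
import io
from urllib.parse import urlsplit


def build_redirect_rows(urls, target="/"):
    """Map each old URL's path to the target path with a 301, skipping duplicates."""
    rows = []
    seen = set()
    for url in urls:
        path = urlsplit(url).path
        # Skip empty paths, the target itself, and paths already handled.
        if path and path != target and path not in seen:
            seen.add(path)
            rows.append((path, target, 301))
    return rows


def rows_to_csv(rows):
    """Serialise redirect rows to CSV text with a hypothetical header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["request_path", "target_path", "redirect_type"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    old_urls = ["http://www.example.co.uk/product-url/1202/"]
    print(rows_to_csv(build_redirect_rows(old_urls)))
```

The `target` parameter defaults to the homepage, as described above, but can be set to any other path.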
-
What is the visibility set to on the simple products for different sizes? If it's set to "Catalog" it will still be crawlable but not appear in your website's internal search results.
Setting the visibility to "Not Visible Individually" should resolve this issue.
-
I had a similar issue (not Magento). It turned out the URLs were in the sitemap that was submitted to Webmaster Tools. Did you check there?
Check the URL in Open Site Explorer too; it might tell you if any URLs are linking to it.