URLs appear in Google Webmaster Tools that I can't find on my own site?!?
-
Hi,
I have a Magento e-commerce site (clothing) and when I had a look through some of the sections in Google Webmaster Tools I found URLs that I can't find on my site.
For example, a product URL may be http://www.example.co.uk/product-url/, which is fine. That product may come in three sizes (Small, Medium, Large), and for some reason Googlebot is sometimes finding a URL like:
http://www.example.co.uk/product-url/1202/
When clicked, this is a live URL (status code 200) that shows one of the sizes (Medium). However, I have run a site crawl in Screaming Frog, along with other crawl tests, and can't find where Googlebot is picking these URLs up.
I think I need to:
1. Find out how Googlebot is finding these URLs.
2. Find out how to keep them out of the index (e.g. robots.txt, canonical, etc.).
Any help would be much appreciated, and I'm happy to share the URL with members if they think they can have a look and help with this problem. I can also share specific URLs, which might make the issue clearer; just let me know.
Thanks,
Darrell
-
No problem, glad it resolved the problem.
There are a number of possibilities; Googlebot probably found them through one of the following:
- XML sitemap
- Faceted navigation
- Magento pinged Google when the page was created
-
Cheers John, sorted the issue! Appreciate your expertise.
-
Thanks John, your reply was really helpful. I've now done that for the 4000 simple products, and those URLs are returning 404 pages, which is great. I'm now going to see if I can find a mass-import 301 redirect extension for Magento, so I can redirect these URLs to the homepage rather than leave them as 404 pages.
How do you think Googlebot found those pages, as there are no links to them? Maybe through a link when the simple products were loaded to the cart?
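For anyone preparing a bulk redirect like this, here is a rough Python sketch of building the redirect list as a CSV. It assumes you already have the unwanted URLs in a list (for example, exported from Webmaster Tools). The column names `request_path`, `target_path`, and `status` are hypothetical; whichever import extension you use will define its own format, so adjust to match.

```python
import csv
import io
from urllib.parse import urlparse


def build_redirect_rows(urls, target="/"):
    """Map each unwanted URL's path to a single redirect target (homepage by default)."""
    rows = []
    for url in urls:
        path = urlparse(url).path  # e.g. "/product-url/1202/"
        rows.append({"request_path": path, "target_path": target, "status": 301})
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text; the column names are placeholders."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["request_path", "target_path", "status"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example URL taken from the question; a real list would come from your own export.
rows = build_redirect_rows(["http://www.example.co.uk/product-url/1202/"])
print(rows_to_csv(rows))
```

The `target` argument is there so the same list can point somewhere other than the homepage if that turns out to suit the site better.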
-
What is the visibility set to on the simple products for different sizes? If it's set to "Catalog" it will still be crawlable but not appear in your website's internal search results.
Setting the visibility to "Not Visible Individually" should resolve this issue.
-
I had a similar issue (not Magento). It turned out the URLs were in the sitemap that was submitted to WMT. Did you check there?
Check the URL in Open Site Explorer too; it might tell you if any URLs are linking to it.
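If the sitemap is large, a short script can do the checking for you. This is only a sketch in Python: it assumes a standard sitemaps.org `urlset` file, and it assumes the unwanted URLs look like the example in the question (a product path followed by a numeric segment). The pattern and function name are illustrative, not anything Magento-specific.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Assumed shape of the unwanted URLs: /<product-slug>/<digits>/
NUMERIC_CHILD = re.compile(r"^/[^/]+/\d+/?$")


def find_suspect_urls(sitemap_xml):
    """Return sitemap URLs whose path is a product slug followed by a number."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return [u for u in urls if NUMERIC_CHILD.match(urlparse(u).path)]


# Tiny inline sitemap for illustration; in practice, read your real sitemap file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.co.uk/product-url/</loc></url>
  <url><loc>http://www.example.co.uk/product-url/1202/</loc></url>
</urlset>"""

print(find_suspect_urls(sample))
```

If this returns nothing for your real sitemap, the sitemap is probably not the source and it is worth looking at the other possibilities mentioned in this thread.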