URLs appear in Google Webmaster Tools that I can't find on my own site?!?
-
Hi,
I have a Magento e-commerce site (clothing) and when I had a look through some of the sections in Google Webmaster Tools I found URLs that I can't find on my site.
For example, a product url maybe http://www.example.co.uk/product-url/ which is fine. In that product there maybe three sizes of the product (Small, Medium, Large) and for some reason Googlebot is sometimes finding a url like:
http://www.example.co.uk/product-url/1202/ has been found and when clicked on is a live url (Status code: 200) with is one of the sizes (medium). However I have ran a site crawl in Screaming Frog and other crawl tests and can't seem to find where Googlebot is finding these URLs.
I think I need to:
1. Find how Googlebot is finding these urls?
2. Find out how to keep out of index (e.g. robots.txt, canonical etc....
Any help would be much appreciated and I'm happy to share the URL with members if they think they can have a look and help with this problem. I can share specific URLs which might make the issue seem clearer, let me know?
Thanks,
Darrell
-
No problem, glad it resolved the problem.
There are a number of possibilities, probably through one of the following;
- XML sitemap
- Faceted navigation
- Magento pinged Google when the page was created
-
Cheers John, sorted the issue! Appreciate your expertise.
-
Thanks John, your reply was really helpful and I've now done that for the 4000 simple product and now those URLs are returning 404 pages, which is great. Well, just going to see if I can find a mass import 301 redirect extension for Magento to 301 redirect these urls to the homepage so I can redirect them rather than leave as 404 pages.
How do you think Googlebot found those pages as there is no links to them? Maybe through a link when the simple products were loaded to the cart?
-
What is the visibility set to on the simple products for different sizes? If it's set to "Catalog" it will still be crawlable but not appear in your website's internal search results.
Setting the visibility to "Not Visible Individually" should resolve this issue.
-
I had a similar issue (not Magento), turns out it was in the sitemap that was submitted to WMTs, did you check there?
check the url in the open site explore too, it might tell you if any urls are linking to it
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google analytic's API information
I have multiple websites and instead of having to log in to each google analytics I want to create a dashboard inside my MIS that has the audience overview graph, is there any way to use API to do this? Is there a way to
Web Design | | BobAnderson0 -
I am Using <noscript>in All Webpage and google not Crawl my site automatically any solution</noscript>
| |
Web Design | | ahtisham2018
| | <noscript></span></td> </tr> <tr> <td class="line-number"> </td> <td class="line-content"><meta http-equiv="refresh" content="0;url=errorPages/content-blocked.jsp?reason=js"></td> </tr> <tr> <td class="line-number"> </td> <td class="line-content"><span class="html-tag"></noscript> | and Please tell me effect on seo or not1 -
How to find out that none of the images on my site violates copyrights? Is there any tool that can do this without having to check manually image by image?
We plan to add several thousand images to our site and we outsourced the image search to some freelancers who had instructions to just use royalty free pictures. Is there any easy and quick way to check that in fact none of these images violates copyrights without having to check image by image? In case there are violations we are unaware of, do you think we need to be concerned about a risk of receiving Takedown Notices (DMCA) before owner giving us notification for giving us opportunity to remove the photo?
Web Design | | lcourse1 -
Need to hire a tech for find out why google spider can't access my site
Google spider can't access my site and all my pages are gone out of serps. I've called network solutions tech support and they say all is fine with the site which is wrong. does anyone know of a web tech who i can hire to fix this issue? Thanks, Ron
Web Design | | Ron100 -
I need help with international SEO for two sites?
I'll try to keep this clear... I am working with an company based in Germany, they own company.com/de and company.com/en, and that's how they are currently structuring their domains. They also own companyusa.com that they really want to show up in USA only. They want to keep company.com/en for England/english speaking Europe and company.com/de for their German audience in Germany. They are wanting us to optimize/SEO for companyusa.com, and they want that URL to show up as the top google search in the USA for their "company" keyword. What is showing up now is www.company.com/en 1st in Google because it's been around longer and it has more domain authority. What is the best practice for us optimize companyusa.com so that it is the top dog in the USA while not messing up the other domains? Should we merge? Subfolders all around? Thanks for all the input.
Web Design | | Rocket.Fuel0 -
Google search issue with exact domain
We had a site from Feb-2011 to Nov-2011 at the domain amcoexterminating.com. The site was pure HTML/CSS and the daily unique visitors steadily increased over that time. So all was fine. We then moved the site to a CMS (Joomla) on Dec. 6th. From that day forward, the daily visitors went into the tank. Before the move, if you typed "amcoexterminating.com" or "amco exterminating" into Google search, the site would be the first result (as you'd expect since those are the words that make up the actua domain). But we tried this yesterday and the site did not come up at all. NOT GOOD. It would work in Yahoo or Bing, but not in Google. So obviously, the problem with Google search directly affected the daily visitors. We just checked Webmaster tools yesterday (yes, this should have been done sooner, lesson learned) and it said "Site has severe health issues - Important page blocked by robots.txt". It listed the "important" page URL and it was just a link to an image. Regardless, I wiped out the Joomla created robots.txt file and added a new one and made it just say... User-agent: *Allow: / About 14 hours later, after the new robots.txt file was recognized by Google, the "severe health" message went away. However if I search in Google for "amcoexterminating.com", it still doesn't show up and the client is concerned (as they should be). Do you think the search engines just need more time to refresh? If so, once it refreshes, should the site show up first again right away? Or is it possible the robots.txt file had nothing to do with the issue? If so, what other things could I check into that might cause Google search to not find a site even if you search for exact domain name? Please share any and all things I should look into as I need to get this site showing in Google search again (as it was before moving to the CMS). Thanks!
Web Design | | MarathonMS0 -
Map Search Tools to integrate on Accommodation Website
Hello all, Can any recommend a Map Search tool that i can integrate into my accommodation website. Ideally I want to be able to pin my clients on the map search with links back to their listing page on my website and provide an alternative search facility for clients looking for accommodation. Am covering the South Africa region specifically. Am assuming that i could go down the Google Maps route but would really like to know what alternatives there are on offer. Also what SEO considerations do i need to think about when adding this to my website. Thanks in advance for any help.
Web Design | | SamanthaRiggien0 -
What's the best was to structure Product page information on my site?
Hi - I run a hobby related niche new / article / resource site (http://tinyurl.com/4eavaj4). One of the most critical components of the site is our product database. We don't actually sell anything directly - instead we monetize them by displaying relevant affiliate product feeds and price comparisons. However since the Panda update was implemented in February my traffic (particularly my long tail, product related traffic) has dropped off considerably. I had about a 20% drop in overall traffic, but have made up some of the ground in the past week. However I want to know once and for all how I should structure my product related information as I have a ton of great content that is ready to be published in this section but want to be sure I structure it the best possible way from a SEO standpoint. Here are a few different options I've come up with for displaying information about products on my site. For the purpose of these examples I am going to refer to all of the information that makes up my product pages collectively as "product profiles". Please let me know which is the best SEO wise (or if you have a better way of doing it let me know): - Option 1 - Current Method - Divide Content Sections into different pages / urls Example: http://tinyurl.com/4tpdlbl This is how the majority of my product profiles are currently structured. I did this to improve load times and to keep the total number of links per page down. In addition to the core product profile subpages: "Product Details","Compare Prices", **"**Product Review", "Hot Auctions", and "Checklists", I have the Checklists area further segmented by subset, each of which is on its own page that is only accessible through the main Checklists tab of the profile. - Option 2 - Everything on one url / page the old fashioned way, with everything available by scrolling vertically. This would make the page go on forever though. - Option 3 - Everything on one url / page, but visually segmented using css / javascript tabs. Example: http://tinyurl.com/4kqhauh I looked at the source code and all the page text is there, so it looks like it would be spider-able but you tell me. Or would another method of tabbing be better? My site is wordpress based so the functionality comes from a plugin. - Option 4 - Use post tabs that are technically all on the same page, but make each individual tab be accessible through its own suburl, all of which share the same core canonical url. Example: http://tinyurl.com/4bs9pjs Clicking on any of the individual tabs will result in something like ?postTabs=2 being appended to the core url. Example: http://tinyurl.com/4gvgufc Any input would be greatly appreciated asap! Thanks Mike
Web Design | | MikeATL0