Huge Google index on E-commerce site
-
Hi guys,
Referring back to my original post, I would first like to thank you all for the advice.
We implemented canonical URLs all over the site and noindexed some URLs with robots.txt, and the site has already gone from 100,000+ URLs indexed to 87,000 URLs indexed in GWT.
My question: is there a way to speed this up?
I do know about removing URLs from the index by hand (with a noindex or robots.txt condition), but that is a very labor-intensive way to do it. I was hoping you guys might have a solution for this.
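Before waiting on Google, it can help to confirm that each unwanted URL really carries the signal you think it does. The following is a minimal sketch, assuming you already have a page's HTML as a string; the class and function names and the example URLs are made up for illustration and are not part of any SEO tool.

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collects the rel=canonical URL and the meta robots directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots = [d.strip().lower() for d in content.split(",") if d.strip()]


def index_signals(html, page_url):
    """Report whether a page asks to be dropped (noindex) or consolidated (canonical)."""
    parser = IndexSignalParser()
    parser.feed(html)
    return {
        "noindex": "noindex" in parser.robots,
        "canonical": parser.canonical,
        "canonical_points_elsewhere": parser.canonical not in (None, page_url),
    }


# Hypothetical filtered category page that should fold into its clean URL.
page = '<head><link rel="canonical" href="http://www.example.com/shoes"></head>'
print(index_signals(page, "http://www.example.com/shoes?sort=price"))
```

A URL that reports neither `noindex` nor a canonical pointing elsewhere is unlikely to leave the index on its own, however long you wait.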
-
Hi,
A few weeks later, the index is now down to 63,000 URLs, so that's a good thing.
Another weird thing is the following.
There's an (old) URL still in the index. When I visit it, it redirects me to the new URL, which is good. The cache date is 2 weeks ago, but Google still shows the old URL.
How is this possible? The 301 redirect has been in place since April 2013.
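One thing worth ruling out is that the old URL answers with a temporary redirect or a chain of redirects rather than a single 301. Here is a rough sketch of such a check. The `fetch` callable is a hypothetical stand-in for whatever HTTP client you use (it must return the status code and the `Location` header without following redirects itself), and the URLs are invented.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects by hand and return the list of (url, status) hops.

    `fetch` is a hypothetical callable: fetch(url) -> (status_code, location_or_None).
    """
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


def is_clean_permanent_redirect(hops):
    """True when there is exactly one 301 hop that ends in a 200 response."""
    return [status for _, status in hops] == [301, 200]


# Fake responses standing in for a real HTTP client.
responses = {
    "http://www.example.com/old-page": (301, "/new-page"),
    "http://www.example.com/new-page": (200, None),
}
hops = trace_redirects("http://www.example.com/old-page", responses.get)
print(hops, is_clean_permanent_redirect(hops))
```

If the trace shows a 302 or several hops, that is a more likely explanation for the stale listing than Google ignoring the redirect.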
-
Hi Allen Jarosz!
Thanks for your reply.
I've actually done all the things you mentioned over the last few weeks. The site is fully indexed, but the main problem is that over 85,000 URLs are indexed while the site only consists of 13,000 URLs.
So the main question is whether I can speed things up in one way or another to get those 70,000 URLs deindexed. Are there any options besides noindex, robots.txt and removing some URLs? Because right now it's just waiting.
It looks like we are heading in the right direction when you check the image.
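To see where the surplus URLs come from, one option is to group them by the query parameters they carry. This is only a sketch: it assumes you have somehow collected a list of indexed URLs (GWT only reports a count, so the list would have to come from elsewhere) and a list of the real pages, and the function name and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def surplus_by_parameter(indexed_urls, real_urls):
    """Count indexed URLs that are not real pages, grouped by query parameter names."""
    real = set(real_urls)
    surplus = [u for u in indexed_urls if u not in real]
    counts = Counter()
    for url in surplus:
        query = urlsplit(url).query
        names = sorted({name for name, _ in parse_qsl(query, keep_blank_values=True)})
        counts[",".join(names) or "(no query string)"] += 1
    return len(surplus), counts.most_common()


# Invented sample data for illustration only.
indexed = [
    "http://shop.example/a",
    "http://shop.example/a?sort=price",
    "http://shop.example/a?sort=name",
    "http://shop.example/b?page=2&sort=price",
]
total, groups = surplus_by_parameter(indexed, ["http://shop.example/a"])
print(total, groups)
```

If one or two parameters account for most of the surplus, a single canonical or noindex rule for that pattern covers thousands of URLs at once, which beats removing them one by one.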
-
SSiebn,
I have had some success in speeding things up, but only to a point.
Google Webmaster Tools is a GREAT tool that, fortunately for us, Google lets us use, and it's free!
I'm sure you probably already use the service, but I have found a few ways to use the tools to improve the crawl rate. First, block the spiders from crawling any pages you don't want indexed, for instance your backend files; this leaves more crawl time for the pages you do want indexed. Second, ensure your pages link to each other within the site, so that crawlers can flow from one page to the next (no dead ends). Third, use "Fetch as Google" in WMT. Once a page has been fetched, you may submit it to the Google index, either on its own (up to 500 URLs) or together with the pages it links to (up to 10 submissions). It may be beneficial to submit your main categories this way. Lastly, check your "Crawl Rate" setting to ensure that you have chosen "Let Google optimize for my site (recommended)".
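For the first tip, it is worth testing your robots.txt rules before relying on them. The sketch below uses Python's standard `urllib.robotparser`; the `/admin/` and `/checkout/` paths and the example URLs are assumptions standing in for your own backend paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; replace the paths with your own backend directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


urls = [
    "http://www.example.com/admin/login",
    "http://www.example.com/products/shoes",
    "http://www.example.com/checkout/cart",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Keep in mind that a robots.txt rule only controls crawling. A page that is blocked this way cannot be fetched, so Google will not see a noindex tag or canonical placed on it.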
Related Questions
-
Site Not Indexing After 2 Weeks - PA at 1
Hi Moz Community! I'm working as a digital marketing consultant for an organization that uses us for their online registration - we do not manage their web page. The issue that I am hoping you might have some ideas on is that their SERPs still aren’t making much of a recovery since they revamped their site in mid-August. I ran a Moz campaign for them, and even though they (eventually) got all their 301s in place, submitted an updated sitemap to Google, aren’t hitting any crawl errors, and have a working robots.txt, over two-thirds of their site pages don’t seem to be indexed. Moz is giving most of them a Page Authority of 1, and when I log in to their GWT, it’s showing me that only 3 of the 315 URLs submitted have been indexed. I know Google doesn’t make any guarantees on index update timelines, but 2+ weeks seems like a long time 😞 Their website is https://www.northshoreymca.org/. The site has a DA of 43, but most of the pages on the main nav are still at 1. They gave me permission to share in this forum because we're really trying to figure out a recovery strategy. Any thoughts or ideas as to what might be causing this? Is there anything else that you think I should check? Is it possible that Google is just taking this long to index their pages? Note: this site is built with Drupal. THANK YOU!
Intermediate & Advanced SEO | | camarin_w0 -
Does Google Index URLs that are always 302 redirected
Hello community, Due to the architecture of our site, we have a bunch of URLs that are 302 redirected to the same URL with a query string appended to it. For example: www.example.com/hello.html is 302 redirected to www.example.com/hello.html?___store=abc. The www.example.com/hello.html?___store=abc page also has a canonical link tag pointing to www.example.com/hello.html. In the above example, can www.example.com/hello.html ever be indexed by Google? I assume Googlebot will always be redirected to www.example.com/hello.html?___store=abc and will never see www.example.com/hello.html. Thanks in advance for the help!
Intermediate & Advanced SEO | | EcommRulz0 -
E-commerce System without error page
I'd love to know your thoughts about this particular issue: Vtex is a top-3 e-commerce system in Brazil (so the issue is huge). The system does not use 4XX response codes. If there is an error page, they just redirect it to a search page with a 200 code. In the Google index we can find a lot of "empty" pages (indexed error pages). We can't use noindex for them. Example: http://www.taniabulhoes.com.br/this-is-a-test OR http://www.taniabulhoes.com.br/thisisatest Any suggestions?
Intermediate & Advanced SEO | | SeoMartin10 -
Using rel canonical to host a blog as a path on our e-commerce website
There has been a recent suggestion (from Rand) that hosting your blog as a folder rather than a subdomain is much better from an SEO point of view. Unfortunately, our blog is hosted on a subdomain with a different technology stack to the main e-commerce site. We are finding it quite tricky to migrate to a folder given the different technologies. Is the following a suitable solution? - 301 redirect from mysite.com/blog/cool-blog-post to blog.mysite.com/cool-blog-post - And then put <link rel="canonical" href="mysite.com/blog/cool-blog-post" /> on blog.mysite.com/cool-blog-post Would be great to have your thoughts on this, guys - I can't figure out if it will work or be an SEO fail.
Intermediate & Advanced SEO | | HireSpace0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although /jsp/html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file, and I understand that this could be why. If we add them to our robots.txt file and disallow them, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting it from crawling and indexing the content that resides there, which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file, CCI-SALES-STAFF.HTML (which appears on the Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/), clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable, and therefore we didn't really have a crawlability problem. Why would Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0 -
Does Google Index Videos onsite when using JQuery?
Hi, I'm showing my videos using a jQuery lightbox etc. This means that I do not have the normal YouTube "embedding" code on the page. Does anyone know if Google will somehow index my videos? Any solutions / ideas? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
Google isn't displaying the www. for my site in the SERPS
I noticed that every other site URL in the SERPs for my main keywords has a www. in its display URL except mine. I have the site set to display the www. Can this potentially hurt my SEO, and what can I do to fix this? Thanks, Aaron.
Intermediate & Advanced SEO | | afranklin0