Meta No INDEX and Robots - Optimizing Crawl Budget
-
Hi,
Sometime ago, a few thousand pages got into Google's index - they were "product pop up" pages, exact duplicates of the actual product page but a "quick view".
So I deleted them via GWT and also put in a Meta No Index on these pop up overlays to stop them being indexed and causing dupe content issues.
They are no longer within the index as far as I can see, i do a site:www.mydomain.com/ajax and nothing appears -
So can I block these off now with robots.txt to optimize my crawl budget?
Thanks
-
Are you still linking to those pages? If so, I would just keep the noindex,follow meta in the header. That way you still benefit from the link juice flowing to these pages, link juice that would be lost if you just blocked them from being crawled via robots.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Meta description duplication
Hello, What does google mean by don't duplicate your meta description. For example if I my meta says : Stunning golf holidays in Florida , call xxxx and book today. and I have another page with golf holiday but in ireland this time. If I write Stunning golf holidays in Ireland , call xxxx and book today. Is it considered duplicate ?
Intermediate & Advanced SEO | | seoanalytics0 -
Website dropped out from Google index
Howdy, fellow mozzers. I got approached by my friend - their website is https://www.hauteheadquarters.com She is saying that they dropped from google index over night - and, as you can see if you google their name, website url or even site: , most of the pages are not indexed. Home page is nowhere to be found - that's for sure. I know that they were indexed before. Google webmaster tools don't have any manual actions (at least yet). No sudden changes in content or backlink profile. robots.txt has some weird rule - disallow everything for EtaoSpider. I don't know if google would listen to that - robots checker in GWT says it's all good. Any ideas why that happen? Any ideas what I should check? P.S. Just noticed in GWT there was a huge drop in indexed pages within first week of August. Still no idea why though. P.P.S. Just noticed that there is noindex x-robots-tag in headers... Anyone knows where this can be set?
Intermediate & Advanced SEO | | DmitriiK0 -
Mobile Meta Descriptions?
Hi Guys, We have two different versions of each page for both desktop and mobile example: http://tinyurl.com/zkxlxax http://tinyurl.com/zelqcbv We want to create meta descriptions for both versions. However in the CMS (http://www.mantistech.com.au/ecommerce_website_package.aspx) it only allows one meta description. Example: http://s21.postimg.org/ar8bzrh3r/screenshot_1804.jpg Does anyone know anyway around this? To add two different meta descriptions and tell Google which one to use based on device type. Thanks.
Intermediate & Advanced SEO | | jayoliverwright0 -
How to find all indexed pages in Google?
Hi, We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools. How can I get a list of all pages indexed of our domain? trying to locate the duplicate content. Doing a "site:www.mydomain.com" only returns up to 676 results... Any ideas? Thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
If i disallow unfriendly URL via robots.txt, will its friendly counterpart still be indexed?
Our not-so-lovely CMS loves to render pages regardless of the URL structure, just as long as the page name itself is correct. For example, it will render the following as the same page: example.com/123.html example.com/dumb/123.html example.com/really/dumb/duplicative/URL/123.html To help combat this, we are creating mod rewrites with friendly urls, so all of the above would simply render as example.com/123 I understand robots.txt respects the wildcard (*), so I was considering adding this to our robots.txt: Disallow: */123.html If I move forward, will this block all of the potential permutations of the directories preceding 123.html yet not block our friendly example.com/123? Oh, and yes, we do use the canonical tag religiously - we're just mucking with the robots.txt as an added safety net.
Intermediate & Advanced SEO | | mrwestern0 -
Google Sitemap only indexing 50% Is that a problem?
We have about 18,000 pages submitted on our Google Sitemap and only about 9000 of them are indexed. Is this a problem? We have a script that creates a sitemap on a daily basis and it is submitted on a daily basis. Am I better off only doing it once a week? Is this why I never get to the full 18,000 indexed?
Intermediate & Advanced SEO | | EcommerceSite0 -
Not using a robot command meta tag
Hi SEOmoz peeps. Was doing some research on robot commands and found a couple major sites that are not using them. If you check out the code for these: http://www.amazon.com http://www.zappos.com http://www.zappos.com/product/7787787/color/92100 http://www.altrec.com/ You fill not find a meta robot command line. Of course you need the line for any noindex, nofollow, noarchive pages. However for pages you want crawled and indexed, is there any benefit for not having the line at all? Thanks!
Intermediate & Advanced SEO | | STPseo0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1