Robots.txt file in Shopify - Collection and Product Page Crawling Issue
-
Hi, I am working on one big eCommerce store which have more then 1000 Product. we just moved platform WP to Shopify getting noindex issue. when i check robots.txt i found below code which is very confusing for me. **I am not getting meaning of below tags.**
- Disallow: /collections/+
- Disallow: /collections/%2B
- Disallow: /collections/%2b
- Disallow: /blogs/+
- Disallow: /blogs/%2B
- Disallow: /blogs/%2b
I can understand that my robots.txt disallows SEs to crawling and indexing my all product pages. ( collection/*+* ) Is this the query which is affecting the indexing product pages?
Please explain me how this robots.txt work in shopify and once my page crawl and index by google.com then what is use of Disallow:
Thanks.
-
Make sure products are in your sitemap and it has been re-submitted. You can also submit your products to request indexing for them in Google Search Console.
-
Thank you for replying,
But, our main issue is that we have already crawled all collection pages but the product pages haven't crawled yet. Now we don't figure out that whether it's robots.txt issue or other crawling issue?
For example: "www.abc.com/collection/" page is crawled but "www.abc.com/collection/product1/" page hasn't crawled.
Please reply me some tips here.
-
While you may not want context indexed, it's still valuable to be crawled and access your most important content like products.
If you are blocking your /collections pages, Google will not be able to see that page's meta robots set to noindex, causing an issue for you. You may consider allowing robots to crawl your /collections pages but noindex them if they are low value or duplicative.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page drops from index completely
We have a page that is ranking organically at #1 but over the past couple of months the page has twice dropped from a search term entirely. There don't appear to be any issues with the page in Search Console and adding the page on https://www.google.com/webmasters/tools/submit-url seems to fix the issue. The search term we're tracking that drops is in the URL for the page and is the h1 of the page. Here is a screenshot of the ranking over the past few months: https://jmp.sh/akvaKGF What could cause this to happen? There is nothing in search console that shows any problems with the page. The last time this happened the page completely dropped on all search terms and showed up again after submitting the url to google manually. This time it dropped on just one search term and reappeared the next day after manually submitting the page again. akvaKGF
White Hat / Black Hat SEO | | russell_ms0 -
Forcing Google to Crawl a Backlink URL
I was surprised that I couldn't find much info on this topic, considering that Googlebot must crawl a backlink url in order to process a disavow request (ie Penguin recovery and reconsideration requests). My trouble is that we recently received a great backlink from a buried page on a .gov domain and the page has yet to be crawled after 4 months. What is the best way to nudge Googlebot into crawling the url and discovering our link?
White Hat / Black Hat SEO | | Choice0 -
Massive site-wide internal footer links to doorway pages: how bad is this?
My company has stuffed several hundred links into the footer of every page. Well, technically not the footer, as they're right at the end of the body tag, but basically the same thing. They are formatted as follows: [" href="http://example.com/springfield_oh_real_estate.htm">" target="_blank">http://example.com/springfield_pa_real_estate.htm">](</span><a class= "http://example.com/springfield_oh_real_estate.htm")springfield, pa real estate These direct to individual pages that contain the same few images and variations the following text that just replace the town and state: _Springfield, PA Real Estate - Springfield County [images] This page features links to help you Find Listings and Homes for sale in the Springfield area MLS, Springfield Real Estate Agents, and Springfield home values. Our free real estate services feature all Springfield and Springfield suburban areas. We also have information on Springfield home selling, Springfield home buying, financing and mortgages, insurance and other realty services for anyone looking to sell a home or buy a home in Springfield. And if you are relocating to Springfield or want Springfield relocation information we can help with our Relocation Network._ The bolded text links to our internal site pages for buying, selling, relocation, etc. Like I said, this is repeated several hundred times, on every single page on our site. In our XML sitemap file, there are links to: http://www.example.com/Real_Estate/City/Springfield/
White Hat / Black Hat SEO | | BD69
http://www.example.com/Real_Estate/City/Springfield/Homes/
http://www.example.com/Real_Estate/City/Springfield/Townhomes/ That direct to separate pages with a Google map result for properties for sale in Springfield. It's accompanied by the a boilerplate version of this: _Find Springfield Pennsylvania Real Estate for sale on www.example.com - your complete source for all Springfield Pennsylvania real estate. Using www.example.com, you can search the entire local Multiple Listing Service (MLS) for up to date Springfield Pennsylvania real estate for sale that may not be available elsewhere. This includes every Springfield Pennsylvania property that's currently for sale and listed on our local MLS. Example Company is a fully licensed Springfield Pennsylvania real estate provider._ Google Webmaster Tools is reporting that some of these pages have over 30,000 internal links on our site. However, GWT isn't reporting any manual actions that need to be addressed. How blatantly abusive and spammy is this? At best, Google doesn't care a spit about it , but worst case is this is actively harming our SERP rankings. What's the best way to go about dealing with this? The site did have Analytics running, but the company lost the account information years ago, otherwise I'd check the numbers to see if we were ever hit by Panda/Penguin. I just got a new Analytics account implemented 2 weeks ago. Of course it's still using deprecated object values so I don't even know how accurate it is. Thanks everyone! qrPftlf.png0 -
Is pulling automated news feeds on my home page a bad thing?
I am in charge of a portal that relies on third-party content for its news feeds. the third-party in this case is a renowned news agency in the united kingdom. After the panda and penguin updates, will these feeds end up hurting my search engine rankings? FYI: these feeds occupy only 20 percent of content on my domain. The rest of the content is original.
White Hat / Black Hat SEO | | amit20760 -
404checker.com / crawl errors
I noticed a few strange crawl errors in a Google Webmaster Tools account - further investigation showed they're pages that don't exist linked from here: http://404checker.com/404-checker-log Basically that means anyone can enter a URL into the website and it'll get linked from that page, temporarily at least. As there are hundreds of links of varying quality - at the moment they range from a well known car manufacturer to a university, porn and various organ enlargement websites - could that have a detrimental effect on any websites linked? They are all nofollow. Why would they choose to list these URLs on their website? It has some useful tools and information but I don't see the point in the log page. I have used it myself to check HTTP statuses but may look elsewhere from now on.
White Hat / Black Hat SEO | | Alex-Harford0 -
Doorway Page? or just a flawed idea?
I have a website which is on a .co.uk TLD and is primarily focused to the UK. Understandably I get very little in the way on US traffic, even though a lot of the content is applicable to the UK or US and could be made more so with a little tinkering. The domain has some age to it and ranks quite well for a variety of keywords and phrases, so it seems sensible to keep the site on this domain. The .com version of the domain is no longer available, and the current owner does not seem inclined to sell it to me. So, I am considering registering a very similar .com domain and simply using it to drive some traffic to the .co.uk site. To do this, I would have the same category pages and the same (or similar) list of links to the various pages in those categories. But instead instead of linking to a page on the new .com, it would take visitors to the existing page on the .co.uk. I would make this transparent to visitors ("Take a look at these pages on our sister site bluewidgets.co.uk") and the .com would have some unique content of its own. Would this be considered some kind of Doorway site/page (content rich doorway), or is it simply bad idea which is unlikely to drive any traffic?
White Hat / Black Hat SEO | | Jingo010 -
Landing page for ppc
Is it okay to create a landing page with a different url to get additional traffic to my site with ppc? The purpose would not be for link building; I would only use it for direct marketing with ppc and people would click through to my main site via a no-follow link. Is there anything wrong with doing this?
White Hat / Black Hat SEO | | BradBorst0