Will a Robots.txt 'disallow' of a directory keep Google from seeing 301 redirects for pages/files within the directory?
-
Hi- I have a client that had thousands of dynamic php pages indexed by Google that shouldn't have been. He has since blocked these php pages via a robots.txt disallow. Unfortunately, many of those php pages were linked to by high quality sites multiple times (instead of the static URLs) before he put up the php 'disallow'.
If we create 301 redirects for some of these php URLs that are still showing high value backlinks and send them to the correct static URLs, will Google even see these 301 redirects and pass link value to the proper static URLs? Or will the robots.txt keep Google away, so that we lose all these high quality backlinks? I guess the same question applies if we use the canonical tag instead of the 301. Will the robots.txt keep Google from seeing the canonical tags on the php pages?
Thanks very much,
V
-
No problem
-
Hello Dmitrii,
Yes, that clarifies things perfectly. Thanks very much for your explanation. And I missed this particular WBF, so I will give it a close look as well.
Thanks again for your quick help.
-
Hello, my friend.
You need to understand exactly how .htaccess 301 redirects work. They are server-side operations: when a bot requests a page, it waits for the server's response, and in the case of a 301 the response is "Don't go here, go there". However, a crawler that obeys robots.txt, as Googlebot does, checks robots.txt before requesting a URL and does not request a disallowed URL at all. So while the disallow is in place, Google never receives the 301 response and cannot pass link value through it. You can still sometimes see indexed pages that say "blocked by robots": those URLs are indexed because other sites link to them, not because the bot fetched them. For the redirects to be seen, the php URLs you redirect have to be taken out of the disallow.
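As a rough illustration of what a disallow rule covers, the sketch below uses Python's standard `urllib.robotparser` to test URLs against a rule. The `/php/` directory, the `example.com` URLs, and the `is_crawlable` helper are assumptions made up for the example, not the client's actual setup. A crawler that honours robots.txt makes this kind of check before it requests a URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /php/ directory name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /php/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to request url."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of fetching them over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A dynamic URL inside the disallowed directory (hypothetical).
print(is_crawlable("https://example.com/php/item.php?id=42"))
# A static URL outside the disallowed directory (hypothetical).
print(is_crawlable("https://example.com/widgets/blue-widget.html"))
```

Running the same check over the list of php URLs that still have valuable backlinks shows which of them would need to be taken out of the disallow.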
Now, in the case of canonical links you are correct: since the canonical tag is IN the content of the page, a blocked crawler won't be able to read it, and therefore can't be told that there is a canonical page.
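To make the point about canonical tags concrete, here is a minimal sketch using Python's standard `html.parser`. The `CanonicalFinder` class, the `find_canonical` helper, and the sample page are hypothetical. It only shows that the canonical URL can be read from the page's HTML, which means the page has to be fetched first.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical dynamic page that points at its static equivalent.
PAGE = """<html><head>
<link rel="canonical" href="https://example.com/widgets/blue-widget.html">
</head><body>Dynamic page</body></html>"""

print(find_canonical(PAGE))
```

The function needs the HTML as input, so a crawler that never downloads the page has nothing to parse.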
There is a recent WBF on this subject - https://moz.com/blog/controlling-search-engine-crawlers-for-better-indexation-and-rankings-whiteboard-friday
Hope this clarifies some things.