Robot.txt File Not Appearing, but seems to be working?
-
Hi Mozzers,
I am conducting a site audit for a client, and I am confused about what they are doing with their robot.txt file. GWT shows that there is a file and that it is blocking about 12K URLs (image attached). GWT also shows that the file was downloaded successfully 10 hours ago. However, when I go to the robot.txt file link, the page is blank.
Could they be doing something advanced to block URLs while hiding the file from users? It appears to be blocking log-ins correctly, but I would like to know for sure that it is working as intended. Any advice on this would be most appreciated. Thanks!
Jared
-
There is an old WebmasterWorld thread that explains how to hide the robots.txt file from browsers. I'm not sure why anyone would do this, however.
http://www.webmasterworld.com/forum93/74.htm
Perhaps they are doing something like this?
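If they are serving the file differently depending on who asks, one rough way to check is to request it with a browser-like and a crawler-like User-Agent and compare what comes back. Here is a minimal Python sketch using only the standard library. The User-Agent strings and the `compare_robots` helper are my own placeholders, not anything from the thread above, and a site that verifies crawlers by IP address rather than User-Agent would still hand you the "browser" version either way.

```python
import urllib.error
import urllib.request

# Placeholder User-Agent strings, for illustration only.
AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch(url, user_agent):
    """Return (status_code, body_text) for url requested with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a status code worth reporting.
        return error.code, ""


def compare_robots(url, fetcher=fetch, agents=AGENTS):
    """Fetch url once per User-Agent and report whether the bodies differ."""
    results = {name: fetcher(url, ua) for name, ua in agents.items()}
    bodies = {body.strip() for _status, body in results.values()}
    return results, len(bodies) > 1
```

Calling `compare_robots("http://www.example.com/robots.txt")` (with the real domain swapped in) returns the status and body seen by each User-Agent plus a flag that is true when the bodies differ. A blank body for the browser and a populated one for the crawler would point to something like the setup described in that thread.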
-
I verified that I was checking /robots.txt. I had trouble verifying whether it was under the non-www version because everything redirects to the www version. I also checked to see if it was being blocked, and it is not.
I went to Archive.org (Wayback Machine), and I can see the robot.txt file in previous versions of the site. I cannot, however, view it online, even though Google says they are downloading it successfully, and the robots.txt file is successfully blocking URLs from the search index.
-
Be sure you are visiting /robots.txt. In all of your copy above, you are referencing robot.txt.
Also, check whether it is only showing up on the www version of the site or on the non-www version.
To confirm it's working, you can test URLs from your website within Google Webmaster Tools. Go to Crawl->Blocked URLs and scroll down to the bottom.
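If you can get hold of the file's contents (for example from the Wayback Machine copy or from what GWT shows as downloaded), you can also run a rough offline check of which URLs it blocks. A minimal sketch with Python's standard `urllib.robotparser`; the rules and URLs below are made up for illustration, and `blocked_urls` is a hypothetical helper name. Note that this parser follows the basic robots.txt conventions and does not treat Google's `*` and `$` wildcards the way Googlebot does, so use it as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the subset of urls that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /login
Disallow: /account/
"""

SAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/login",
    "http://www.example.com/account/settings",
    "http://www.example.com/products/widget",
]

print(blocked_urls(SAMPLE_RULES, SAMPLE_URLS))
```

Paste the real file contents in place of `SAMPLE_RULES` and a handful of log-in and normal page URLs in place of `SAMPLE_URLS` to see which ones come back as blocked.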