How to disallow Google and Roger?
-
Hey guys and girls,
I have a question: I want to disallow all robots from accessing a certain root link:
# Get rid of bots
User-agent: *
Disallow: /index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=*
Will this stop the bots from accessing any URL that starts with the prefix you see before the asterisk? And will at least Google and Roger stay away after reading "User-agent: *"? I know this isn't the standard procedure, but if it works for Google and the SEOmoz bot, we are good.
-
In short, yes, that will work.
To be clear, you can have multiple links which all lead to the same target page. The target of a blocked link could still be indexed if there are other, unblocked links which point to it.
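If you want to sanity-check a rule like this before deploying it, here is a minimal Python sketch of the wildcard matching that Google documents for robots.txt paths (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The function name `rule_matches` and the sample URLs with `productId=42` are hypothetical, made up for illustration. The sketch compares strings literally and does not normalise percent-encoding the way a real crawler may, so treat it as a rough check and not as how any particular bot is implemented.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check a robots.txt path pattern against a URL path plus query string.

    Illustrative only: '*' matches any run of characters and a trailing '$'
    anchors the end; everything else is compared literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string (prefix semantics).
    return re.match(regex, path) is not None


# The rule from the question above.
RULE = "/index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=*"

# Hypothetical URLs for illustration (productId=42 is made up).
via_login = "/index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=42"
direct = "/index.php?_a=tellafriend&productId=42"

print(rule_matches(RULE, via_login))       # blocked: starts with the prefix
print(rule_matches(RULE[:-1], via_login))  # same result without the trailing '*'
print(rule_matches(RULE, direct))          # not blocked: different prefix
```

The second call suggests the trailing asterisk is redundant under prefix matching, and the third is the case described above: a direct link to the same tell-a-friend page is not covered by the rule, so that URL can still be crawled.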
Related Questions
-
Blocking Google from telemetry requests
At Magnet.me we track the items people are viewing in order to optimize our recommendations. To do so, we fire POST requests to our backends every few seconds once enough user-initiated actions have happened (scrolling, for example). To keep bots from distorting the statistics, we ignore their values server-side. Based on some internal logging, we see that Googlebot is also performing these POST requests during its JavaScript crawling. In a 7-day period, that amounts to around 800k POST requests. As we are ignoring that data anyhow, and it is quite a number, we considered reducing this for bots. We have several questions about this, though:
1. Do these requests count towards crawl budgets?
2. If they do, and we want to prevent this from happening, what would be the preferred option: preventing the request in the frontend code, or blocking the request with a robots.txt line? We ask because an in-app block could lead to different behaviour for users and bots, and maybe Google could penalize that as cloaking. An in-app block is also slightly less convenient from a development perspective, as the logic is spread throughout the application. I'm aware one should not cloak or make pages appear differently to search engine crawlers. However, these requests do not change anything in the page's behaviour; they purely send some anonymous data so we can improve future recommendations.
Technical SEO | | rogier_slag0 -
Should I disallow crawl of my Job board?
The Moz crawler is telling me we have loads of duplicate content issues. We use a job board plugin on our WordPress site and we have a lot of duplicate or very similar jobs (usually just a different location), but the plugin doesn't allow us to add any rel canonical tags to the individual jobs. Should I disallow the /jobs/ URL in the robots.txt file? This would solve the duplicate content issue, but then Google won't be able to crawl any of the individual job listings. Has anyone had any experience working with a job board plugin on WordPress and had a similar issue, or can anyone advise on how best to solve our duplicate content? Thanks 🙂
Technical SEO | | O2C0 -
Domain not ranking in Google
https://www.buitenspeelgoed.nl/ is a domain acquired by our client. Previously this website was on http://www.buitenspeelgoed-keupink.nl. With the old domain they were ranking in the top 30 for 'buitenspeelgoed' on google.nl. Now, with the new exact-match domain, they haven't been ranking any more (for months). However, the website is indexed, as you can see on http://1l1.be/nz. I don't know what to do anymore and need some advice. What we have already done over the last months:
Made adjustments to the 301 redirects (these were originally set up wrong by the web designer).
(De)optimized the homepage for 'buitenspeelgoed' (strangely, the Moz robot can't access the site).
Checked the robots.txt to see if the website was blocked for Google.
Checked the meta robots to see if the website was blocked for Google.
Disavowed some spammy (old) links which linked to the old domain.
Checked Search Console > Fetch as Google to see if there is any malware of some kind (and to see if Google can access the site).
Checked Search Console to see if there are manual spam actions (there aren't).
Checked for duplicate content by copy/pasting some texts into Google to see if any other results show up (not the case for most of the texts).
Please let me know what we can do.
Technical SEO | | InventusOnline0 -
Switching from HTTP to HTTPS and google webmaster
Hi, I've recently moved one of my sites, www.thegoldregister.co.uk, to HTTPS. I'm using WordPress and put a permanent 301 redirect in the .htaccess file to force HTTPS for all pages. I've updated the settings in Google Analytics to HTTPS for the original site. All seems to be working well. My question is about Google Webmaster Tools and what needs to be done there; I'm very confused by the Google documentation on HTTPS. Does all the crawl data and indexing from the HTTP site still stand, and is it inherited by the HTTPS version because of the redirects in place? I'm really worried I will lose all of this indexing data. I looked at "Change of address" in the Webmaster settings, but this seems to refer to changing the actual domain name rather than the protocol, which I haven't done. I've also tried adding the HTTPS version to the console, but the HTTPS version is showing a severe warning: "is robots.txt blocking some important pages". I don't understand this error, as it's the same file as on the HTTP site, generated by All in One SEO Pack for WordPress (see below). The warning is against line 5, saying it will ignore it. What I don't understand is why I don't get this error in the Webmaster console for the HTTP version, which has the same file. Any help and advice would be much appreciated. Kind regards, Steve
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Crawl-delay: 10
Technical SEO | | lqz0 -
Example of Google Indexing my Feedburner Links
As you can see, there are 2 results for the same page. One is the correct page URL, the other has the Feedburner parameters at the end:
http://www.thewebhostinghero.com/articles/improving-user-engagement-with-the-right-blog-commenting-system.html
http://www.thewebhostinghero.com/articles/improving-user-engagement-with-the-right-blog-commenting-system.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com)
Can this cause duplicate content issues? Can I prevent Google from indexing my Feedburner links? My Feedburner settings are already set to noindex; what else can I do?
Technical SEO | | sbrault740 -
About google Disavow tool
My website is being attacked with spam links, so should I use the Google Disavow tool to remove those links? I also have a question: when I use Google Disavow to remove backlinks, but the links are still not removed from the web pages that placed them, does Google index those backlinks again, or never?
Technical SEO | | magician0 -
Google Places Reviews
Has anyone seen delays in Google+ reviews showing up? We have multiple clients who have not received a new review in over two months. These are good accounts with good Zagat scores and 15+ good reviews from real customers. Our clients have asked their customers and confirmed that reviews have been left recently. However, no new reviews have shown up in the past 60+ days.
Technical SEO | | CaseyKluver0 -
0 Google Backlinks
I've seen a sudden drop in the number of Google backlinks. Earlier this month I had 15 Google backlinks and now, all of a sudden, I have none. My Google impressions have also dropped drastically: my website averaged 10000 impressions per day and now we have none. I have increased the crawler's speed on the website; could this be the cause? 4pmdesign.com
Technical SEO | | 4pm0