Unsolved What would the exact text be for robots.txt to stop Moz crawling a subdomain?
-
I need Moz to stop crawling a subdomain of my site, and am just checking what the exact text should be in the file to do this.
I assume it would be:
User-agent: Moz
Disallow: /
But just checking so I can tell the agency who will apply it, to avoid paying for their time with the incorrect text!
Many thanks.
-
To disallow Moz from crawling a specific subdomain, you would need to add a robots.txt file to the root directory of that subdomain with the following content:
User-agent: rogerbot
Disallow: /
This will disallow Moz's web crawler, Rogerbot, from crawling any page or file within the subdomain. Keep in mind that this will only prevent Moz from crawling the subdomain - other search engines or bots may still be able to access it unless you add specific disallow rules for them as well.
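If you want to sanity-check the text before handing it to the agency, here is a minimal sketch using Python's standard-library robots.txt parser. The hostname blog.example.com and the helper name is_allowed are made up for illustration, and Python's parser is only an approximation of how a real crawler matches user-agent names, so treat the result as a rough check rather than proof of how Rogerbot behaves.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested in the answer above.
ROGERBOT_RULES = """\
User-agent: rogerbot
Disallow: /
"""

# The rules guessed in the original question.
MOZ_RULES = """\
User-agent: Moz
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text locally and
    never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL on the subdomain that should be blocked (an assumption).
PAGE = "https://blog.example.com/any/page"

print("rogerbot, rogerbot rules:", is_allowed(ROGERBOT_RULES, "rogerbot", PAGE))
print("Googlebot, rogerbot rules:", is_allowed(ROGERBOT_RULES, "Googlebot", PAGE))
print("rogerbot, Moz rules:", is_allowed(MOZ_RULES, "rogerbot", PAGE))
```

Under this parser, the rogerbot rules block rogerbot and leave other agents alone, while the "User-agent: Moz" version does not match rogerbot at all. Note too that a robots.txt file only applies to the host that serves it, so the file has to be reachable at the root of the subdomain itself (for example https://blog.example.com/robots.txt in this made-up case), not on the main domain.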
-
@Simon-Plan No, when you put just slash / you will disallow everything.
Instead you need to put /foo/ where foo is your subdomain. Please see here for a reference to some relevant examples: https://searchfacts.com/robots-txt-allow-disallow-all/