Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. The content is not being removed completely, and many posts will remain viewable, but both new posts and new replies are locked.
Robots.txt Syntax for Dynamic URLs
-
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page=
Which is the proper syntax?
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?
-
Thanks, Alick300. Unfortunately, the slash doesn't appear like that in the URLs on this site. They look like this: www.domain.com/page.html?Page=...
When I run them through an online robots.txt tester, all three versions in my original question seem to work. Until proven otherwise, I'm using the first one because it's the simplest.
-
Hi Bill,
Disallow: /?Page= will work
Thanks
-
Hi, James. It's not pagination I'm trying to disallow. The site has URLs that include things like "Page=give&...", which open a blank form, and those URLs are linked from scores of web pages that we do want spidered. Since the "give" page is an empty form, we're getting tons of duplicate content errors as a result.
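For anyone comparing the candidate rules in this thread, here is a minimal sketch of how wildcard-style robots.txt matching is commonly described: the pattern is matched from the start of the URL path, `*` matches any run of characters, and a trailing `$` anchors to the end. The helper name `rule_matches` is hypothetical, and the semantics are an assumption based on the published Robots Exclusion Protocol description, not a copy of any crawler's real parser. Individual crawlers and online testers may be more lenient, for example with patterns that lack a leading slash.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed semantics: match from the start of the path, '*' matches any
    run of characters, and a trailing '$' anchors to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the string (prefix match).
    return re.match(regex, url_path) is not None


# Example path taken from the URL shape described in the thread.
path = "/page.html?Page=give"

for rule in ("?Page=", "/?Page=", "/*?Page=", "/*?Page=give"):
    print(f"Disallow: {rule:<14} matches {path}: {rule_matches(rule, path)}")
```

Under these assumed rules, `/?Page=` only covers URLs where the query string follows the root slash directly, while `/*?Page=` covers `?Page=` after any file name. If only the empty "give" form needs blocking, the narrower `/*?Page=give` avoids catching other `Page=` values. It is worth confirming the chosen rule in the tester for the specific search engine you care about.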