Robots.txt file - How to block thosands of pages when you don't have a folder path
-
Hello.
Just wondering if anyone has come across this and can tell me if it worked or not.Goal:
To block review pagesChallenge:
The URLs aren't constructed using folders, they look like this:
www.website.com/default.aspx?z=review&PG1234
www.website.com/default.aspx?z=review&PG1235
www.website.com/default.aspx?z=review&PG1236So the first part of the URL is the same (i.e. /default.aspx?z=review) and the unique part comes immediately after - so not as a folder. Looking at Google recommendations they show examples for ways to block 'folder directories' and 'individual pages' only.
Question:
If I add the following to the Robots.txt file will it block all review pages?User-agent: *
Disallow: /default.aspx?z=reviewMuch thanks,
Davinia -
Also remember that blocking in robots.txt doesn't prevent Google from indexing those URLs. If the URLs are already indexed or if they are linked to, either internally or externally they may still in appear in the index with limited snippet information. If so, you'll need to add a noindex meta tag to those pages.
-
An * added to the end! Great thank you!
-
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Head down to the pattern matching section.
I think
User-agent: *
Disallow: /default.aspx?z=review*should do the trick though.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Magento 1.9 SEO. I have product pages with identical On Page SEO score in the 90's. Some pull up Google page 1 some won't pull up at all. I am searching for the exact title on that page.
I have a website built on Magento 1.9. There are approximately 290,000 part numbers on the site. I am sampling Google SERP results. About 20% of the keywords show up on page 1 position 5 thru 10. 80% don't show up at all. When I do a MOZ page score I get high 80's to 90's. A page score of 89 on one part # may show up on page one, An identical page score on a different part # can't be found on Google. I am searching for the exact part # in the page title. Any thoughts on what may be going on? This seems to me like a Magento SEO issue.
Intermediate & Advanced SEO | | CTOPDS0 -
Should I use noindex or robots to remove pages from the Google index?
I have a Magento site and just realized we have about 800 review pages indexed. The /review directory is disallowed in robots.txt but the pages are still indexed. From my understanding robots means it will not crawl the pages BUT if the pages are still indexed if they are linked from somewhere else. I can add the noindex tag to the review pages but they wont be crawled. https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html Should I remove the robots.txt and add the noindex? Or just add the noindex to what I already have?
Intermediate & Advanced SEO | | Tylerj0 -
Weird rankings on my website, can't figure it out
Hey guys, One of my post popular pages for "Rust Hacks" use to be - http://www.ilikecheats.com/01/rust-cheats-hacks-aimbot/ Now when searching Google for site:ilikecheats.com rust hacks This page shows as the highest ranking - http://forum.ilikecheats.com/forums/221-Rust-Hacks-Rust-Cheats-Public-Forum What's weird is it seems the entire front end (Wordpress site) isn't ranking well anymore on page #1 of Google and the forums are ranking better currently. I did have a huge penalty from backlinks last year but cleared it. I got Yoast to do a site review and I'm cleaning up everything now. I also cleared most of the bad links via the disavow tool. Another example is when I search for "warz hacks" the forums show up in 4th place but the main website isn't showing at all back to page 10. If I search site:ilikecheats.com warz hacks the links directly to the main site doesn't show until page #2. So is this still a penalty that is carried over or is something else going on? Can't seem to figure it out, thanks in advance for looking. 😃 Any ideas what's going on and why the main pages no longer rank - http://www.ilikecheats.com
Intermediate & Advanced SEO | | Draden670 -
What NAP format do I use if the USPS can't even find my client's address?
My client has a site already listed on Google+Local under "5208 N 1st St". He has some other NAPs, e.g., YellowPages, under "5208 N First Street". The USPS finds neither of these, nor any variation that I can possibly think of! Which is better? Do I just take the one that Google has accepted and make all the others like it as best I can? And doesn't it matter that the USPS doesn't even recognize the thing? Or no? Local SEO wizards, thanks in advance for your guidance!
Intermediate & Advanced SEO | | rayvensoft0 -
Disallow my store in robots.txt?
Should I disallow my store directory in robots.txt? Here is the URL: https://www.stdtime.com/store/ Here are my reasons for suggesting this: SEOMOZ finds crawl "errors" in there that I don't care about I don't think I care if the search engines index those pages I only have one product, and it is not an impulse buy My product has a 60 day sales cycle, so price is less important than features
Intermediate & Advanced SEO | | raywhite0 -
Most Painless way of getting Duff Pages out of SE's Index
Hi, I've had a few issues that have been caused by our developers on our website. Basically we have a pretty complex method of automatically generating URL's and web pages on our website, and they have stuffed up the URL's at some point and managed to get 10's of thousands of duff URL's and pages indexed by the search engines. I've now got to get these pages out of the SE's indexes as painlessly as possible as I think they are causing a Panda penalty. All these URL's have an addition directory level in them called "home" which should not be there, so I have: www.mysite.com/home/page123 instead of the correct URL www.mysite.com/page123 All these are totally duff URL's with no links going to them, so I'm gaining nothing by 301 redirects, so I was wondering if there was a more painless less risky way of getting them all out the indexes (IE after the stuff up by our developers in the first place I'm wary of letting them loose on 301 redirects incase they cause another issue!) Thanks
Intermediate & Advanced SEO | | James770 -
Is 404'ing a page enough to remove it from Google's index?
We set some pages to 404 status about 7 months ago, but they are still showing in Google's index (as 404's). Is there anything else I need to do to remove these?
Intermediate & Advanced SEO | | nicole.healthline0 -
Do inlinks to a 302'd page revert to the original when the 302 ends?
I have a genuine reason for temporarily moving some pages from my site onto a subdomain for a few weeks. If I do this using a 302, what happens to any in-links to the destination page once the the 302 reverses back to the original URL? Is any Google juice from those links lost?
Intermediate & Advanced SEO | | StuartAnderton1