A few misc Webmaster tools questions & Robots.txt etc
-
Hi
I have a few general misc questions re robots.txt & GWT:
1) In the robots.txt file, what do the lines below block? Internal search?
Disallow: /?
Disallow: /*?
2) Also, the site's feeds are blocked in robots.txt. Why would you want to block a site's feeds?
3) What's the best way to deal with the below:
- an old removed page that's returning a 500 response code
- a soft 404 for an old removed page that has no current replacement
- old removed pages returning a 404
The old pages didn't have any authority or inbound links, so is it best/OK to simply create a URL removal request in GWT?
Cheers
Dan
-
Many Thanks Stufroguk !!
-
-
It depends on whether Google had indexed these 'empty' pages. You need to check. Remember that every page is also given page authority. As best practice, redirect them before removing them. You can get Google to fetch the pages in GWT so that the crawlers follow the redirect. Then remove them.
-
Your old pages: fetch them in GWT, then remove them if you already have the 301s set up. Once Google has indexed the new pages, you know the link juice has passed and you can remove the old ones.
The blocking is used as a backup.
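Before fetching or removing anything, it can help to confirm what each old URL actually returns. Below is a minimal Python sketch, assuming only the standard library. The names `NoRedirect`, `fetch_status` and `describe` are made up for illustration, and the wording of the notes is my own reading of the status codes, not anything GWT reports.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 301 itself is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx then surfaces as an HTTPError


def fetch_status(url, timeout=10):
    """Return (status_code, Location header or None) for a single HEAD request."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Location")


def describe(status, location=None):
    """Turn a status code into a short note for an old, removed URL."""
    if status in (301, 308):
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"temporary redirect to {location}"
    if status in (404, 410):
        return "not found or gone: expected for a removed page with no replacement"
    if 500 <= status <= 599:
        return "server error: a removed page should not be answering with 5xx"
    if status == 200:
        return "still returns 200: check for a soft 404 if the page was removed"
    return f"unexpected status {status}"


# Offline demo with hard-coded statuses; no network access needed.
for code, target in [(301, "/new-page"), (404, None), (500, None), (200, None)]:
    print(code, "->", describe(code, target))
```

To check a live page you would call something like `describe(*fetch_status("https://www.example.com/old-page"))`, where the URL is a placeholder. Some servers answer HEAD requests differently from GET, so treat the result as a first pass only.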
-
-
Thanks Stufroguk,
1) Does this still apply if the pages had no content? They were just overview pages/folders without any copy, links or authority, which is why I think it's OK to just remove the URLs without 301'ing.
2) I do have other old content pages that I have 301'd to new replacements, but I hadn't planned to do anything else with them. You're saying that after 2 weeks I should nofollow or block them? Won't that stop the link equity passing?
Cheers
Dan
-
To manage old pages, it's best practice to simply 301 redirect them, leave them for a couple of weeks, then tag them with nofollow and/or block them with robots.txt. That way you've passed on the link equity. Then you can remove them from GWT.
In answer to 1: yes. But not all search engines read the "*" wildcard in file names, so you might need to tinker with this a bit.
Use this to help: http://tool.motoricerca.info/robots-checker.phtml
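To see what the two rules from the question would catch, here is a rough Python sketch of wildcard-aware matching, assuming the crawler treats `*` as "any run of characters" and a trailing `$` as "end of URL" (Google and Bing document this behaviour; crawlers that ignore wildcards will behave differently). The function names and the sample paths are invented for illustration.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex for wildcard-aware crawlers."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule, url_path):
    """True if url_path (path plus query string) matches the rule from its start."""
    return rule_to_regex(rule).match(url_path) is not None


RULES = ["/?", "/*?"]
SAMPLE_PATHS = [
    "/",
    "/?s=shoes",
    "/search?q=shoes",
    "/category/page?sort=price",
    "/blog/feed/",
]

for path in SAMPLE_PATHS:
    matched = [rule for rule in RULES if is_blocked(rule, path)]
    print(f"{path:28} blocked by: {matched or 'nothing'}")
```

Under those assumptions, `Disallow: /?` only catches the homepage with a query string attached, while `Disallow: /*?` catches any URL containing a `?`. That would cover internal search if the search runs on query parameters, but also every other parameterised URL.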