Ben,
I doubt that crawlers are going to fetch the robots.txt file for each request, but they still have to validate every URL they find against the list of blocked paths.
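Just to illustrate, that "list of blocked ones" is nothing more than a small rule file like the sketch below (the paths here are made up); the crawler caches it and compares every URL it discovers against those Disallow lines before requesting it:

```
User-agent: *
Disallow: /filter/
Disallow: /checkout/
```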
Glad to help,
Don
Hi Bob,
About nofollow vs. blocked: in the end I suppose you get the same result, but in practice they work a little differently. When you nofollow a link, it tells the crawler, as soon as it encounters the link, not to request or follow that link path. When you block it via robots.txt, the crawler still attempts to access the URL, only to find it is not accessible.
Imagine if I said go to the parking lot and collect all the loose change in all the unlocked cars. Now imagine how much easier that task would be if all the locked cars had a sign in the window that said "Locked": you could easily ignore the locked cars and go directly to the unlocked ones. Without the sign you would have to physically check each car to see whether it will open.
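To put that in concrete terms, here is roughly what the two approaches look like (the link and the path are made up for the example). The nofollow attribute is the sign in the window; with only a robots.txt rule the crawler still picks up the URL and has to check it against the blocked list:

```html
<!-- The "sign in the window": the crawler is told right at the link not to follow this path -->
<a href="/products?color=red" rel="nofollow">Red models</a>
```

```
# The "locked car without a sign": the crawler still collects this URL elsewhere,
# then has to check it against the rule before giving up on it
User-agent: *
Disallow: /*?color=
```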
About link juice: if you have a link, juice will be passed regardless of the type of link. (You used to be able to use nofollow to preserve link juice, but that no longer works.) This is a bit unfortunate for sites that use search filters, because filters are such a valuable tool for users.
Don
Hi Bob,
You can "suggest" a crawl rate to Google by logging into your webmasters tools on Google and adjusting it there.
As for indexing pages: I looked at your robots.txt and site. It really looks like you need to employ nofollow on some of your internal linking, specifically on the product page filters; that alone could reduce the total number of URLs that the crawler even attempts to look at.
Additionally, your sitemap http://premium-hookahs.nl/sitemap.xml shows a change frequency of daily, and it should probably be broken out between pages and images, so you end up using two sitemaps: one for images and one for pages. You may also want to review what is in there. Using ScreamingFrog (free), the sitemap I made (link) only shows about 100 URLs.
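If it helps, splitting the sitemap usually just means publishing a small sitemap index that points at the two separate files, something along these lines (the file names are only an example, not what is on your server):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one child sitemap for pages, one for images; the names are hypothetical -->
  <sitemap>
    <loc>http://premium-hookahs.nl/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://premium-hookahs.nl/sitemap-images.xml</loc>
  </sitemap>
</sitemapindex>
```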
Hope it helps,
Don
Hi Will,
I'm of two minds when it comes to directories. My general advice would be to ignore them altogether, unless there are some very industry-specific ones that make sense. I say general advice because the vast majority of industries I have researched have only one known good directory (Dmoz.org); the rest are, at best, relic sites that have basically run their course in usefulness and give little to no value in terms of traffic or link juice. Why? Because it is atypical for somebody to use anything other than Google / Yahoo / Bing / Baidu to find anything on the internet.
That being said, I do place some value on directories for some specific industries and lead generation. For example, in my current industry there is a site that has been around since the '90s, and many people before the rise of search engine dominance found it to be a great resource for finding business-to-business partnerships. Many of those people who got acclimated to the site are still working today and use it as their go-to source for specific project requirements. In other words, they have used it for so long, and it has worked for so long, that they never found the need to branch out and rely on search engines. And in all honesty, even Google would have a hard time returning pertinent results for, let's say, a rubber manufacturer who has experience overmolding FDA-approved Buna-N rubber to an aluminum substrate. But the good directory sites can list those sorts of capabilities.
Because this is a public question I had to give both of my opinions on directory sites. Again, I wouldn't seek them out as any form of link building, but I also wouldn't ignore ones that seem capable of delivering either traffic or leads. I will say that, with the exception of Dmoz.org, any of the good directory sites I have run across are very industry-specific, and they are certainly not free.
Hope that helps
Don
Hello Bob,
Here is some food for thought. If you disallow a page in robots.txt, Google, for example, will not crawl that page. That does not, however, mean they will remove it from the index if it had previously been crawled. Google simply treats the page as inaccessible and moves on. It will take some time, months even, before Google finally says, "we have no fresh crawls of page X, it's time to remove it from the index."
On the other hand, if you specifically allow Google to crawl those pages and serve a noindex tag on them, Google now has a new directive it can act upon immediately.
So my evaluation of the situation would be to do one of two things:
1. Remove the disallow from robots.txt and allow Google to crawl the pages again, but this time use noindex, nofollow tags.
2. Remove the disallow from robots.txt and allow Google to crawl the pages again, but use canonical tags pointing to the main "filter" page to prevent further indexing of the specific filter pages.
Which option is best depends on the number of URLs being indexed: for a few thousand, canonical tags would be my choice; for a few hundred thousand, noindex would make more sense.
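For reference, either option comes down to one extra line in the <head> of each filter page, roughly like the sketch below (the canonical URL is a placeholder, not your actual path):

```html
<!-- Option 1: the page can be crawled, but stays out of the index -->
<meta name="robots" content="noindex, nofollow">

<!-- Option 2: the page can be crawled, but consolidates to the main filter page -->
<link rel="canonical" href="http://premium-hookahs.nl/hookahs/">
```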
Whichever option you choose, you will have to ensure Google re-crawls the pages and then allow them time to re-index appropriately. Not a quick fix, but a fix nonetheless.
My thoughts and I hope it makes sense,
Don
Hi James,
For page load, network, and speed testing I have used Pingdom.com in the past. They recently moved more toward paid plans, but it is a nice set of tools for basic tests. There is still some free stuff you can use at tools.pingdom.com.
For a quick SEO pass, man, I love ScreamingFrog! You can quickly identify header errors, long titles and descriptions, missing h1 tags, and so much more. When I do quick general audits for troubleshooting problems posted on this board, it's my go-to.
Hope this helps,
Don
Hi,
You may also want to check the domain variations. http://penn-criminallawyers.com/ and http://www.penn-criminallawyers.com/
If you look at the www version you will see a spam score of 4/10. You'll need to make sure you set the filter to "this root domain."
It could be that at some point the website used the www but has since switched to no www. There are some inconsistencies like that with the tool. Technically, "www" and "non-www" are two different domains; in practice we use them interchangeably.
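If they want to settle on the non-www version, a server-side 301 from one host to the other keeps the two from being treated separately; a minimal sketch, assuming an Apache server with mod_rewrite (adjust for whichever host they actually want to keep):

```apache
# Permanently (301) redirect any www request to the bare domain
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.penn-criminallawyers\.com$ [NC]
RewriteRule ^(.*)$ http://penn-criminallawyers.com/$1 [R=301,L]
```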
Hope this helps,
Don
Hi Netkernz_ag,
It is just good practice to have those types of pages available. While I wouldn't say it is an absolute requirement, it should be something you do for your users. The page you pointed to is a general checklist of things to do, and not to do, for your users. Creating a Site Index may be a bit dated, but I still tend to do them, as they are fairly easy to create. (example).
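A Site Index doesn't have to be anything fancy; a minimal sketch (the page names are placeholders) is just a plain page of links to the site's main sections:

```html
<!-- Minimal HTML site index: a flat list of links to the main sections -->
<h1>Site Index</h1>
<ul>
  <li><a href="/about/">About</a></li>
  <li><a href="/services/">Services</a></li>
  <li><a href="/blog/">Blog</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```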
Hope this helps,
Don