Having issues crawling a website
-
We tried to use the Screaming Frog tool to crawl this website and get a list of all meta titles from the site; however, it returned only one result: the homepage.
We then tried to obtain a list of the site's URLs by creating a sitemap using https://www.xml-sitemaps.com/. Once again, however, we got just the one result: the homepage.
Something seems to be restricting these tools from crawling all pages. If anyone can shed some light on what this could be, we'd be most appreciative.
-
That robots.txt should be fine; it's not blocking anything.
The reason the crawl is stopping on the homepage is this code:
<meta name="robots" content="nofollow">
Which tells bots to not follow any links on the page. Remove that and you should be good.
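If you want to confirm this on other pages before re-running the crawl, here is a minimal sketch in Python (standard library only) that pulls the robots meta directives out of a page's HTML. The names `RobotsMetaParser` and `robots_directives` are made up for illustration, and fetching the page is left out; it only assumes you already have the HTML as a string.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives present in the HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nofollow"></head></html>'
    found = robots_directives(page)
    if "nofollow" in found:
        print("Crawlers are told not to follow links on this page.")
```

A page with no robots meta tag returns an empty set, which crawlers generally treat as "index, follow".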
-
Hi,
I think it is your robots.txt file that is causing the issue. At the moment you have the following:
User-agent: *
Disallow:
I would recommend updating it to the following:
User-agent: *
Allow: /
Moz also has a good post on robots.txt best practices and what else you can include in the file:
https://moz.com/learn/seo/robotstxt
Hope that helps
Thanks
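For anyone comparing the two robots.txt versions quoted in this thread, here is a rough sketch using Python's standard `urllib.robotparser`. As far as I understand the robots exclusion rules, an empty `Disallow:` blocks nothing, so both versions should allow crawling; the check below is one way to see how a standards-following parser treats them. The helper name `allows` and the example.com URL are placeholders, and `BLOCK_ALL` is added only as a contrast case.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two versions quoted in the thread, plus a blocking file for contrast.
CURRENT = "User-agent: *\nDisallow:\n"
PROPOSED = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    url = "https://example.com/some-page"  # placeholder URL
    for label, text in [("current", CURRENT), ("proposed", PROPOSED), ("block all", BLOCK_ALL)]:
        print(label, allows(text, url))
```

If both the current and proposed files report the URL as allowed, the robots.txt is unlikely to be what stops the crawl at the homepage.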