Site Crawl
-
I was wondering if there was a way to use SEOmoz's tool to quickly and easily find all the URLs on you site and not just the ones with errors.
The site that I am working on does not have a site map. What I am trying to do is find all the URLs along with their titles and description tags.
Thank you very much for your help
-
You can use crawlers like xenu or screaming frog (http://www.screamingfrog.co.uk/seo-spider/)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
When rogerbot tried to crawl my site it gets a 404\. Why?
When rogerbot tries to craw my site it tries http://website.com. My website then tries to redirect to http://www.website.com and is throwing a 404 and ends up not getting crawled. It also throws a 404 when trying to read my robots.txt file for some reason. We allow rogerbot user agent so unsure whats happening here. Is there something weird going on when trying to access my site without the 'www' that is causing the 404? Any insight is helpful here. Thanks,
Technical SEO | | BlakeBooth0 -
Bingbot appears to be crawling a large site extremely frequently?
Hi All! What constitutes a normal crawl rate for daily bingbot server requests for large sites? Are any of you noticing spikes in Bingbot crawl activity? I did find a "mildly" useful thread at Black Hat World containing this quote: "The reason BingBot seems to be terrorizing your site is because of your site's architecture; it has to be misaligned. If you are like most people, you paid no attention to setting up your website to avoid this glitch. In the article referenced by Oxonbeef, the author's issue was that he was engaging in dynamic linking, which pretty much put the BingBot in a constant loop. You may have the same type or similar issue particularly if you set up a WP blog without setting the parameters for noindex from the get go." However, my gut instinct says this isn't it and that it's more likely that someone or something is spoofing bingbot. I'd love to hear what you guys think! Dana
Technical SEO | | danatanseo1 -
How to stop crawls for product review pages? Volusion site
Hi guys, I have a new Volusion website. the template we are using has its own product review page for EVERY product i sell (1500+) When a customer purchases a product a week later they receive a link back to review the product. This link sends them to my site, but its own individual page strictly for reviewing the product. (As oppose to a page like amazon, where you review the product on the same page as the actual listing.) **This is creating countless "duplicate content" and missing "title" errors. What is the most effective way to block a bot from crawling all these pages? Via robots txt.? a meta tag? ** Here's the catch, i do not have access to every individual review page, so i think it will need to be blocked by a robot txt file? What code will i need to implement? i need to do this on my admin side for the site? Do i also have to do something on the Google analytics side to tell google about the crawl block? Note: the individual URLs for these pages end with: *****.com/ReviewNew.asp?ProductCode=458VB Can i create a block for all url's that end with /ReviewNew.asp etc. etc.? Thanks! Pardon my ignorance. Learning slowly, loving MOZ community 😃 1354bdae458d2cfe44e0a705c4ec38dd
Technical SEO | | Jerrion0 -
Why is my site crawled so much more in Bing then in Google?
I recently setup Cloudflare so I can see how much my site is being crawled. It looks like Bing is crawling me about 3 times as much as Google. Any ideas on why that would be or what I should check?
Technical SEO | | EcommerceSite0 -
Crawl rate
Hello, In google WMT my site has the following message. <form class="form" action="/webmasters/tools/settings-ac?hl=en&siteUrl=http://www.prom-hairstyles.org/&siteUrl=http://www.prom-hairstyles.org/&hl=en" method="POST">Your site has been assigned special crawl rate settings. You will not be able to change the crawl rate.Why would this be?A bit of backgound - this site was hammered by Penguin or maybe panda but seems to be dragging itself back up (maybe) but has dropped from several thousand visitors/day to 100 or so.Cheers,Ian</form>
Technical SEO | | jwdl0 -
Brand New Site Penalized?
I recently launched 2 completely separate and unrelated websites at the same time. Both are new domains and hosting accounts. neither have any links. One is ranking for a branded search and the other is not. The interesting thing is that I tested both sites on the back end of my server before launch. The site that is not ranking for branded search IS ranking still on the back end of my site for the branded search. I have removed all content and 301 redirected the testing urls back to my portfolio page. Could this be do to Google indexing one but not the other. Does it have anything to do with testing on my server first and my DA being higher than current new sites? Or is it something completely different I'm missing completely. Is this a Penalty?
Technical SEO | | CDUBP0 -
Replacing a site map
We are in the process of changing our folder/url structure. Currently we have about 5 sitemaps submitted to Google. How is it best to deal with these site maps in terms of either (a) replacing the old URLs with the new ones in the site map and (b) what affect should we have if we removed the site map submission from the Google Webmaster Tools console. Basically we have in the region of 20,000 urls to redirect to the new format, and to update in the site map.
Technical SEO | | NeilTompkins0 -
Blocking AJAX Content from being crawled
Our website has some pages with content shared from a third party provider and we use AJAX as our implementation. We dont want Google to crawl the third party's content but we do want them to crawl and index the rest of the web page. However, In light of Google's recent announcement about more effectively indexing google, I have some concern that we are at risk for that content to be indexed. I have thought about x-robots but have concern about implementing it on the pages because of a potential risk in Google not indexing the whole page. These pages get significant traffic for the website, and I cant risk. Thanks, Phil
Technical SEO | | AU-SEO0