Where does the crawler find the urls?
-
The SEO Moz crawler has found a number of 500 error pages, and 404s etc which is very useful
however some of the urls are weird/broken formats we don't recognise and nobody remembers ever using - not weird enough to imply hacking, but something broken in the CMS
Is there anyway to find out where the crawler found these urls? I can patch up and redirect the end result as best I can but I would prefer to fix plug the leak
thanks
-
If you export the crawl diagnostics to a CSV, we do have this information in the last column.
-
thanks for the tips. It is a little frustrating that the information I need has passed through seomoz's system but I guess they don't have the inclination or resources to show us the info
Xenu reckons it can handle 1m urls, we are in the position of not really knowing how many pages our site has!
-
You can pop the links into the free Xenu Link Sleuth* - after you've done a crawl just right-click on the URL you're interested in and click 'URL Properties' - you'll see any inlinks it finds listed there. Depending on the size of your site, it could take a while for the crawl to complete.
You could try the link: property in Google first, though it won't be as thorough as Xenu.
*If you haven't seen it before, don't worry about how the Xenu website looks - the software is kosher - as recommended by many SEOmoz staff. Screaming Frog is a paid alternative (with a limited free version).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
404 error for unknown URL that Moz is finding in our blog
I'm receiving 404 errors on my site crawl for messinastaffing.com. They seem to be generating only from our blog posts which sit on Hubspot. I've searched high and low and can't identify why our site URL is being added at the end - I've tried every link in our blog and cannot repeat the error the crawl is finding. For instance: Referer is: http://blog.messinastaffing.com/take-charge-career-story-compelling-cover-letter/ 404 error is: http://blog.messinastaffing.com/take-charge-career-story-compelling-cover-letter/www.messinastaffing.com I agree that the 404 error URL doesn't exist but I can't identify where Moz is finding it. I have approximately 75 of these errors - one for every blog on our site. Beth Morley Vice President, Operations Messina Group Staffing Solutions
Moz Pro | | MessinaGroup
(847) 692-0613 www.messinastaffing.com0 -
Duplicate URLs
A campaign that I ran said that my client's site had some 47,000+ duplicate pages and titles. I was wondering how I can possibly set that many 301 redirects, but a Moz help engineer said it has a lot to do with session IDs. See this set of duplicate URLs: http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring (clearly the main URL for the page)
Moz Pro | | AlanJacob
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac071ed0ad4af292684b0746931158f To a crawler, that looks like 4 different pages, when it's clear that they're actually all different URLs for the same page. I was wondering if some of you, maybe with experience in site architecture, would have insight into how to address this issue? Thanks Alan0 -
How to fix overly dynamic URLs for Volusion site?
We're currently getting over 5439 pages with an 'overly dynamic URL' warning in our Moz scan. The site is run on Volusion. Is there a way to fix this seeming Volusion error?
Moz Pro | | Brandon_Clay0 -
Rogerbot's crawl behaviour vs google spiders and other crawlers - disparate results have me confused.
I'm curious as to how accurately rogerbot replicates google's searchbot I've currently got a site which is reporting over 200 pages of duplicate/titles content in moz tools. The pages in question are all session IDs and have been blocked in the robot.txt (about 3 weeks ago), however the errors are still appearing. I've also crawled the page using screaming frog SEO spider. According to Screaming Frog, the offending pages have been blocked and are not being crawled. Webmaster tools is also reporting no crawl errors. Is there something I'm missing here? Why would I receive such different results. Which one's should I trust? Does rogerbot ignore robot.txt? Any suggestions would be appreciated.
Moz Pro | | KJDMedia0 -
Crawl Diagnostics Shows thousands of 302's from a single url. I'm confused
Hi guys I just ran my first campaign and the crawl diagnostics are showing some results I'm unfamiliar with.. In the warnings section it shows 2,838 redirects.. this is where I want to focus. When I click here it shows 5 redirects per page. When I go to click on page 2, or next page, or any other page than page 1 for that matter... this is where things get confusing. Nothing shows. Downloading the csv reveals that 2,834 of these are all showing: URL: http://www.mydomain.com/401/login.php url: http://www.mydomain.com/401/login.php referrer: http://www.mydomain.com/401/login.php location_header: http://www.mydomain.com/401/login.php I guess I'm just looking for an explanation as to why it's showing so many to the same page and what possible actions can be taken on my part to correct it (if needed). Thanks in advance
Moz Pro | | sethwb0 -
Need to find all pages that link to list of pages/pdf's
I know I can do this in OSE page by page, but is there a way I can do this in a large batch? There are 200+ PDF's that I need to figure out what pages (if any) link to the PDF. I'd rather not do this page by page, but rather copy-paste the entire list of pages I'm looking for. Any tools you know of that can do this?
Moz Pro | | ryanwats0 -
Crawler reporting incorrect URLs, resulting in false errors...
The SEOmoz crawler is showing 236 Duplicate Page Titles. When I go in to see what page titles are duplicated I see that the URLs in question are incorrect and read "/about/about/..." instead of just "/about/" The shown page duplicates are the result of the crawler is ending up on the "Page not found" page. Could it be the result of using relative links on the site? Anything I can do to remedy? Thanks for your help! -Frank
Moz Pro | | Clements1 -
How do I find questions with new responses?
I'm really enjoying hanging out here so far. I was wondering if there is an easy way to find out which of the discussions that I am taking part in have new responses in them. I know I get an email whenever a question I have subscribed to gets an answer, but is there a way to see this when I am on the website? I see that I can go to "My questions" and "Questions I have asked" or "Questions I have answered" but on that screen it doesn't tell me which one(s) of those questions have new answers. Am I missing something?
Moz Pro | | MarieHaynes1