Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.

-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Not Indexing Pages (Wordpress)
Hello, recently I started noticing that google is not indexing our new pages or our new blog posts. We are simply getting a "Discovered - Currently Not Indexed" message on all new pages. When I click "Request Indexing" is takes a few days, but eventually it does get indexed and is on Google. This is very strange, as our website has been around since the late 90's and the quality of the new content is neither duplicate nor "low quality". We started noticing this happening around February. We also do not have many pages - maybe 500 maximum? I have looked at all the obvious answers (allowing for indexing, etc.), but just can't seem to pinpoint a reason why. Has anyone had this happen recently? It is getting very annoying having to manually go in and request indexing for every page and makes me think there may be some underlying issues with the website that should be fixed.
Technical SEO | | Hasanovic1 -
Indexing Issue of Dynamic Pages
Hi All, I have a query for which i am struggling to find out the answer. I unable to retrieve URL using "site:" query on Google SERP. However, when i enter the direct URL or with "info:" query then a snippet appears. I am not able to understand why google is not showing URL with "site:" query. Whether the page is indexed or not? Or it's soon going to be deindexed. Secondly, I would like to mention that this is a dynamic URL. The index file which we are using to generate this URL is not available to Google Bot. For instance, There are two different URL's. http://www.abc.com/browse/ --- It's a parent page.
Technical SEO | | SameerBhatia
http://www.abc.com/browse/?q=123 --- This is the URL, generated at run time using browse index file. Google unable to crawl index file of browse page as it is unable to run independently until some value will get passed in the parameter and is not indexed by Google. Earlier the dynamic URL's were indexed and was showing up in Google for "site:" query but now it is not showing up. Can anyone help me what is happening here? Please advise. Thanks0 -
When creating parent and child pages should key words be repeated in url and page title?
We are in the direct mail advertising business: PrintLabelAndMail.com Example: Parent:
Technical SEO | | JimDirectMailCoach
Postcard Direct Mail Children:
Postcard Mailings
Postcard Design
Postcard Samples
Postcard Pricing
Postcard Advantages should "postcard" be repeated in the URL and Page Title? and in this example should each of the 5 children link back directly to the parent or would it be better to "daisy chain" them using each as parent for the next?0 -
Can you noindex a page, but still index an image on that page?
If a blog is centered around visual images, and we have specific pages with high quality content that we plan to index and drive our traffic, but we have many pages with our images...what is the best way to go about getting these images indexed? We want to noindex all the pages with just images because they are thin content... Can you noindex,follow a page, but still index the images on that page? Please explain how to go about this concept.....
Technical SEO | | WebServiceConsulting.com0 -
Product Pages Outranking Category Pages
Hi, We are noticing an issue where some product pages are outranking our relevant category pages for certain keywords. For a made up example, a "heavy duty widgets" product page might rank for the keyword phrase Heavy Duty Widgets, instead of our Heavy Duty Widgets category page appearing in the SERPs. We've noticed this happening primarily in cases where the name of the product page contains an at least partial match for the desired keyword phrase we want the category page to rank for. However, we've also found isolated cases where the specified keyword points to a completely irrelevent pages instead of the relevant category page. Has anyone encountered a similar issue before, or have any ideas as to what may cause this to happen? Let me know if more clarification of the question is needed. Thanks!
Technical SEO | | ShawnHerrick0 -
No_index of parent page
Hi, sorry its a Friday question... Page A: www.example.com/house/ Page B: www.example.com/house/kitchen Can I 'no_index' page A without it effecting page B being indexed? Views? Many thanks!
Technical SEO | | Richard5551 -
Can you 301 redirect a page to an already existing/old page ?
If you delete a page (say a sub department/category page on an ecommerce store) should you 301 redirect its url to the nearest equivalent page still on the site or just delete and forget about it ? Generally should you try and 301 redirect any old pages your deleting if you can find suitable page with similar content to redirect to. Wont G consider it weird if you say a page has moved permenantly to such and such an address if that page/address existed before ? I presume its fine since say in the scenario of consolidating departments on your store you want to redirect the department page your going to delete to the existing pages/department you are consolidating old departments products into ?
Technical SEO | | Dan-Lawrence0 -
Do web pages have to be linked to a menu?
I have a situation where people search for terms like, say 1978 one dollar bill. Even though there never was a 1978 one dollar bill. I want to make a page to capture these searches but since there wasn't such a thing as a one dollar bill I don't want it connected to the rest of my content which is reality based. Does that make sense? Anyway, my question is, can I publish pages that aren't linked to my menu structure but that will be searchable or, am I going to have to figure out a way to make these oddball pages accessible through my menu?
Technical SEO | | Banknotes0