Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site and I can see a lot of people are saying that I should cross check the my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes I mentioned in my case I use Semrush and there is a dedicated space for that specific parameter. The easiest way to get your log files is logging into your cPanel and find an option called Raw Log Files. If you are still not able to find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours’ worths of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before the system discards it. So go ahead and ensure that you are archiving!
Once you have your log file ready to go, you now need to gather the other data set of pages that can be crawled by Google, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.Now you should have two sets of URL data, both in .csv format:
All you need to do now is compare the URL data from the two .csv files, and find the URLs that were not crawlable.If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover our orphan pages. (Keep in mind that Log File Analyzer is not the same tool that SEO spyder)
The tool is very easy to use (download here), from the dashboard you have the ability to import the two data sets that you need to analyze
If the answer were useful do not forget to mark it as a good answer ....Good Luck
-
Hi Roman,
Out of interest, is there an option to expert an orphan page report like there is in Screaming Frog? (Reports / Orphan Pages).
I guess the true and most realistic option is to get the list from the dev team as using the sitemap isn't plausible as these pages should still get indexed. The new Google Search Console also lets you test individual pages and as long as they're in the sitemap, they should (hopefully) be indexed.
Still, trying to get a list of ALL pages on a site, without dev support, seems to be a challenge I'm trying to solve
-
Even Screaming-frog have problems to find all the orphan-pages, I use Screaming-frog, Moz, Semrush, Ahrefs, and Raven-tools in my day to day and honestly, Semrush is the one that gives me better results for that specific tasks. As an experience, I can say that a few months ago I took a website and it was a complete disaster, no sitemap, no canonical tags, no meta-tags and etc.
I run screaming-frog and showed me just 200 pages but I knew it was too much more at the end I founded 5k pages with Semrush, probably even the crawler of screaming frog has problems with that website so I commenting that as an experience.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Rank Flow
I wonder if someone can help me understand clearly page rank flow. If we have a website with a Home page, Services, About and Contact as a very basic website and the page rank will flow to each of those pages from the Home page (i'm not including internal linking between pages or anchor text from the home page content - this is a question purely about home page flow via the main navigation). If the Services page had 3 drop down pages. Would the home page rank also flow to each of these or is it going to the Services page which then distributes it to the three drop down. So instead of Home page rank flowing to 3 pages 33% each - it is flowing to 6 pages 16.6% each. Or is it flowing to 3 pages - 33.3% then the Services pages get a third of 33.3% ->10.1% I know this is simplifying it all a great deal- but it is the basic concept I am trying to grasp on this simple example. Thanks
Technical SEO | | AL123al0 -
Trying to find all internal links to a specific page (without index)
Hi guys -- Still waiting on Moz to index a page of mine. We launched a new site over two months ago. In the meantime, I really just need a list of internal links to a specific page because I want to change its URL. Does anybody know how to find that list (of internal links to 1 of my pages) without the Moz index? I appreciate the help!
Technical SEO | | marchexmarketingmcc1 -
Does a no-indexed parent page impact its child pages?
If I have a page* in WordPress that is set as private and is no-indexed with Yoast, will that negatively affect the visibility of other pages that are set as children of that first page? *The context is that I want to organize some of the pages on a business's WordPress site into silos/directories. For example, if the business was a home remodeling company, it'd be convenient to keep all the pages about bathrooms, kitchens, additions, basements, etc. bundled together under a "services" parent page (/services/kitchens/, /services/bathrooms/, etc.). The thing is that the child pages will all be directly accessible from the menus, so there doesn't need to be anything on the parent /services/ page itself. Another such parent page/directory/category might be used to keep different photo gallery pages together (/galleries/kitchen-photos/, /galleries/bathroom-photos/, etc.). So again, would it be safe for pages like /services/kitchens/ and /galleries/addition-photos/ if the /services/ and /galleries/ pages (but not /galleries/* or anything like that) are no-indexed? Thanks!
Technical SEO | | BrianAlpert781 -
Low page impressions
Hey there MOZ Geniuses; While checking my webmaster data I noticed that almost all my Google impressions are generated by the home page, most other content pages are showing virtually no impression data <50 (the home page is showing around 1500 - a couple of the pages are in the 150-200 range). the site has been up for about 8 months now. Traffic on average is about 500 visitors, but I'm seeing very little entry other then the home page. Checking the number Sitemap section 27 of 30 are index Webmaster tools are not reporting errors Webmaster keyword impressions are also extremely low 164 keywords with the highest impression count of 79 and dropping from there. MOZ is show very few minor issues although it says that it crawled 10k pages? -- we only have 30 or so. The answer seems obvious, Google is not showing my content ... the question is why and what steps can I take to analyze this? Could there be a possibility of some type of penalty? I welcome all your suggestions: The site is www.calibersi.com
Technical SEO | | VanadiumInteractive0 -
When creating parent and child pages should key words be repeated in url and page title?
We are in the direct mail advertising business: PrintLabelAndMail.com Example: Parent:
Technical SEO | | JimDirectMailCoach
Postcard Direct Mail Children:
Postcard Mailings
Postcard Design
Postcard Samples
Postcard Pricing
Postcard Advantages should "postcard" be repeated in the URL and Page Title? and in this example should each of the 5 children link back directly to the parent or would it be better to "daisy chain" them using each as parent for the next?0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
How to identify orphan pages?
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help? I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me. Or are there other ways to identify orphan pages?
Technical SEO | | MarieHaynes0 -
What's the difference between a category page and a content page
Hello, Little confused on this matter. From a website architectural and content stand point, what is the difference between a category page and a content page? So lets say I was going to build a website around tea. My home page would be about tea. My category pages would be: White Tea, Black Tea, Oolong Team and British Tea correct? ( I Would write content for each of these topics on their respective category pages correct?) Then suppose I wrote articles on organic white tea, white tea recipes, how to brew white team etc...( Are these content pages?) Do I think link FROM my category page ( White Tea) to my ( Content pages ie; Organic White Tea, white tea receipes etc) or do I link from my content page to my category page? I hope this makes sense. Thanks, Bill
Technical SEO | | wparlaman0