Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site and I can see a lot of people are saying that I should cross check the my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes I mentioned in my case I use Semrush and there is a dedicated space for that specific parameter. The easiest way to get your log files is logging into your cPanel and find an option called Raw Log Files. If you are still not able to find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours’ worths of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before the system discards it. So go ahead and ensure that you are archiving!
Once you have your log file ready to go, you now need to gather the other data set of pages that can be crawled by Google, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.Now you should have two sets of URL data, both in .csv format:
All you need to do now is compare the URL data from the two .csv files, and find the URLs that were not crawlable.If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover our orphan pages. (Keep in mind that Log File Analyzer is not the same tool that SEO spyder)
The tool is very easy to use (download here), from the dashboard you have the ability to import the two data sets that you need to analyze
If the answer were useful do not forget to mark it as a good answer ....Good Luck
-
Hi Roman,
Out of interest, is there an option to expert an orphan page report like there is in Screaming Frog? (Reports / Orphan Pages).
I guess the true and most realistic option is to get the list from the dev team as using the sitemap isn't plausible as these pages should still get indexed. The new Google Search Console also lets you test individual pages and as long as they're in the sitemap, they should (hopefully) be indexed.
Still, trying to get a list of ALL pages on a site, without dev support, seems to be a challenge I'm trying to solve
-
Even Screaming-frog have problems to find all the orphan-pages, I use Screaming-frog, Moz, Semrush, Ahrefs, and Raven-tools in my day to day and honestly, Semrush is the one that gives me better results for that specific tasks. As an experience, I can say that a few months ago I took a website and it was a complete disaster, no sitemap, no canonical tags, no meta-tags and etc.
I run screaming-frog and showed me just 200 pages but I knew it was too much more at the end I founded 5k pages with Semrush, probably even the crawler of screaming frog has problems with that website so I commenting that as an experience.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Rank Flow
I wonder if someone can help me understand clearly page rank flow. If we have a website with a Home page, Services, About and Contact as a very basic website and the page rank will flow to each of those pages from the Home page (i'm not including internal linking between pages or anchor text from the home page content - this is a question purely about home page flow via the main navigation). If the Services page had 3 drop down pages. Would the home page rank also flow to each of these or is it going to the Services page which then distributes it to the three drop down. So instead of Home page rank flowing to 3 pages 33% each - it is flowing to 6 pages 16.6% each. Or is it flowing to 3 pages - 33.3% then the Services pages get a third of 33.3% ->10.1% I know this is simplifying it all a great deal- but it is the basic concept I am trying to grasp on this simple example. Thanks
Technical SEO | | AL123al0 -
Should i index or noindex a contact page
Im wondering if i should noindex the contact page im doing SEO for a website just wondering if by noindexing the contact page would it help SEO or hurt SEO for that website
Technical SEO | | aronwp0 -
Trailing Slashes on Home Pages
I do not think I have a problem here, but a second opinion would be welcomed... I have a site which has a the rel=canonical tag with the trailing slash displayed. ie www.example.com/ The sitemap has it without the trailing slash. www.example.com Google has it's cached copy with the trailing slash but the browser displays it without. I want to say it's perfectly fine (for the home page) as I tend to think they are treated (with/without trailing slashes) as the same canonical URL.
Technical SEO | | eventurerob0 -
Error: Missing Meta Description Tag on pages I can't find in order to correct
This seems silly, but I have errors on blog URLs in our WordPress site that I don't know how to access because they are not in our Dashboard. We are using All in One SEO. The errors are for blog archive dates, authors and just simply 'blog'. Here are samples: http://www.fateyes.com/2012/10/
Technical SEO | | gfiedel
http://www.fateyes.com/author/gina-fiedel/
http://www.fateyes.com/blog/ Does anyone know how to input descriptions for pages like these?
Thanks!!0 -
Should I nofollow search results pages
I have a customer site where you can search for products they sell url format is: domainname/search/keywords/ keywords being what the user has searched for. This means the number of pages can be limitless as the client has over 7500 products. or should I simply rel canonical the search page or simply no follow it?
Technical SEO | | spiralsites0 -
No_index of parent page
Hi, sorry its a Friday question... Page A: www.example.com/house/ Page B: www.example.com/house/kitchen Can I 'no_index' page A without it effecting page B being indexed? Views? Many thanks!
Technical SEO | | Richard5551 -
Page with h1 and h1 class=
Hi, If a page in the source code has boht following elements: class="blogg_rubrik">TITLE OF THE PAGE Is that bad for SEO, since the first H1 is empty? Shouldn't a page use only one H1?
Technical SEO | | Ypsilon0 -
What's the difference between a category page and a content page
Hello, Little confused on this matter. From a website architectural and content stand point, what is the difference between a category page and a content page? So lets say I was going to build a website around tea. My home page would be about tea. My category pages would be: White Tea, Black Tea, Oolong Team and British Tea correct? ( I Would write content for each of these topics on their respective category pages correct?) Then suppose I wrote articles on organic white tea, white tea recipes, how to brew white team etc...( Are these content pages?) Do I think link FROM my category page ( White Tea) to my ( Content pages ie; Organic White Tea, white tea receipes etc) or do I link from my content page to my category page? I hope this makes sense. Thanks, Bill
Technical SEO | | wparlaman0