How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site, and I can see a lot of people are saying that I should cross-check my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes, as I mentioned, in my case I use Semrush, and there is a dedicated section for that specific parameter. The easiest way to get your log files is to log into your cPanel and find an option called Raw Log Files. If you are still not able to find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours' worth of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before discarding it. So go ahead and make sure that you are archiving!
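For anyone who wants to pull the URLs out of the raw log by hand, here is a rough sketch in Python. It assumes the log is in the common Apache/NGINX "combined" format, which may not match what your host produces, and the function name and sample lines are made up for illustration. Keep in mind that the user agent string can be spoofed, so this is only a first pass.

```python
import re

# Assumption: "combined" log format. Adjust the pattern if your host logs differently.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths(lines):
    """Return the set of paths fetched with status 200 by a Googlebot user agent."""
    paths = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if match.group("status") == "200" and "Googlebot" in match.group("agent"):
            paths.add(match.group("path"))
    return paths


# Hypothetical sample lines, not taken from a real site.
sample_log = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /old-landing-page HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /contact HTTP/1.1" '
    '200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [10/Oct/2023:13:57:12 +0000] "GET /missing HTTP/1.1" '
    '404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

logged = googlebot_paths(sample_log)
print(sorted(logged))
```

With a real file you would pass the open file object instead of the sample list, since the function only needs something it can loop over line by line.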
Once you have your log file ready to go, you now need to gather the other data set of pages that can be crawled by Google, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply:
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.
Now you should have two sets of URL data, both in .csv format: the URLs from your log files and the URLs from the Screaming Frog crawl.
All you need to do now is compare the URL data from the two .csv files and find the URLs that were not crawlable, that is, the ones that show up in the log data but not in the crawl.
If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover your orphan pages. (Keep in mind that the Log File Analyser is not the same tool as the SEO Spider.)
The tool is very easy to use (download here). From the dashboard, you can import the two data sets that you need to analyze.
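If you would rather compare the two URL lists yourself, the comparison described above can be sketched in Python like this. It assumes the crawl export has its header on the first row with the URL in a column called "Address"; check your own export, as the layout may differ. The helper names and the sample data are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a full URL or a bare path to its path, without a trailing slash."""
    path = urlsplit(url).path or "/"
    return path.rstrip("/") or "/"


def crawled_paths(csv_file, column="Address"):
    """Read the crawl export and return the set of normalised paths it contains."""
    reader = csv.DictReader(csv_file)
    return {normalise(row[column]) for row in reader if row.get(column)}


def orphan_candidates(logged_urls, crawled):
    """Paths seen in the logs but never reached by the crawl."""
    return sorted({normalise(u) for u in logged_urls} - crawled)


# Hypothetical stand-in for the exported crawl file.
crawl_export = io.StringIO(
    "Address,Status Code\n"
    "https://www.example.com/,200\n"
    "https://www.example.com/contact/,200\n"
)
# Hypothetical paths taken from the log file.
logged_urls = ["/", "/contact", "/old-landing-page"]

orphans = orphan_candidates(logged_urls, crawled_paths(crawl_export))
print(orphans)
```

Anything this prints is only a candidate: a URL can be in the logs because of an old external link or a redirect, so it is worth checking each one before treating it as a true orphan page.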
If the answer was useful, do not forget to mark it as a good answer. Good luck!
-
Hi Roman,
Out of interest, is there an option to export an orphan page report like there is in Screaming Frog? (Reports / Orphan Pages).
I guess the most realistic option is to get the list from the dev team, since relying on the sitemap isn't plausible when these pages should still get indexed. The new Google Search Console also lets you test individual pages, and as long as they're in the sitemap, they should (hopefully) be indexed.
Still, getting a list of ALL pages on a site, without dev support, seems to be a challenge I'm still trying to solve.
-
Even Screaming Frog has problems finding all the orphan pages. I use Screaming Frog, Moz, Semrush, Ahrefs, and Raven Tools day to day, and honestly, Semrush is the one that gives me the best results for that specific task. From experience, I can say that a few months ago I took on a website that was a complete disaster: no sitemap, no canonical tags, no meta tags, etc.
I ran Screaming Frog and it showed me just 200 pages, but I knew there were many more. In the end I found 5k pages with Semrush. Probably even the Screaming Frog crawler has problems with that website, so I'm just sharing this as my experience.