Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including the number of internal links pointing to each page. Filter this list by "No. links in" = 0, and you'll get a good list of orphaned pages.
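If you'd rather script the filter than do it in the tool's interface, here is a minimal sketch in Python. It assumes you have exported the URL list to CSV; the column names `url` and `links_in` are hypothetical placeholders, so swap in whatever headers your export actually uses.

```python
import csv
import io

def zero_inlink_urls(csv_file, url_col="url", inlinks_col="links_in"):
    """Return the URLs whose inbound internal link count is 0.

    The column names are assumptions; adjust them to match your export.
    """
    reader = csv.DictReader(csv_file)
    # Treat an empty cell as 0 links in.
    return [row[url_col] for row in reader if int(row[inlinks_col] or 0) == 0]

# Tiny in-memory stand-in for an exported crawl report.
sample_export = io.StringIO(
    "url,links_in\n"
    "https://www.example.com/,42\n"
    "https://www.example.com/forgotten-page/,0\n"
)
print(zero_inlink_urls(sample_export))
```

With a real export you would pass an open file instead of the `io.StringIO` sample, e.g. `open("export.csv", newline="")`.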
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files). Alternatively, you could use a method like the one Agents of Value describes here.
A couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires the IIS toolkit, which, unless you're installing it on an external machine, isn't Mac-friendly.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
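The log-file comparison from Ian's tips can be roughed out in a few lines of Python. This is only a sketch under some assumptions: the server writes Common or Combined Log Format, only successful (200) GET/HEAD requests are of interest, and query strings are ignored. The function names and sample data are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches the request and status in a Common/Combined Log Format line,
# e.g. "GET /about/ HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def paths_from_log(lines):
    """Return the set of paths that were served with status 200."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop query strings so /page?utm=x and /page count as one URL.
            paths.add(urlsplit(match.group(1)).path)
    return paths

def orphan_candidates(log_lines, crawled_urls):
    """Paths visitors loaded that the site crawl never reached."""
    crawled_paths = {urlsplit(url).path for url in crawled_urls}
    return sorted(paths_from_log(log_lines) - crawled_paths)

sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /old-landing-page/?utm_source=mail HTTP/1.1" 200 2048',
    '203.0.113.9 - - [10/Oct/2023:13:57:12 +0000] "GET /missing/ HTTP/1.1" 404 512',
]
sample_crawl = ["https://www.example.com/", "https://www.example.com/about/"]

print(orphan_candidates(sample_log, sample_crawl))  # ['/old-landing-page/']
```

Anything it returns is only a candidate: the list will also include URLs you blocked on purpose, so it still needs a manual check.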
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
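If you'd rather skip Excel, the same comparison can be sketched in Python. This assumes you have both lists as plain URLs (one per line in a real export); the `normalize` helper and the sample URLs are illustrative only. It deliberately ignores the scheme, query string, fragment, and trailing slash, which may or may not suit your site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to lowercase host plus path without a trailing slash,
    so trivially different forms of the same page compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"

def find_orphans(cms_urls, crawled_urls):
    """URLs the CMS knows about that the crawl did not find."""
    crawled = {normalize(u) for u in crawled_urls}
    return [u for u in cms_urls if normalize(u) not in crawled]

# Stand-ins for the CMS export and the crawler's URL list.
cms_urls = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/old-promo",
]
crawled_urls = [
    "https://WWW.example.com",
    "https://www.example.com/services",
]

print(find_orphans(cms_urls, crawled_urls))  # ['https://www.example.com/old-promo']
```

The normalization step matters more than the comparison itself: without it, a trailing slash or a capital letter in the hostname would make a linked page look like an orphan.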