How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. The tool gives a full list of URLs, including the number of internal links to each page. Filter this list by "No. links in" = 0, and you'll get a good list of orphaned pages.
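If the crawler lets you export that URL list as a CSV, the same "zero links in" filter can be run in a script. This is only a minimal sketch: the column names `url` and `links_in` are made up for illustration, and the real export headers will almost certainly differ.

```python
import csv
import io

def pages_without_inlinks(csv_text, url_col="url", links_col="links_in"):
    """Return URLs whose inbound internal link count is 0.

    The column names are assumptions, not the tool's real export headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # An empty cell is treated as 0 links.
    return [row[url_col] for row in reader if int(row[links_col] or 0) == 0]

# Hypothetical export data, inlined so the example is self-contained.
sample = """url,links_in
https://example.com/,12
https://example.com/old-landing-page,0
https://example.com/about,3
"""

print(pages_without_inlinks(sample))
```

Keep in mind this only lists pages the crawler already knows about (for example from a sitemap) that have no internal links pointing at them.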
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files), or you could use a method like the one Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires the IIS toolkit, which, unless you're installing it on an external machine, isn't Mac-friendly.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it, nor can I vouch for it: http://www.webseo.com/
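The log-file comparison from Ian's tips can be roughed out in Python. This is a sketch, not a finished tool: it assumes access logs in the common/combined log format and a crawl that has already been reduced to a set of URL paths. The function names and sample log lines are invented for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches the request part of a common/combined log format line,
# e.g. "GET /some/page?x=1 HTTP/1.1" followed by the status code.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def paths_from_log(lines):
    """Collect unique paths that were served successfully (status 200)."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so /page?a=1 and /page count as one URL.
            paths.add(urlsplit(match.group(1)).path)
    return paths

def orphan_candidates(log_lines, crawled_paths):
    """Paths visitors loaded that the crawler never reached."""
    return sorted(paths_from_log(log_lines) - set(crawled_paths))

# Hypothetical log lines and crawl results.
log_lines = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2013:13:56:01 +0000] "GET /old-promo?utm=x HTTP/1.1" 200 1840',
    '203.0.113.9 - - [10/Oct/2013:13:56:09 +0000] "GET /missing HTTP/1.1" 404 512',
]
crawled = {"/", "/pricing"}

print(orphan_candidates(log_lines, crawled))
```

Anything it returns is only a candidate. Images, scripts and other assets show up in logs too, so expect to filter the result before treating it as a list of orphaned pages.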
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS list that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
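If you'd rather skip Excel, step 3 is just a set difference. Here is a rough sketch in Python, assuming both lists are already loaded as plain URL strings. The light normalization (lowercased host, trailing slash dropped, scheme and query string ignored) is my own assumption and may not suit every site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to host + path so trivial variations compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return parts.netloc.lower() + path

def not_in_crawl(cms_urls, crawled_urls):
    """Return CMS URLs that the crawler never found (orphan candidates)."""
    crawled = {normalize(u) for u in crawled_urls}
    return [u for u in cms_urls if normalize(u) not in crawled]

# Hypothetical data standing in for the CMS export and the crawl export.
cms_urls = [
    "https://example.com/",
    "https://example.com/blog/post-1/",
    "https://example.com/landing/summer-sale",
]
crawled_urls = [
    "https://EXAMPLE.com",
    "https://example.com/blog/post-1",
]

print(not_in_crawl(cms_urls, crawled_urls))
```

The normalization step matters more than it looks: without it, a trailing slash or a capital letter in the host makes a page look orphaned when it isn't.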