Mapping Internal Links (Which are causing duplicate content)
-
I'm working on a site that is throwing off a -lot- of duplicate content for its size. A lot of it appears to be coming from bad links within the site itself, which were caused when it was ported over from static HTML to Expression Engine (by someone else).
I'm finding EE an incredibly frustrating platform to work with, as it appears to be directing 404's on sub-pages to the page directly above that subpage, without actually providing a 404 response. It's very weird.
Does anyone have any recommendations on software to clearly map out a site's internal link structure so that I can find what bad links are pointing to the wrong pages?
-
You could try: http://validator.w3.org/checklink
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does duplicate content not concern Rand?
Hello all, I'm a new SEOer and I'm currently trying to navigate the layman's minefield that is trying to understand duplicate content issues in as best I can. I'm working on a website at the moment where there's a duplicate content issue with blog archives/categories/tags etc. I was planning to beat this by implementing a noindex meta tag on those pages where there are duplicate content issues. Before I go ahead with this I thought: "Hey, these Moz guys seem to know what they're doing! What would Rand do?" Blogs on the website in question appear in full and in date order relating to the tag/category/what-have-you creating the duplicate content problem. Much like Rand's blog here at Moz - I thought I'd have a look at the source code to see how it was dealt with. My amateur eyes could find nothing to help answer this question: E.g. Both the following URLs appear in SERPs (using site:moz,com and very targeted keywords, but they're there): https://moz.com/rand/does-making-a-website-mobile-friendly-have-a-universally-positive-impact-on-mobile-traffic/ https://moz.com/rand/category/moz/ Both pages have a rel="canonical" pointing to themselves. I can understand why he wouldn't be fussed about the category not ranking, but the blog? Is this not having a negative effect? I'm just a little confused as there are so many conflicting "best practice" tips out there - and now after digging around in the source code on Rand's blog I'm more confused than ever! Any help much appreciated, Thanks
Technical SEO | | sbridle1 -
Value of internal links like this
Hello I have a question for internal links build in the pattern below does google value these kinds of pattern of internal links... For example i have 3 pages on website A, B and C, The page A is homepage, B is cateogory page and C is product page and I am on page C, where I build internal links like this Home > Catogory > product page
Technical SEO | | tanveerayakhan0 -
Duplicate Content Issues - Where to start???
Dear All I have recently joined a new company Just Go Holidays - www.justgoholidays.com I have used the SEO Moz tools (yesterday) to review the site and see that I have lots of duplicate content/pages and also lots of duplicate titles all of which I am looking to deal with. Lots of the duplicate pages appear to be surrounding, additional parameters that are used on our site to refine and or track various marketing campaigns. I have therefore been into Google Webmaster Tools and defined each of these parameters. I have also built a new XML sitemap and submitted that too. It looks as is we have two versions of the site, one being at www.justgoholidays.com and the other without the www It appears that there are no redirects from the latter to the former, do I need to use 301's here or is it ok to use canonicalisation instead? Any thoughts on an action plan to try to address these issues in the right order and the right way would be very gratefully received as I am feeling a little overwhelmed at the moment. (we also use a CMS system that is not particularly friendly and I think I will have to go directly to the developers to make lots of the required changes which is sure to cost - therefore really don't want to get this wrong) All the best Matt
Technical SEO | | MattByrne0 -
Development Website Duplicate Content Issue
Hi, We launched a client's website around 7th January 2013 (http://rollerbannerscheap.co.uk), we originally constructed the website on a development domain (http://dev.rollerbannerscheap.co.uk) which was active for around 6-8 months (the dev site was unblocked from search engines for the first 3-4 months, but then blocked again) before we migrated dev --> live. In late Jan 2013 changed the robots.txt file to allow search engines to index the website. A week later I accidentally logged into the DEV website and also changed the robots.txt file to allow the search engines to index it. This obviously caused a duplicate content issue as both sites were identical. I realised what I had done a couple of days later and blocked the dev site from the search engines with the robots.txt file. Most of the pages from the dev site had been de-indexed from Google apart from 3, the home page (dev.rollerbannerscheap.co.uk, and two blog pages). The live site has 184 pages indexed in Google. So I thought the last 3 dev pages would disappear after a few weeks. I checked back late February and the 3 dev site pages were still indexed in Google. I decided to 301 redirect the dev site to the live site to tell Google to rank the live site and to ignore the dev site content. I also checked the robots.txt file on the dev site and this was blocking search engines too. But still the dev site is being found in Google wherever the live site should be found. When I do find the dev site in Google it displays this; Roller Banners Cheap » admin <cite>dev.rollerbannerscheap.co.uk/</cite><a id="srsl_0" class="pplsrsla" tabindex="0" data-ved="0CEQQ5hkwAA" data-url="http://dev.rollerbannerscheap.co.uk/" data-title="Roller Banners Cheap » admin" data-sli="srsl_0" data-ci="srslc_0" data-vli="srslcl_0" data-slg="webres"></a>A description for this result is not available because of this site's robots.txt – learn more.This is really affecting our clients SEO plan and we can't seem to remove the dev site or rank the live site in Google.Please can anyone help?
Technical SEO | | SO_UK0 -
404 and Duplicate Content.
I just submitted my first campaign. And it's coming up with a LOT of errors. Many of them I feel are out of my control as we use a CMS for RV dealerships. But I have a couple of questions. I got a 404 error and SEO Moz tells me the link, but won't tell me where that link originated from, so I don't know where to go to fix it. I also got a lot of duplicate content, and it seems a lot of them are coming from "tags" on my blog. Is that something I should be concerned about? I will have a lot more question probably as I'm new to using this tool Thanks for the responses! -Brandon here is my site: floridaoutdoorsrv.com I welcome any advice or input!
Technical SEO | | floridaoutdoorsrv0 -
Does turning website content into PDFs for document sharing sites cause duplicate content?
Website content is 9 tutorials published to unique urls with a contents page linking to each lesson. If I make a PDF version for distribution of document sharing websites, will it create a duplicate content issue? The objective is to get a half decent link, traffic to supplementary opt-in downloads.
Technical SEO | | designquotes0 -
Does Google care how you write internal links?
I am changing ecommerce platforms. For my internal linking on the old site there was a lot of old links written like this: http://www.domain.com/page-name But now i am writing links mostly like this: /page-name Will that make a difference to search engines? Is one easier than the other for them to interpret?
Technical SEO | | Hyrule0 -
Duplicate content
This is just a quickie: On one of my campaigns in SEOmoz I have 151 duplicate page content issues! Ouch! On analysis the site in question has duplicated every URL with "en" e.g http://www.domainname.com/en/Fashion/Mulberry/SpringSummer-2010/ http://www.domainname.com/Fashion/Mulberry/SpringSummer-2010/ Personally my thoughts are that are rel = canonical will sort this issue, but before I ask our dev team to add this, and get various excuses why they can't I wanted to double check i am correct in my thinking? Thanks in advance for your time
Technical SEO | | Yozzer0