Help with Roger finding phantom links

oznappies

It's Monday, Roger has done another crawl, and now I have a couple of issues:

I have two pages returning 404 -> 302 or 500 errors because these links do not exist. I need to fix the 500, but the 404 is being trapped correctly.

http://www.oznappies.com/nappies.faq & http://www.oznappies.com/store/value-packs/\

The issue is that when I do a site scan, no anchor contains these links. So what I would like to find out is where Roger is finding them. I cannot see anywhere in the Crawl Report that shows where these links originate.
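A scan that only looks at anchors can miss a URL that appears elsewhere in the markup, for example inside a script, a form action, a `<link>` tag, or an HTML comment. A minimal sketch, assuming you have already saved each page's raw HTML into a dict keyed by page URL (the `pages` dict, its contents, and the `find_mentions` name are all hypothetical), that searches the raw source for the phantom path instead:

```python
from typing import Dict, List


def find_mentions(pages: Dict[str, str], needle: str) -> List[str]:
    """Return the URLs of pages whose raw HTML contains `needle` anywhere.

    Searching the raw source (not just <a> elements) also catches URLs sitting
    in scripts, form/link attributes, or comments.
    """
    needle_lower = needle.lower()
    return sorted(url for url, html in pages.items() if needle_lower in html.lower())


# Hypothetical saved pages, for illustration only.
pages = {
    "http://www.oznappies.com/": '<a href="/store/">Store</a>',
    "http://www.oznappies.com/about": '<script>var u = "/nappies.faq";</script>',
}
print(find_mentions(pages, "nappies.faq"))
```

Here the match comes from a script string, which an anchor-only scan would not report.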

I also created a blog on Tumblr, and now every tag and RSS feed entry is producing a duplicate content error in the crawl stats. I cannot see anywhere in Tumblr to fix this issue.
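To see why tag and feed URLs get flagged, it helps to picture duplicate detection as grouping URLs whose page text is the same. The sketch below uses exact matching on normalized text, which is a simplification: Roger's actual method is not described in this thread, and real crawlers may use near-duplicate similarity instead. The function names and example URLs are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def content_fingerprint(html: str) -> str:
    """Hash a page's text after crude tag stripping and whitespace/case normalization."""
    text = re.sub(r"<[^>]+>", " ", html)   # drop tags (crude, for illustration)
    text = " ".join(text.split()).lower()  # collapse whitespace, ignore case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose normalized text is identical; only groups of 2+ URLs are returned."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)


# Hypothetical pages: a tag URL that wraps the same article text in different markup.
pages = {
    "http://example.com/article": "<p>Cloth nappies</p>",
    "http://example.com/article/tag1": "<div><p>Cloth   nappies</p></div>",
    "http://example.com/other": "<p>Something else</p>",
}
print(duplicate_groups(pages))
```

Because the tag URL carries the same text as the article, both land in one group, which is the pattern the crawl stats are reporting.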

Any ideas?

oznappies

Thanks again Ryan, you have been very helpful answering a lot of my questions.

RyanKent

Someone else asked the same question regarding tag pages yesterday. I would suggest asking a separate Q&A on that topic.

Tag pages and forum category pages are both often used as containers. They don't have any content except links to articles. I would ask for feedback on best practice. I suspect that marking those pages "noindex, follow" would be best, but I don't have the experience to feel comfortable offering that advice.
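If you do go the "noindex, follow" route, it is expressed as a robots meta tag in the page head. A minimal sketch, assuming container pages can be recognized by a URL prefix (the `/tag/` and `/category/` pattern and the `robots_meta` name are hypothetical; adjust to the real site's URLs), of how a template might decide which pages get it:

```python
import re

# Hypothetical URL pattern for tag/category "container" pages.
CONTAINER_PATTERN = re.compile(r"^/(tag|category)/")


def robots_meta(path: str) -> str:
    """Return a robots meta tag for container pages, or "" to keep the default (index, follow)."""
    if CONTAINER_PATTERN.match(path):
        # Keep the container out of the index, but let crawlers follow its links to articles.
        return '<meta name="robots" content="noindex, follow">'
    return ""
```

Returning an empty string for ordinary pages leaves them on the default behaviour, so only the container pages carry the directive.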

oznappies

I have been looking at the data Roger is reporting for the duplicate content, and in ALL cases there is either a 301 or a noindex. So now I do not know why Roger is reporting them as duplicates; robots should not see the second entry.
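One thing worth checking: a noindex meta tag can only take effect after a crawler has fetched the page and read it, so the page is still visited. It is also easy for a template to emit the tag on some URLs and not others. A minimal sketch, using only the standard library's `html.parser` (the `has_noindex` helper is hypothetical), for confirming that each flagged URL's HTML really carries a robots noindex directive:

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str) -> bool:
    """True if any robots meta tag on the page includes a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Feed it the HTML of each URL listed in the duplicate content report; any `False` result points at a page that is missing the tag.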

oznappies

I did not think of looking at the CSV report. I see it now, thanks Ryan. There should be a soft 404 handler in place to process the bad URLs; I will have to see why it is not working.

With Tumblr, I was looking for an easy way to add a blog to the site.

The RSS is coming from tumblr as is all the content.

When we specify tags in Tumblr, it creates URLs such as mypage.com/article/tag1, mypage.com/article/tag2 and mypage.com/article/tag3, all of which contain the content of mypage.com/article without a canonical pointing to the original. It is a really strange, non-SEO-friendly approach, so I wondered if anyone had run into similar problems.
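The missing piece described here is a canonical link from each tag URL back to the article. A minimal sketch, assuming the tag is always the last path segment (as in the mypage.com example) and that the blog template lets you add tags to the page head; the `canonical_for` name is hypothetical and this is not a Tumblr feature:

```python
from typing import Iterable
from urllib.parse import urlsplit, urlunsplit


def canonical_for(url: str, tags: Iterable[str]) -> str:
    """Strip a trailing tag segment (/article/tag1 -> /article) and return a canonical link tag."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) > 1 and segments[-1] in set(tags):
        segments = segments[:-1]  # drop the tag so every variant points at the article
    path = "/" + "/".join(segments)
    canonical = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<link rel="canonical" href="{canonical}">'
```

Every tag variant then declares the same canonical URL, and the article itself points at itself.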

RyanKent

The crawl report offers a "referrer" field. That field offers where Roger found the offending link. In my experience that field has always been accurate.
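With a large export it can be quicker to pull the referrers out programmatically. A minimal sketch using Python's `csv.DictReader`; the column names `URL` and `Referrer` are assumptions, so check the header row of your actual export and pass the real names in:

```python
import csv
from typing import Iterable, List, Tuple


def referrers_for(csv_lines: Iterable[str], bad_urls: Iterable[str],
                  url_col: str = "URL", referrer_col: str = "Referrer") -> List[Tuple[str, str]]:
    """Return (url, referrer) pairs for report rows whose URL is one of bad_urls.

    The default column names are assumptions about the export format.
    """
    wanted = set(bad_urls)
    reader = csv.DictReader(csv_lines)
    return [(row[url_col], row[referrer_col]) for row in reader if row.get(url_col) in wanted]
```

An open file works as `csv_lines`, e.g. `referrers_for(open("crawl_report.csv", newline=""), ["http://www.oznappies.com/nappies.faq"])`, where the file name is hypothetical.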

When I try to access www.oznappies.com/faq I receive a 302 redirect and a 500 error. I would recommend sending requests for non-existent pages to a soft 404 page: still return a 404 response to browsers, but offer users a friendly way to find information (i.e. links / search) and stay on your site.
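The key point is that the friendly page is served in place with a real 404 status, rather than via a 302 to an error URL. A minimal sketch as a standard WSGI application; the `PAGES` registry, the page content, and the `/search` form target are hypothetical placeholders for whatever the real site uses:

```python
import html
from typing import Callable, Dict, Iterable

# Hypothetical page registry; a real site would use its CMS or router.
PAGES: Dict[str, str] = {
    "/": "<h1>Home</h1>",
    "/store/": "<h1>Store</h1>",
}

FRIENDLY_404 = """<html><body>
<h1>Sorry, we couldn't find {path}</h1>
<p>Try the <a href="/">home page</a> or the <a href="/store/">store</a>, or search:</p>
<form action="/search"><input name="q"><button>Search</button></form>
</body></html>"""


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve known pages with 200; otherwise serve a helpful page with a real 404 (no redirect)."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    else:
        # Respond in place: no 302 hop, so crawlers and browsers see the true status.
        status, body = "404 Not Found", FRIENDLY_404.format(path=html.escape(path))
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

For a local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it.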

A great example of a soft 404 page is http://www.orangecoat.com/a-404-page.html

For the Tumblr issue, I am not clear on the problem. Are you writing content and publishing it on both the oznappies.com site and your Tumblr site, and then this content is published again on your site via an RSS import?

oznappies

I removed the links and just left the text so these will cut and paste now. It confuses me where Roger found the links.

Thanks for running the Xenu scan. I have tried other site scanners and come up blank.

StalkerB

That second link is anchored to the wrong place.

Regardless I also cannot find the .faq page. I just ran Xenu over it to see what it could find, but no broken links showed up.

Afraid I don't use Tumblr either, so, eh, a pretty useless post. Sorry.
