Index.php + external site added to end of URL
-
Good day, I have a domain http://www.ecofriendlylink.com. I am trying to resolve the Crawl Diagnostic errors on it. I have several Duplicate Page Content errors.
Example 1:
(The domain happynewyou.com is not mine; some comments from them have been placed on my site. The page ecoshop.php is on my site.)
URL: http://www.ecofriendlylink.com
Duplicate Page Content: http://www.ecofriendlylink.com/www./happynewyou.com/ecoshop.php
Referrer: None.
Example 2:
URL: http://ecofriendlylink.com/index.php
Duplicate Page Content: http://www.ecofriendlylink.com/index.php and http://www.ecofriendlylink.com/www./happynewyou.com/index.php
Referrer: http://ecofriendlylink.com/
Example 3 is a different problem, but still a Duplicate Page Content error:
URL: http://ecofriendlylink.com/water.php
Duplicate Page Content: http://www.ecofriendlylink.com/water.php
Referrer: http://ecofriendlylink.com/
water.php is a page on my main domain, and it loads under both the www and the non-www version. Is this a problem, and something I need to fix?
So please can you advise what I need to do to get rid of this strange external domain name + index.php (as in Examples 1 and 2), and explain what I'm doing wrong with Example 3?
Thank you!
-
Thank you very much for your prompt response!
I shall Google how to define the 404 page in .htaccess; I'm sure I'll have these errors fixed in no time. And that makes sense regarding the home page.
Thank you!
-
Example 1: Your 404 page is not defined, so whenever a URL that doesn't exist is requested, the server returns 200 OK and simply loads the home page. Somewhere on your site there is a bad link; when the crawler followed it and got a 200 OK, it recorded the URL as a real page, and since that page is just your home page, it was flagged as a duplicate. You need to define a 404 page in .htaccess so this does not happen.
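A quick way to confirm this diagnosis, and later the fix, is to request a URL that cannot exist and look at the status code. The Python sketch below does that with the standard library only. The probe file name and the labels returned by the hypothetical `classify_probe` helper are made up for illustration, and the body comparison assumes the home page renders identically on every request (a page with rotating content will fall through to the second soft-404 label instead).

```python
import urllib.error
import urllib.request
import uuid


def fetch(url):
    """Return (status, body); 4xx/5xx responses are returned instead of raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def classify_probe(status, body, home_body):
    """Classify the response to a URL that should not exist."""
    if status in (404, 410):
        return "ok: real 404"
    if status == 200 and body == home_body:
        return "soft 404: duplicate of the home page"
    if status == 200:
        return "soft 404: 200 OK for a missing page"
    return f"unexpected status {status}"


def check_missing_page_handling(base_url):
    # Hypothetical probe: a random file name that cannot exist on the site.
    probe_url = f"{base_url.rstrip('/')}/no-such-page-{uuid.uuid4().hex}.php"
    status, body = fetch(probe_url)
    _, home_body = fetch(base_url)
    return classify_probe(status, body, home_body)
```

Calling `check_missing_page_handling("http://www.ecofriendlylink.com/")` makes live HTTP requests, so run it from your own machine; once the 404 page is defined it should report "ok: real 404".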
Example 2 + 3: the crawler is counting the following pages as your home page:
http://ecofriendlylink.com/index.php
http://www.ecofriendlylink.com/
http://www.ecofriendlylink.com/index.php
That's because your home page can legitimately be loaded all four ways (with or without www, and with or without index.php). This is different from Example 1, since these variations are normal. The same applies to water.php in Example 3, which loads both with and without www. I suggest adding a non-www to www redirect in .htaccess, as well as a redirect that removes index.php from the URL.
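The redirects themselves are written as rewrite rules in .htaccess, but the decision they have to make is small enough to model and test on its own. This Python sketch assumes the www host is the one you keep (the direction suggested above) and that index.php is stripped only when it is the last path segment; `canonical_url` and `redirect_target` are illustrative names, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www host is kept and the bare domain is redirected to it.
BARE_HOST = "ecofriendlylink.com"
CANONICAL_HOST = "www.ecofriendlylink.com"


def canonical_url(url):
    """Map any home-page or page variant to the single URL that should be kept."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lower-cases the hostname
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Strip index.php only when it is the last path segment, e.g. /index.php -> /
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    # Fragments are never sent to the server, so they are dropped.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_target(url):
    """Return the URL to 301 to, or None when the URL is already canonical."""
    target = canonical_url(url)
    return None if target == url else target
```

Running the URLs from a crawl report through `redirect_target` is one way to check that every variant from Examples 2 and 3 lands on a single address in one 301 hop.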
However, before you do that, you should:
a) Check where the majority of your external links point (use www.opensiteexplorer.com). If most of them point to the non-www version, you may consider redirecting www to non-www instead.
b) Check your internal links. Whichever direction you redirect, and if you remove index.php with a redirect, make sure that all internal links pointing home use the proper URL. So if you did a non-www to www redirect and removed index.php with a redirect, make sure all of your internal links point to http://www.ecofriendlylink.com/.
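For the internal-link check, a small script can list the links on a page that point at a non-canonical home URL. The sketch below uses Python's built-in `html.parser`; it assumes http://www.ecofriendlylink.com/ is the canonical home page and, as an extra heuristic of my own, flags any href that starts with "www." but has no "http://", because browsers and crawlers resolve such a link as a relative path on your own site. Whether that is what produced the URLs in Example 1 is only a guess.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumptions: both hosts serve this site, and the www home page is canonical.
SITE_HOSTS = {"ecofriendlylink.com", "www.ecofriendlylink.com"}
CANONICAL_HOME = "http://www.ecofriendlylink.com/"
HOME_PATHS = {"", "/", "/index.php"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def audit_links(page_url, html):
    """Return (href, reason) pairs for links that need fixing."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for href in collector.hrefs:
        # "www.example.com" with no scheme is resolved as a relative path.
        if href.lower().startswith("www."):
            problems.append((href, "missing http://, resolves as a page on this site"))
            continue
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.hostname not in SITE_HOSTS:
            continue  # external link or mailto:, not our concern here
        if parts.path in HOME_PATHS and absolute != CANONICAL_HOME:
            problems.append((href, "home link should be " + CANONICAL_HOME))
    return problems
```

Feed it the HTML of each page together with that page's own URL, so relative links resolve correctly; each result is an (href, reason) pair.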
Related Questions
-
Random important product pages dropped out of index week ending Dec 22: why???
Hello, we've been around a very long time, and I have a long-running pet set of core terms and pages tracked using Moz and other tools. With no changes to the content, site, .htaccess, robots.txt or sitemap, and insignificant backlink changes, we saw a ton of important product pages drop out of the index in the week ending December 22, 2019. We are still ranking for many of the associated terms, but at far worse positions, since the pages Google is choosing instead for those terms are not as focused. To be clear, this has not happened across the board, but seemingly at random. When I look in Google Search Console, the pages are submitted and indexed (last crawl yesterday), mobile friendly and have breadcrumbs, and the only warnings are product-level ones for missing optional fields under offers (nothing new, and not particular to the dropped pages in question here). So, what happened in the week ending December 22? Should I expect the dust to settle and the pages to return? Extremely strange. Thx
On-Page Optimization | | jamestown0 -
Googlebot found an extremely high number of URLs on your site:
Website: www.gobol.in
Although I have blocked my search pages by adding /catalogsearch to robots.txt, we are still getting the same error again and again. Here's a list of sample URLs with potential problems:
http://www.gobol.in/catalogsearch/result/index/?category=&mobile_feature=4575_4578&q=panasonic+NR-BU303LH1H+REFRIGERATOR+296+L+GREY&special_price=32%2C456&x=0&y=0
http://www.gobol.in/mobile-and-accessories/mobiles-and-brands.html?manufacturer=4753_3355_455_4435_4720_3407_2412_4728_4784_4790_2010_4789_4376_2469&operating_system_mobile=4612
Please help.
On-Page Optimization | | Obbserv0 -
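One thing worth checking is which of the sample URLs the /catalogsearch rule actually covers. The sketch below uses Python's `urllib.robotparser` with a hypothetical robots.txt containing only that rule (the real file is not shown in the question). Only the first sample URL comes out blocked; the second is a filtered category page outside /catalogsearch, so that rule cannot stop Googlebot from crawling those parameter combinations. Note also that robots.txt controls crawling rather than indexing, and that `urllib.robotparser` does plain prefix matching without Google's `*` and `$` wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the rule described in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch
"""

SAMPLE_URLS = [
    "http://www.gobol.in/catalogsearch/result/index/?category=&mobile_feature=4575_4578&q=panasonic+NR-BU303LH1H+REFRIGERATOR+296+L+GREY&special_price=32%2C456&x=0&y=0",
    "http://www.gobol.in/mobile-and-accessories/mobiles-and-brands.html?manufacturer=4753_3355_455_4435_4720_3407_2412_4728_4784_4790_2010_4789_4376_2469&operating_system_mobile=4612",
]


def crawl_permissions(robots_txt, urls, agent="Googlebot"):
    """Map each URL to True if the given robots.txt lets the agent crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}
```

Replacing `ROBOTS_TXT` with the site's real file shows which of the reported URL patterns are still open to crawling.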
Linking Out To External Sites
Hi all, if I have created a website, logo or email campaign for a client, written an article about it with screenshots on my website, and linked to the client with a do-follow link, how does Google see that link? As for the client sites, each has one do-follow link in the footer of the home page pointing back to our site. Also, the websites are not in my niche.
On-Page Optimization | | deskjet0 -
Slash at the end of a url
I keep reading contradictory information, so I figured I'd ask here. What's the best practice for a slash '/' at the end of a URL? Should it be idealchooser.com/search/laptop/ or idealchooser.com/search/laptop (no trailing slash)? The options:
1. Accept both equally
2. Accept one and redirect the other with a 301
3. Accept one and treat the other as a wrong URL returning 404
Which would be best for SEO? Thank you.
On-Page Optimization | | corwin0 -
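If you settle on option 2, the redirect decision itself is simple. This Python sketch assumes one site-wide policy, "with trailing slash" by default (pass `want_slash=False` for the opposite), and, as an assumption of its own, leaves alone paths whose last segment contains a dot, such as page.html. The function name is hypothetical; a server or framework rule would send a 301 to whatever URL it returns.

```python
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url, want_slash=True):
    """Return the 301 target for url under one trailing-slash policy, or None.

    Paths whose last segment looks like a file (contains a dot) are left alone.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if path == "/" or "." in last_segment:
        return None
    if want_slash and not path.endswith("/"):
        new_path = path + "/"
    elif not want_slash and path.endswith("/"):
        new_path = path.rstrip("/")
    else:
        return None  # already in the preferred form
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))
```

Whichever form is chosen, internal links and sitemaps would then use that form so the redirect is rarely hit.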
Large Site - Advice on Subdomaining
I have a large news site with over 1 million pages (I have already deleted 1.5 million). Google buries many of our pages. I'm ready to try subdomaining.
http://bit.ly/dczF5y

There are two types of content: news from our contributors, and press releases. We have had contracts with the big press release companies going back to 2004/5. They push releases to us by FTP or we pull them from their servers. These are then processed and published.

It has taken me almost 18 months, but I have found and deleted or fixed all the duplicates I can find. There are now two duplicate-checking systems in place. One runs at the time a release comes in and handles most of them. The other runs every night after midnight and finds a few, which are then handled manually. This helps fine-tune the real-time checker.

Businesses often link to their release on the site because they like us. Sometimes Google likes this, sometimes not.

The news we process is reviewed by one, two or three editors before publishing. Some of the stories are 100% unique to us. Some are from contributors who also contribute to other news sites.

Our search traffic is down by 80%. This has almost destroyed us, but I don't give up easily. As I said, I've done a lot of projects to try to fix this. Not one of them has done any good, so there is something Google doesn't like and I haven't yet worked it out. A lot of people have looked and given me their ideas, and I've tried them, with zero effect.

Here is an interesting and possibly important piece of information: most of our pages are "buried" by Google. If I search, even for a headline, even if it is unique to us, quite often the page containing it will not appear in the SERP. The front page may show up, an index page may show up, another strong page may show up if that headline is in the top 10 stories for the day, but the page itself may not show up at all, UNTIL I go to the end of the results and redo the search with the "duplicates" included. Then it will usually show up on the first page of results, often in position #2 or #3.

According to Google, there are no manual actions against us. There are also no notices in WMT saying there is a problem that we haven't fixed.

You may tell me to just delete all of the PRs, but those are there for business readers, as they always have been. Google supposedly wants us to build websites for readers, which we have always done. What they really mean is: build it the way we want you to, because we know best.

What really peeves me is that there are other sites that Google consistently ranks above us, that have all the same content as us, and seem to be 100% aggregators, with ads, with nothing really redeeming them as being different. So this is (I think) inconsistent and confusing, and it doesn't help me work out what to do next.

Another thing we have is about 7,000+ US military stories, going all the way back to 2005. We were one of the few news sites supporting the troops when it wasn't fashionable to do so. They were emailing the stories to us directly, most with photos. We published every one of them, and we still do. I'm not going to throw them under the bus, no matter what happens. There were some duplicates, some due to screwups because we had multiple editors who didn't see that a story was already published, and at one time a race condition in the system code (entirely my fault; I am the programmer as well as the editor-in-chief). I believe I have fixed them all with redirects.

I haven't sent in a reconsideration request for 14 months, since they said "No manual spam actions found". I don't see any point, unless you know something I don't.

So, having exhausted all of the things I can think of, I'm down to my last two ideas:
1. Split all of the PRs off into subdomains (I'm ready to pull the trigger later this week).
2. Do what the other sites do, which I believe creates little value: show only a headline, a snippet and some related info, and link back to the original page on the PR provider's website. (I really don't want to do this.)
3. Give up on the PRs and delete them all, and lose another 50% of our income, which means releasing our remaining staff and upsetting all of the companies and people who linked to us (or find them all and rewrite them as stories, tens of thousands of them), and also throw all our alliances under the bus. (I really don't want to do this.)

There is no guarantee this is the problem, but Google won't tell me, the Google forums are crap, and nobody else has given me an idea that has helped.

My thought is that splitting them off into subdomains will have a number of effects:
1. Take most of the syndicated content onto subdomains, so it's not on the main domain.
2. Shake up the Domain Authority.
3. Create a million 301 redirects.
4. Make it obvious to the crawlers what is our news and what is PRs.
5. Make it easier for Google News to understand.

Here is what I plan to do:
1. Redirect all PRs to their own subdomain: pn.domain.com for PRNewswire releases, bw.domain.com for Businesswire releases, etc.
2. Fix all references so they use the new subdomain.

Here are my questions, and I hope you may see something I haven't considered:
1. Do you have any experience of doing this?
2. What was the result?
3. Any tips?
4. Should I put PR index pages on the subdomains too? I was originally planning to keep them on the main domain, with the individual page links pointing to the actual release on the subdomain. Obviously, I want them only in one place, but there are two types of these index pages:
   a) all of the releases for a particular PR company: these certainly could be on the subdomain and not on the main domain
   b) various category index pages (agriculture, supermarkets, mining, etc.): these would have to stay on the main domain because they are a mixture of different PR providers.
5. Is this a bad idea? I'm almost out of ideas.

Should I add a condensed list of everything I've done already? If you are still reading, thanks for hanging in.
On-Page Optimization | | loopyal0 -
Question about URLs
Hello! I have a client who wants to use a URL like www.example.com/keyword/page-name.html, but www.example.com/keyword/ doesn't exist and returns a 404 error, so I'd prefer not to do that. What do you think? And if the client wants to go ahead, is there a solution? Would a 301 to the final page help? Thank you in advance!
On-Page Optimization | | Juandbbam0 -
ON SITE SEARCH INDEXED BY GOOGLE - no follow or no index
Google indexes all our internal searches. The search box covers brand, clothes type and size, and for each search it creates a page, which produces duplicate page titles and unnecessary content. Should I put a nofollow on the advanced search, or a noindex? Many thanks for the info. Sonja
On-Page Optimization | | reallyitsme0 -
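If you choose noindex, note that a crawler can only see the tag if the search pages remain crawlable, so it should not be combined with a robots.txt block on the same URLs. A minimal Python sketch of emitting the tag server-side, assuming (hypothetically) that search results live under /search or carry a q parameter; adjust both to however your search URLs are actually built.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical markers of an on-site search result page.
SEARCH_PREFIXES = ("/search",)
SEARCH_PARAM = "q"


def robots_meta_for(url):
    """Return the robots meta tag to place in the <head> of the page at url."""
    parts = urlsplit(url)
    is_search = parts.path.startswith(SEARCH_PREFIXES) or SEARCH_PARAM in parse_qs(parts.query)
    # noindex keeps the result page out of the index; follow lets links on it be crawled.
    content = "noindex, follow" if is_search else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

Normal category and product pages keep the default "index, follow", so only the generated search pages are affected.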
Which method should I use for my URL structure?
I have an existing site that currently uses a structure like this:
http://www.mysite.com/Ohio/City-of-Cleveland-PRODUCT-NAME
Should I restructure it like this?
http://www.mysite.com/Ohio/City-of-Cleveland/Product-Name
We are doing very well with very specific searches already, but sometimes come in 2nd or 3rd place. For example, if I search for CLEVELAND PRODUCT NAME I always come up in the top three, and about 60% of the time I am #1. I want to make it better. We have only launched in 4 states but plan on launching an additional 4 states over the next few weeks, and I want to make sure we are building things properly. Any feedback would be wonderful. As usual, thanks everyone!! -Alex
On-Page Optimization | | dbuckles0