Can anyone explain why and how these odd URLs could be working?
-
In our GWT and Google Analytics traffic reports, I often see some very oddly formed URLs. Here's an example
http://www.ccisolutions.com/storefront/www.ccisolutions.com
and here's another
http://www.ccisolutions.com/StoreFront/category//www.ccisolutions.com/StoreFront/CEW.catWhat strikes me about this particular URL is two things:
- It renders this page http://www.ccisolutions.com/StoreFront/category/on-disc-printing, but not with that URL, the URL stays http://www.ccisolutions.com/StoreFront/category//www.ccisolutions.com/StoreFront/CEW.cat
- When I break this URL into pieces
http://www.ccisolutions.com/StoreFront/category/CEW.cat
and www.ccisolutions.com/StoreFront/CEW.cat,
both redirect to: http://www.ccisolutions.com/StoreFront/category/on-disc-printingThis makes me wonder, is there something (a rule?) in the
backend (maybe the .htaccess file?)that was set up that sayshttp://www.ccisolutions.com/StoreFront/category/CEW.cat
= www.ccisolutions.com/StoreFront/CEW.cat
(or maybe vice versa?), and as a result an odd URL for the page is being
written automatically?This scenario worked on every category page I checked. All had the same results. For example, I tried:
http://www.ccisolutions.com/StoreFront/category//www.ccisolutions.com/StoreFront/AAA.cat
and it rendered the Live Sound category page, but without redirecting to the
user friendly URL. This URL stayed unchanged in the address barWhen I broke it into pieces, like
http://www.ccisolutions.com/StoreFront/category/AAA.cat
and www.ccisolutions.com/StoreFront/AAA.cat, both of these redirected to http://www.ccisolutions.com/StoreFront/category/sound-video-lighting-equipment-expertsHave any of you ever encountered a problem like this? Any sugeestions as to what might be causing it and how to remedy the problem? It is definitely causing us a duplicate content headache. Thanks!
Dana
-
Thanks George! Fantastic detail and I think between your suggestions and Ben's too we are going to get further to solving this than we've ever gotten before. Perhaps we'll even solve this. That would be so great. As I mentioned, the company identified this problem 4 years before they hired me, and it's never been solved. I feel like part of why I am there as there SEO strategist is to pound away at these problems until they're fixed.
Thanks so much to you both. I can't wait to go in on Monday morning and use these suggestions to solve a five year old problem! Awesome.
I'll let you know what happens. If we fix it, I owe you and Ben dinner! (at the very least)
-
Thanks Ben. No apology necessary, it's all good. Your suggestion in combination with George's could lead us to an answer. This is definitely going to get us closer to finding the problem than we've ever gotten before. The company has been aware of this problem for almost 5 years but hasn't ever identified how to fix it. I've only been there a year now and I'm on the warpath to fix these technical issues. There are so many of them causing duplicate content problems that any SEO I do is undermined by problems like these.
I really really appreciate your reply and suggestions!
Dana
-
I'm not sure what CMS you are using, but I've seen this before with Joomla when setting the SEO Settings in the Global Configuration section of the Administration panel. Specifically, when working with the Apache mod_rewrite setting; which is related to the .htaccess question you had.
There are a number of things wrong with the way some CMS's have set up their redirects and how they present content. You may end up playing with each combination to fix your issue (depending on how you want to fix it).
If I were looking into this, I would do the following:
- I would determine if I was using Joomla. If so, check your configuration.php file and see if you have your domain name provided in the property for "live_site". If you do, try changing this from 'www.ccisolutions.com' (or whatever is there) to the empty string '' (aka just two single quotes).
- If you are not using Joomla, see if there is a configuration file for the CMS you are using and look for something similar to the above.
- If there is not a configuration setting that is providing for this "duplication" of domain name, look at the .htaccess file itself to see why it redirects when you break the URL up, but not when it has a second domain string in the URL (e.g. the second "www.ccisolutions.com").
- Then look at the code for the CMS and see how it interprets your URLs. To me it looks like you are using some sort of MVC framework which is taking each piece of the URL and translating it into variables to determine what content to show (REST-like). When it is parsing the URL, it seems to be looking for the end of the domain name and then taking anything off the end to translate into content.
However you figure out the issue, I suggest looking at how your CMS is actually producing the canonical tag. Right now this URL (http://www.ccisolutions.com/StoreFront/category/www.ccisolutions.com/StoreFront/CEW.cat) is using the following canonical:
rel="canonical" href="on-disc-printing"/>
I don't think that is what you are looking for in your canonical tags.
I hope this helps and answers your questions.
-
Hi Dana,
I wrote the following after assuming , for no reason at all, that you didn't know much about SEO. However having looked at your profile I realized that I was wrong and that my tone is probably a little patronizing. That being said it's 1am over here and I really don't want to rewrite it so please accept my apologies.
If I had to guess (and it is a guess as I'm not technical) I would say it was some badly formed links.
You know how some of your error pages have an Origin parameter (like this one) that say where the page was generated? Well these URLs follow the same format as the error pages that you're finding. It looks like rather than using an absolute link (like http://www.ccisolutions.com/page) the onclick action is actually generating a relative link (so just /page).
When you use a relative link your site adds the partial URL (/page) onto the end of your domain to give you a full URL (http://www.ccisolutions.com + /page = http://www.ccisolutions.com/page). It looks like you're using relative links as if they were static ones. Which is why you have "www.ccisolutions" in each URL twice.
If I had to blame anything it would be whatever is powering your IAFDispatcher however as I haven't been able to replicate your problem I couldn't be certain. If you can track how these URLs were generated by looking at the preceding pages that are sending traffic/bots to them then you should be able to narrow it down to which links are broken.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What to do with existing URL when replatforming and new URL is the same?
We are changing CMS from WordPress to Uberflip. If there is a URL that remains the same I believe we should not create a redirect. However, what happens to the old page? Should it be deleted?
Technical SEO | | maland0 -
Url folder structure
I work for a travel site and we have pages for properties in destinations and am trying to decide how best to organize the URLs basically we have our main domain, resort pages and we'll also have articles about each resort so the URL structure will actually get longer:
Technical SEO | | Vacatia_SEO
A. domain.com/main-keyword/state/city-region/resort-name
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature _
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village/kid-friend-pool_ B. Another way to structure would be to remove the location and keyword folders and combine. Note that some of the resort names are long and spaces are being replaced dynamically with dashes.
ex. domain.com/main-keyword-in-state-city/resort-name
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature_
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village-kid-friend-pool_ Question: is that too many folders or should i combine or break up? What would you do with this? Trying to avoid too many dashes.0 -
Can a CMS affect SEO?
As the title really, I run www.specialistpaintsonline.co.uk and 6 months ago when I first got it it had bad links which google had put a penalty against it so losts it value. However the penalty was lift in Sept, the site corresponds to all guidelines and seo work has been done and constantly monitored. the issue I have is sales and visits have not gone up, we are failing fast and running on 2 or 3 sales a month isn't enough to cover any sort of cost let alone wages. hence my question can the cms have anything to do with it? Im at a loss and go grey any help or advice would be great. thanks in advance.
Technical SEO | | TeamacPaints0 -
Home page URL
Hi, I work on this site: http://www.towerhousetraining.co.uk/about-us. This is the home page URL. Should this be 301'd to: http://www.towerhousetraining.co.uk? I have created a site map, which I submitted to Google Webmaster Tools, which includes these URL's: /about-us, /training-we-offer & /contact-us. There are a total of 3 pages on the website. Webmaster tools has only indexed 2 out of 3 pages. I think this is something to do with the /about-us URL, as when I do a site: search, these pages appear: www.towerhousetraining.co.uk/, /training-we-offer & /contact-us. I am not sure why Google has indexed the home page as www.towerhousetraining.co.uk/ and not /about-us? Is it a bad idea in general not to have your homepage as your root domain? I added a to the homepage, but am wondering if this was the right thing to do? Any help would be appreciated.
Technical SEO | | CWseo0 -
Is my robots.txt file working?
Greetings from medieval York UK 🙂 Everytime to you enter my name & Liz this page is returned in Google:
Technical SEO | | Nightwing
http://www.davidclick.com/web_page/al_liz.htm But i have the following robots txt file which has been in place a few weeks User-agent: * Disallow: /york_wedding_photographer_advice_pre_wedding_photoshoot.htm Disallow: /york_wedding_photographer_advice.htm Disallow: /york_wedding_photographer_advice_copyright_free_wedding_photography.htm Disallow: /web_page/prices.htm Disallow: /web_page/about_me.htm Disallow: /web_page/thumbnails4.htm Disallow: /web_page/thumbnails.html Disallow: /web_page/al_liz.htm Disallow: /web_page/york_wedding_photographer_advice.htm Allow: / So my question is please... "Why is this page appearing in the SERPS when its blocked in the robots txt file e.g.: Disallow: /web_page/al_liz.htm" ANy insights welcome 🙂0 -
GWT, URL Parameters, and Magento
I'm getting into the URL parameters in Google Webmaster Tools and I was just wondering if anyone that uses Magento has used this functionality to make sure filter pages aren't being indexed. Basically, I know what the different parameters (manufacturer, price, etc.) are doing to the content - narrowing. I was just wondering what you choose after you tell Google what the parameter's function is. For narrowing, it gives the following options: Which URLs with this parameter should Googlebot crawl? <label for="cup-crawl-LET_GOOGLEBOT_DECIDE">Let Googlebot decide</label> (Default) <label for="cup-crawl-EVERY_URL">Every URL</label> (the page content changes for each value) <label style="color: #5e5e5e;" for="cup-crawl-ONLY_URLS_WITH_VALUE">Only URLs with value</label> ▼(may hide content from Googlebot) <label for="cup-crawl-NO_URLS">No URLs</label> I'm not sure which one I want. Something tells me probably "No URLs", as this content isn't something a user will see unless they filter the results (and, therefore, should not come through on a search to this page). However, the page content does change for each value.I want to make sure I don't exclude the wrong thing and end up with a bunch of pages disappearing from Google.Any help with this is greatly appreciated!
Technical SEO | | Marketing.SCG0 -
/out/ URLs in GWMTs
I am recently seeing some URLs come up as 404s in GWMTs for a client. They look like this: http://client-url/out/www.linkedin.com/company/client-linkedin-name /out/client-url/sub-directory/postname/ We thought they might have something to do with the social plugins but they are all over the place and they are sometime for internal pages on the site. Anyone run into these and know why they are happening?
Technical SEO | | DragonSearch0 -
Compare URLs with 302 redirects
Hello I have a store which was developed in Magento. I have about 8300 errors like this: URL: http://www.theprinterdepo.com/catalog/product_compare/add/product/100/uenc/aHR0cDovL3d3dy50aGVwcmludGVyZGVwby5jb20vcHJpbnRlci1wYXJ0cy5odG1sP3A9NA,,/ 1 Warning 302 (Temporary Redirect) Found 3 days ago <dl> <dt>Redirects to</dt> <dt>http://goo.gl/XMaZg</dt> <dd>Description</dd> <dd>Using a 302 redirect will cause search engine crawlers to treat the redirect as temporary and not pass any link juice (ranking power). We highly recommend that you replace 302 redirects with 301 redirects.</dd> </dl> <a class="more expanded">Minimize</a> These URLs, are generated by magento by the COMPARE feature. In my store we bought an extension called SEO Enterprise Suite and I asked the developers(www.mageworx) about this error. Their answer is: Sorry for the late reply. Our extension adds NOINDEX,FOLLOW tag to compare and cookies pages so that they won't be indexed. I do not think that these redirects can hurt your SEO because these pages won't be indexed at all. The question is: What should I do? Is there anyway that SEOMOZ ignores these URLs? What should I do next, I just dont like to have that HIGH number of errors and warnings. Thank you
Technical SEO | | levalencia10