Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
WordPress - How to stop both http:// and https:// pages being indexed?
-
Just published a static page 2 days ago on WordPress site but noticed that Google has indexed both http:// and https:// url's. Usually I only get http:// indexed though.
Could anyone please explain why this may have happened and how I can fix? Thanks!
-
Just one adjustment to this - although I think David's right that the canonical tag can be a good solution. Although Google can index https: fine, the issue is whether you're creating duplicates. If you have duplicates, then it's possible that the https: version could be the one you want as canonical. In this case, it doesn't sound like it, but I just wanted to point that out.
Of course, long-term, you should sort out why these are being created. A desktop crawler like Xenu or Screaming Frog may be the best bet, but I'd hit the WordPress forums, too. Odds are it's a common issue. Typically, it happens when some deeper page (like a shopping cart) on a site is secure, and then the links are all relative ("/about.php", for example). Then, those links get crawled as both secure and non-secure.
Unfortunately, I'm not a WordPress expert, so I can only speak in generalities.
-
Thanks David, I feel like going out to buy some Swedish Fish for some reason now.
-
I actually just did a wealth of research on this topic a few days ago. Without going into the nitty gritty details, if the https is site-wide Google recommends a Rel="canonical" attribute (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394) pointing to the non-secure http version. Google claims it can index https fine, but Matt Cutts said he would "lean towards pointing the canonical to the http version." Also, on the Rel="canonical" page Google says:
If you publish content on both http://www.example.com/product.php?item=swedish-fish and https://www.example.com/product.php?item=swedish-fish, you can specify the canonical version of the page. Create the element:
Add this link to the section of https://www.example.com/product.php?item=swedish-fish.
Make sure the canonical is on every page of your site.
Not sure why this may have happened, but it is creating duplicate content, which is why the canonical is necessary.
Hope that helps!
Thanks
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is there a way to get a list of all pages of your website that are indexed in Google?
I am trying to put together a comprehensive list of all pages that are indexed in Google and have differing opinions on how to do this.
Technical SEO | | SpodekandCo0 -
Does a no-indexed parent page impact its child pages?
If I have a page* in WordPress that is set as private and is no-indexed with Yoast, will that negatively affect the visibility of other pages that are set as children of that first page? *The context is that I want to organize some of the pages on a business's WordPress site into silos/directories. For example, if the business was a home remodeling company, it'd be convenient to keep all the pages about bathrooms, kitchens, additions, basements, etc. bundled together under a "services" parent page (/services/kitchens/, /services/bathrooms/, etc.). The thing is that the child pages will all be directly accessible from the menus, so there doesn't need to be anything on the parent /services/ page itself. Another such parent page/directory/category might be used to keep different photo gallery pages together (/galleries/kitchen-photos/, /galleries/bathroom-photos/, etc.). So again, would it be safe for pages like /services/kitchens/ and /galleries/addition-photos/ if the /services/ and /galleries/ pages (but not /galleries/* or anything like that) are no-indexed? Thanks!
Technical SEO | | BrianAlpert781 -
Proper 301 redirect code for http to https
I see lots of suggestions on the web for forwarding http to https. I've got several existing sites that want to take advantage of the SSL boost for SEO (however slight) and I don't want to lose SEO placements in the process. I can force all pages to be viewed through the SSL - that's no problem. But for SEO reasons, do I need to do a 301 redirect line of code for every page in the site to the new "https" version? Or is there a way to catch all with one line of code that Google, etc. will recognize & honor?
Technical SEO | | wcksmith10 -
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
Why do some URLs for a specific client have "/index.shtml"?
Reviewing our client's URLs for a 301 redirect strategy, we have noticed that many URLs have "/index.shtml." The part we don'd understand is these URLs aren't the homepage and they have multiple folders followed by "/index.shtml" Does anyone happen to know why this may be occurring? Is there any SEO value in keeping the "/index.shtml" in the URL?
Technical SEO | | FranFerrara0 -
Can you noindex a page, but still index an image on that page?
If a blog is centered around visual images, and we have specific pages with high quality content that we plan to index and drive our traffic, but we have many pages with our images...what is the best way to go about getting these images indexed? We want to noindex all the pages with just images because they are thin content... Can you noindex,follow a page, but still index the images on that page? Please explain how to go about this concept.....
Technical SEO | | WebServiceConsulting.com0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
De-indexing millions of pages - would this work?
Hi all, We run an e-commerce site with a catalogue of around 5 million products. Unfortunately, we have let Googlebot crawl and index tens of millions of search URLs, the majority of which are very thin of content or duplicates of other URLs. In short: we are in deep. Our bloated Google-index is hampering our real content to rank; Googlebot does not bother crawling our real content (product pages specifically) and hammers the life out of our servers. Since having Googlebot crawl and de-index tens of millions of old URLs would probably take years (?), my plan is this: 301 redirect all old SERP URLs to a new SERP URL. If new URL should not be indexed, add meta robots noindex tag on new URL. When it is evident that Google has indexed most "high quality" new URLs, robots.txt disallow crawling of old SERP URLs. Then directory style remove all old SERP URLs in GWT URL Removal Tool This would be an example of an old URL:
Technical SEO | | TalkInThePark
www.site.com/cgi-bin/weirdapplicationname.cgi?word=bmw&what=1.2&how=2 This would be an example of a new URL:
www.site.com/search?q=bmw&category=cars&color=blue I have to specific questions: Would Google both de-index the old URL and not index the new URL after 301 redirecting the old URL to the new URL (which is noindexed) as described in point 2 above? What risks are associated with removing tens of millions of URLs directory style in GWT URL Removal Tool? I have done this before but then I removed "only" some useless 50 000 "add to cart"-URLs.Google says themselves that you should not remove duplicate/thin content this way and that using this tool tools this way "may cause problems for your site". And yes, these tens of millions of SERP URLs is a result of a faceted navigation/search function let loose all to long.
And no, we cannot wait for Googlebot to crawl all these millions of URLs in order to discover the 301. By then we would be out of business. Best regards,
TalkInThePark0