Problems preventing Wordpress attachment pages from being indexed and from being seen as duplicate content.
-
Hi
According to a Moz crawl, it looks like the WordPress attachment pages from all image uploads are being indexed and seen as duplicate content. Or is the Yoast sitemap causing it? I see 2 options in Yoast SEO:
- Redirect attachment URLs to parent post URL.
- Media...Meta Robots: noindex, follow
I set it to (1) initially which didn't resolve the problem. Then I set it to option (2) so that all images won't be indexed but search engines would still associate those images with their relevant posts and pages.
I understand what both options (1) and (2) mean, but because I chose option (2), does that mean none of the images on the website stand a chance of being indexed in search engines, Google Images, etc.?
As far as duplicate content goes, search engines can get confused when there are 2 ways for them to reach the same page content. When, e.g., Google makes the wrong choice, a portion of traffic drops off (is lost, hence errors), which leaves the searcher frustrated, and this affects the SEO and ranking of the site, which worsens with time.
My goal here is this: I would like all of the web images to be indexed by Google, and none of the image attachment pages to be indexed at all. (Moz shows the image attachment pages as duplicates, and the referring source causing this is the sitemap URL which Yoast creates.) That sitemap URL has already been submitted to the search engines, and I will resubmit it once I can resolve the attachment page issues.
Please can you advise.
Thanks.
-
Hi Kate,
Here is an update as to what is happening so far. Please excuse the length of this message.
-
The database according to the host is fine (please see below) but WordPress is still calling https:
-
- In the WP database wp-actions, http is definitely being called
- All certificates are ok and SSL is not active
- The WordPress database is returning properly
- The WP database mechanics are ok
- The WP config-file is not doing https returns, it is calling http correctly
-
They said that the only other possibility could be one of the plugins causing the problem. But how can a plugin cause https problems? I can see 50 different https pages indexed in Google. Bing has been checked and there are no https pages indexed there. All internal URLs have always been http only and that is still the case.
-
I have run Google Fetch on the website pages, and of the 50 https pages most are images, which I think must have come from the Yoast sitemap that was originally submitted to the search engines. More recently I have taken all media image URLs out of the Yoast sitemap and put noindex, follow on all image attachment files (the pages and the images on the pages will still be crawled and indexed in Google and other search engines; it just means that any image URLs won't). What will happen to those unwanted https files though? If I place rel canonical links on the pages that matter, will the https pages drop out of the index eventually? I just wish I could find what is causing it (analogy: best to fix a hole in a roof to stop having to use a bowl to catch the water each time it rains).
-
I looked at analytics today and saw something really interesting (see attached image): you can see 5 instances of the trailing slash at the home page, and to my knowledge there should only be 1 for a website. The Moz crawl shows just 1 home domain, http://example.co.uk/, so I am somewhat confused. Google search results showed 256 results for https URL references, and there were 50 available to click on. So perhaps there are 50 https pages being referenced for each trailing slash (could there be 4 other trailing-slash duplicate pages indexed, and how would I fix it if that is the case?). This might sound naive, but I don't have the skillset to fix this at this time, so any help and advice would be appreciated.
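One way to see whether the same page is being reached under several addresses is to group a list of URLs by host and path while ignoring the scheme and any trailing slash. The following is only a rough sketch: the function names and the sample list are made up for illustration, and in practice the URLs would come from a crawl or analytics export.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variant_key(url: str) -> tuple[str, str]:
    """Collapse scheme and trailing-slash differences into one comparison key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # "/events" and "/events/" share a key; the bare root stays "/".
    path = parts.path.rstrip("/") or "/"
    return host, path


def group_variants(urls: list[str]) -> dict[tuple[str, str], list[str]]:
    """Return only the keys that are reached through more than one distinct URL."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for url in urls:
        key = variant_key(url)
        if url not in groups[key]:
            groups[key].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical sample data, not taken from the real site.
sample = [
    "http://example.co.uk/",
    "https://example.co.uk/",
    "http://example.co.uk/events",
    "http://example.co.uk/events/",
    "http://example.co.uk/contact/",
]
duplicates = group_variants(sample)
for key, found in duplicates.items():
    print(key, found)
```

Any key that prints with two or more URLs is a candidate for a redirect or a canonical tag. A key with a single URL, like the contact page here, is left out.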
-
Would the Search and Replace plugin help at all, or would it be a waste of time since the WordPress database mechanics seem to be ok?
-
I can't place any https to http 301 redirects for the 50 https URLs that are indexed in Google, and I can't add any https rewrite rules in htaccess, since that type of redirect will only work if an SSL is active. I already tried several redirect rules in htaccess and, as expected, they wouldn't work, which again would probably mean that the SSL is not active for the site.
-
When https is entered instead of http, there should be an automatic resolve to http without me having to worry about that, but I tried again and the https version with a red diagonal line through it appears instead. The problem is that once a web visitor lands on that page they stay in that land of https (visually, the main nav bar contents stretch across the page and the images and videos don't appear), and so the traffic will drop off. The result is a bad experience for the user, dropped traffic, decreasing income and harm to SEO (split page juice, decreased rankings). There are no crawl errors in Google Search Console, and Analytics shows Google Fetch completed for all pages, but when I request fetch and render for the home page it shows as partial instead of completed.
-
I don't want to request any https url removals through Google and search engines - it's not recommended because Google states that http version could be removed as well as https.
-
I did look at this last week:
http://www.screamingfrog.co.uk/5-easy-steps-to-fix-secure-page-https-duplicate-content/
-
Do you think that the https URLs are indexed because links pointing to the site are using https? Perhaps most of the backlinks are https, but the preferred setting in Webmaster Tools / Search Console is already set to the non-www version instead of the www version; there has never been an https version of the site.
-
This was one possibility re duplicate content. Here are two pages and the listed duplicates:
-
The first Moz crawl I ever requested came back with hundreds of duplicate errors and I have resolved this. Google crawl had not picked this up previously (so I figured everything had been ok) and it was only realised after that Moz crawl. So https links were seen to be indexed and so the goals are to stop the root cause of the problem and to fix the damage so that any https url's can drop off out of the serps and the index.
-
I considered that the duplicate links in question might not be true duplicates as such. It is actually just that the duplicate pages (these were attachment pages created by WordPress for each image uploaded to the site) have no real content, so the template elements outweighed the actual unique content elements, which was flagging them as duplicates in the Moz tool. So I thought that these were unlikely to hurt, as they were not duplicates as such but indexed thin content. I did a content audit and tidied things up as much as I could (blank pages and weak ones), hence the new recent sitemap submission and fetch to Google.
-
I have already redirected all attachments to the parent page in Yoast, and removed all attachments from the Yoast sitemap and set all media content (in Yoast) to 'noindex, follow'.
-
Naturally it's really important to eliminate the https problem before external backlinks link back to any of the unwanted https pages that are currently indexed. Luckily I haven't started any backlinking work yet, and any links I have posted in search land have all been http version. As I understand it, most server configurations should redirect by default to http when https isn’t configured, so I am confused as to where to take this especially as the host has given the WP database the all clear.
-
It could be taxonomies related to the theme or a slider plugin, as I have learned these past few weeks. Disallowing and deindexing those unwanted https URLs would be amazing, since I have so far spent weeks already trying to get to the bottom of the problem.
-
Ideally I understand from previous weeks that these 2 things would be very important:
(1) 301 redirects from http to https (the host in this case cannot enable this directly through their servers, and I can only add these redirects in the htaccess file if there is an active SSL in place).
(2) Have in place a canonical URL using http for both the http and https variations.
Both of those solutions might work on their own, and if the 301 redirect can't work with the host, then will the canonical fix it? I saw that I could just set a canonical with a fixed transport protocol of http:// and Google will then sort out the rest. Not preferred from a crawl perspective, but would it suffice? (Even so, I don't know how to put that in place.)
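To illustrate what a canonical with a fixed http:// protocol means, here is a minimal sketch that builds the link element for a given page address. It is not how Yoast or WordPress produce the tag (SEO plugins normally output it for you), and `http_canonical_tag` is a made-up helper name used only for this example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def http_canonical_tag(url: str) -> str:
    """Build a canonical link element that always points at the http:// form of url."""
    parts = urlsplit(url)
    # Force the scheme to http and drop any fragment; host, path and query are kept.
    canonical = urlunsplit(("http", parts.netloc, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


# The same tag is produced whichever protocol the page was requested on.
print(http_canonical_tag("https://example.co.uk/events/"))
print(http_canonical_tag("http://example.co.uk/events/"))
```

The point of the sketch is that both the http and the https copy of a page declare the same http address in the head of the page, so a search engine that reaches the https copy is told which version is preferred.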
-
There are around 180 W3C validation errors. Would it help matters to get these fixed? Would this help to fix the problem do you know? The homepage renders with critical errors and a couple of warnings.
-
The 907 Theme scores well for its concept and functionality but its SEO reviews aren't that great.
-
Duplicate problems are not related to the W3 Total Cache plugin which is one of the plugins in place.
-
Regarding add-ons (trailing slash): for example, http://domain.co.uk/events redirects to http://domain.co.uk/events/. The add-on must only do it on active URLs. Even if it didn't, there were no reports of trailing-slash duplicate errors in the Moz crawl, so it's a different issue that would need looking at separately, I would think.
-
At the bottom of each duplicate page there is an option for noindex. There are page sections and parallax sections that make up the home page, and each has to be published to become a live part of the home page. I understand that this isn't great for SEO: because only the top page section is registered in Yoast as being the home page, the other sections on the home page are not crawled as part of the home page but are instead separate page sections. Is it ok to index those page sections? If I noindex, follow them, would that be good practice here? The theme does not automatically block the page sections from appearing in search engines.
-
Can noindex only be put on whole pages and not the specific page sections? I just want to make sure that the content on all the pages (media and text) and page sections are crawlable.
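A robots meta tag sits in the head of a page and is read for that page as a whole, so one way to check what a given URL is telling search engines is to look for that tag in its HTML. The sketch below uses only the Python standard library; the class and function names are made up for this example, and the HTML strings are hypothetical rather than taken from the site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "noindex, follow" becomes {"noindex", "follow"}.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindex(html: str) -> bool:
    """Return True when the page carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages: an attachment page set to noindex, and a normal post.
attachment_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
normal_post = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(is_noindex(attachment_page), is_noindex(normal_post))
```

Running this against the saved source of a page section that is published as its own URL would show whether that URL, rather than the home page it appears in, carries the noindex.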
-
To ultimately fix the https problem re indexed pages out there could this eventually be a case of having to add SSL to the site just because there is no better way - just so the https to http redirect rule can be added to the htaccess file? If so, I don't think that would fix the root cause of the problem, but the root cause could be one of the plugins? Confused.
-
With canonical URLs, does that mean the https links that don't have canonicals will deindex eventually? Are the https links giving a 404? (I'm worried because normally 404s need 301s, as you know, and I can't put a 301 on an https URL in this situation.) Do I have to set a canonical for every single page on the website because of the extent of the problem that has occurred?
-
Nearly all of the traffic is being dropped after visiting the home page, and I can't for the life of me see why. Is it because of all these https pages? Once canonicals are in place how long will it take for everything to return to how it should be? Is it worthwhile starting a ppc campaign or should I wait until everything has calmed down on the site?
-
Is this a case of setting the canonical URL and then the rest will sort itself out? (please see the screenshot attached regarding the 5 home pages that each have a trailing slash).
-
This is the entire current situation. I understand this might not be so straightforward, but I would really appreciate help as the site continues to drop traffic and income. Others will be able to learn from this string of questions and responses too. Thank you for reading this far and have a nice day. Kind Regards,
-
-
Hi Paul
I did (1), which did not resolve the problem, so I then set media to noindex, follow.
I have already excluded attachment URLs from the sitemap.
When you say: When adding media, make certain the Link to box does NOT point to the attachment page. Are you saying to edit all the link settings to current images, or do you mean for future image uploads? Or in both cases?
Thanks
-
In order to accomplish your goal, setup Yoast SEO to:
- redirect attachment URLs to parent post
- exclude attachment URLs from sitemap (it's a checkbox under the Post Types tab in the XML Sitemaps section of Yoast SEO Settings)
- leave all media indexed and followed.
- When adding media, make certain the Link to box does NOT point to the attachment page.
What this accomplishes is to allow the actual image file to still be indexed and hence show up in Image search. It also ensures that the pointless image attachment pages don't waste crawl budget and don't show up to the search crawlers as thin/dupe content. Win!
Hope that helps?
Paul
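After changing the sitemap setting, one way to confirm that attachment pages are really gone from the sitemap is to parse the XML and flag any entries that look like attachment pages. This is only a sketch under stated assumptions: it recognises the `?attachment_id=` form that WordPress uses for attachment pages with plain permalinks, and for pretty permalinks it relies on a list of known attachment page URLs that you supply yourself. The function name and the sample sitemap are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def attachment_urls_in_sitemap(sitemap_xml: str, known_attachment_pages: set[str]) -> list[str]:
    """List sitemap entries that are attachment pages and so should not be there."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        has_attachment_id = "attachment_id" in parse_qs(urlsplit(url).query)
        if has_attachment_id or url in known_attachment_pages:
            flagged.append(url)
    return flagged


# Hypothetical sitemap content; a real one would be downloaded from the site.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.co.uk/events/</loc></url>"
    "<url><loc>http://example.co.uk/?attachment_id=42</loc></url>"
    "<url><loc>http://example.co.uk/events/venue-photo/</loc></url>"
    "</urlset>"
)
flagged = attachment_urls_in_sitemap(sample, {"http://example.co.uk/events/venue-photo/"})
print(flagged)
```

An empty list after the Yoast change would suggest the sitemap no longer points crawlers at attachment pages. The image files themselves are not affected by this check.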