Google News not indexing /index.html pages
-
Hi all,
We've been asked by a blog to help improve its indexing and ranking on Google News (the site is already included in Google News, but with poor results).
The blog had a chronic URL duplication problem, with each post existing at 3 different URLs:
#1) www.domain.com/post.html (currently noindexed by editorial choice, as it shows all the comments)
#2) www.domain.com/post/index.html (currently indexed, showing only top comments)
#3) www.domain.com/post/ (identical to #2)
We've chosen URL #2 (/index.html) as the canonical URL, and included a rel=canonical tag on URL #3 (/) pointing to URL #2.
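As a rough illustration of the mapping described above (URL #3 pointing to URL #2), here is a minimal Python sketch. The function names `canonical_post_url` and `canonical_link_tag` are hypothetical and not part of the blog's actual setup; the sketch simply assumes that any path ending in a trailing slash has an /index.html twin.

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_post_url(url: str) -> str:
    """Map the trailing-slash variant (#3) onto the /index.html variant (#2).

    URLs that already end in /index.html are returned unchanged. The
    standalone /post.html variant (#1) is also left alone, because in the
    setup described above it is noindexed rather than canonicalised.
    Assumption: every trailing-slash path has an /index.html twin.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/"):
        path += "index.html"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = html.escape(canonical_post_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.domain.com/post/"))
```

Note that this would also map the site root (/) to /index.html, which may or may not be wanted.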
Also, yesterday we submitted a Google News sitemap that consistently lists the #2 URLs from the last 48h. The sitemap has been properly "digested" by Google and shows that all URLs have been submitted and indexed. However, if we use the site:domain.com command on Google News we see something completely different: Google News has actually indexed only some of the news and, more specifically, only the #3-type URLs (ending with the trailing slash instead of /index.html). Why? What's wrong?
a) Does the Google News bot have problems indexing URLs ending with /index.html? While figuring out what's wrong, we found that http://news.google.it/news/search?aq=f&pz=1&cf=all&ned=us&hl=en&q=inurl%3Aindex.html gives no results... it seems that the Google News index overall does not include any URLs ending with /index.html.
b) Does the Google News bot recognise the rel=canonical tag?
c) Is it just a matter of time before Google News picks up the right URLs (/index.html), and/or should we notify the Google News team of the changes?
d) Any suggestions? Or should we do it the other way around, meaning make URL #3 the canonical one?
While Google News is showing these problems, Google Web search has actually handled the changes well, so we don't know what to do.
Thanks for your help,
Matteo
-
To follow up on this.
Look what I've found in the Google News Forum:
http://www.google.com/support/forum/p/news/thread?tid=248ef4e6fe372e91&hl=en
The problem is almost the same: Google News not indexing URLs with the trailing index.html.
The only person who answered was a Top Contributor, who suggested contacting the Google News team directly.
-
Hmmm, that is strange! Check a cached version of one of your URLs to make sure the new version is in the index. If it is, maybe you should switch to option 3.
I am not sure what, if any, the implications would be of leaving it the way you have it.
Since it is in 2 different areas of search, I am not sure that duplicate content issues apply if you were to just leave it be.
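When looking at the cached copy (or the live page), the thing to confirm is which URL the rel=canonical tag actually points to. A minimal sketch of that check in Python, using only the standard library, could look like this. The names `CanonicalFinder` and `find_canonical` are hypothetical, and the sketch assumes you already have the page's HTML as a string (fetching it is left out).

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are also routed here by HTMLParser.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


if __name__ == "__main__":
    sample = (
        '<html><head>'
        '<link rel="canonical" href="http://www.domain.com/post/index.html" />'
        '</head><body></body></html>'
    )
    print(find_canonical(sample))
```

If the result for the "/" page is not the /index.html URL, the tag has not been deployed (or cached) the way the original post describes.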
-
Hey Roger,
Look, CNN seems to have exactly the same "problem" as we do.
They have the "/" version of the article indexed in Google News and the index.html version in the regular (non-News) Google index. They did exactly what we did, putting a rel=canonical on the "/" version pointing to the "index.html" one. Despite this, the "/" version is still the only one showing up in Google News.
Here is the screenshot just in case
and here are the two versions of the same article:
- http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/
- http://edition.cnn.com/2011/POLITICS/04/22/obama.campaign/index.html
-
They seem to meet these requirements. The only one that is a problem is requirement #3, but it clearly states that it is waived with News sitemaps, which Matteo said they submitted.
With that said, I do like Matteo's option #1 better than the naming convention they chose to go with.
-
It does sound weird, but I am not sure that search operator works in Google News.
Here is a simple test. Search Google News for "Google"
The second story I see is http://phandroid.com/2011/04/22/will-spotify-be-google-musics-savior/
However a Google News search for "inurl:will-spotify-be-google-musics-savior" returns no results.
Clearly the story is indexed!
-
My hunch, and it's only a hunch, is that it relates to their URL requirements, which say the URL has to be dedicated to an article. An index.html page is usually not a page that would be dedicated to one individual news story. See http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=68323 for their URL requirements.
-
Hi Roger, and thanks for the very insightful answer!
What about the fact that not a single URL ending with index.html is indexed in Google News?
http://news.google.it/news/search?aq=f&pz=1&cf=all&ned=us&hl=en&q=inurl%3Aindex.html
Compare that with the normal Google index:
http://www.google.it/search?q=inurl%3Aindex.html&hl=en&ned=us&tab=nw
Doesn't that sound weird to you?
Matteo
-
I had another thought too. Just because Google WMT reports the pages as indexed doesn't mean the new content, including the new canonical tags, has been crawled or added to the index yet.
I recently did a similar project adding canonical tags to an ecommerce site. The new URLs are only showing up correctly in the search results maybe 10% of the time, even for pages I know have been crawled and that I submitted a week ago. The important thing is that more URLs are updated each day.
I don't believe they throw out their index entry the first time they crawl an established page and something has changed. I believe the index changes gradually: as they continue to crawl, they compare versions and index data based on aggregates of multiple crawls, especially for existing pages that have been in the index for a while. In other words, if they compare 20 recent crawls and only see 1 version as being different, they may not throw out the old version right away; they may wait until they have crawled the page multiple times and seen the new version in, say, 5 or 10 of the most recent 20 crawls. BTW, I don't have any data to back that up; it's just my personal observation/theory.
-
If you used the rel canonical tag properly and only submitted the sitemap yesterday, it's just a waiting game. You will get crawled and indexed properly soon.
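For anyone wanting to reproduce the sitemap side of this setup, here is a minimal Python sketch of a News sitemap limited to the last 48h, as described in the original question. It uses the publicly documented sitemap and News sitemap namespaces; the function name `build_news_sitemap`, the article dictionary keys, and the publication name and language in the usage example are all assumptions for illustration, not details from the blog in question.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language, now=None):
    """Return News sitemap XML for articles published in the last 48 hours.

    Each article is a dict with hypothetical keys: "loc" (the canonical
    /index.html URL), "title", and "published" (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=48)

    # Serialise with a default namespace plus the "news:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        if article["published"] < cutoff:
            continue  # older than 48h, leave it out of the News sitemap
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            article["published"].isoformat()
        )
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    example = [{
        "loc": "http://www.domain.com/post/index.html",
        "title": "Example post",
        "published": datetime.now(timezone.utc) - timedelta(hours=3),
    }]
    print(build_news_sitemap(example, "Example Blog", "en"))
```

Listing only the canonical /index.html URLs here keeps the sitemap consistent with the rel=canonical tags, which is the combination this thread is about.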