Sitemap URLs not being indexed
-
One of our sites has an issue where many of the sitemap URLs are not being indexed (at least 70% of them).
The URLs in the sitemap are normal URLs without any strange characters attached, but after looking into it, it seems many of them get a "#." plus a number sequence appended once you actually visit the URL. We are not sure whether the "AddThis" bookmark widget is causing this or whether another script is doing it.
For example:
URL in the sitemap: http://example.com/example-category/0246
URL once you actually go to that link: http://example.com/example-category/0246#.VR5a
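To make the example concrete, here is a minimal Python sketch that strips the "#" fragment from the visited URL and compares the result with the sitemap URL. The helper name `strip_fragment` is made up for illustration; `urldefrag` is from Python's standard library. It only shows that the two addresses differ by the fragment alone; it does not say anything about how a search engine treats them.

```python
from urllib.parse import urldefrag

# The two URLs from the example above.
sitemap_url = "http://example.com/example-category/0246"
browser_url = "http://example.com/example-category/0246#.VR5a"


def strip_fragment(url: str) -> str:
    """Return the URL without its '#fragment' part (hypothetical helper)."""
    return urldefrag(url).url


# The fragment is everything after the '#'.
print(urldefrag(browser_url).fragment)

# Once the fragment is removed, the visited URL matches the sitemap URL.
print(strip_fragment(browser_url) == sitemap_url)
```

The fragment is handled by the browser and is not sent to the server in the HTTP request, so the server sees the same address in both cases.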
For further information, the XML file does not have any style information associated with it and is in its most basic form.
Has anyone had similar issues with their sitemap not being indexed properly? Could this be the reason many of these URLs are not being indexed?
Thanks all for your help.
-
Anders,
Thanks for the reply. I agree that a self-referencing canonical would be a good addition to these product pages, so I'm adding that to our to-do list in case indexing does not improve.
In terms of indexing pages: we have not restricted crawl frequency; it is set to "allow Google to determine the optimal crawl rate". No other warnings were found in Search Console either.
Thanks for your help.
-
I agree - I would probably ignore everything after the "#".
But have you tried adding a <link rel="canonical" href="http://example.com/page-url" /> to your pages to see if this updates it? Also: add the sitemap to your robots.txt if that is not already done.
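Both suggestions can be checked offline. The following is a rough Python sketch, assuming you already have a page's HTML and the robots.txt contents as strings; the sample strings, the class `CanonicalFinder`, and the two helper functions are invented for illustration. It uses only the standard library (`site_maps()` needs Python 3.8 or later).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attributes.get("href"))


def find_canonicals(html: str) -> list:
    """Return all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


def sitemaps_in_robots(robots_txt: str) -> list:
    """Return the Sitemap: entries listed in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


# Sample inputs, made up for this sketch.
page_html = (
    '<html><head>'
    '<link rel="canonical" href="http://example.com/page-url" />'
    '</head><body></body></html>'
)
robots_txt = "User-agent: *\nDisallow:\nSitemap: http://example.com/sitemap.xml\n"

print(find_canonicals(page_html))
print(sitemaps_in_robots(robots_txt))
```

An empty list from either function would mean the page has no canonical tag or the robots.txt does not mention a sitemap.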
Regarding indexing pages: have you restricted the crawl frequency in Google Search Console, or is it set to be determined by Googlebot? Are there any other warnings or messages in Search Console?
Best regards,
Anders
-
Lesley,
Thanks for the confirmation on that one and for the article. Since it doesn't seem like many people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we now know the root cause of what is happening to the URLs. The real question is whether it could be getting in the way of indexing those URLs. One would think not, since from what I've read, Google simply ignores whatever comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual URLs private, but I can provide further information to help the community dissect this:
- It's an e-commerce website, meaning many facets, filters, and possible duplicate content angles
- Many of the static pages (/products main page, /contact, etc.) are indexed, but the individual product pages are mostly not being indexed through the sitemap
- While the number of URLs shown in Webmaster Tools under "index" has also been going down steadily, it definitely doesn't correspond with the gap between pages indexed and pages submitted in the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl CSS and JS so Google could have full access)
- The individual product pages all have the "AddThis" feature, meaning they all get a "#." plus a number sequence added to their URLs. However, one would think this wouldn't be the cause of the lack of indexation?
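The robots.txt check in the list above can be repeated mechanically. Here is a small Python sketch, assuming the sitemap XML and robots.txt contents are available as strings. The sample sitemap, the `/cart/` rule, the second URL, and the function names are all hypothetical; the site's real files were not shared.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list:
    """Return every <loc> value found under <url> entries in a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def blocked_urls(urls: list, robots_txt: str, user_agent: str = "Googlebot") -> list:
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Sample data, invented for this sketch.
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/example-category/0246</loc></url>
  <url><loc>http://example.com/cart/checkout</loc></url>
</urlset>"""
robots_txt = "User-agent: *\nDisallow: /cart/\n"

urls = sitemap_urls(sitemap_xml)
print(urls)
print(blocked_urls(urls, robots_txt))
```

If the second list comes back empty for the real files, robots.txt is not what is keeping the submitted product URLs out of the index, which matches what was found manually.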
Thanks for your help.
-
Yes, AddThis is doing this to your URLs. I hate it; that is one reason why I do not use them.
Here is an article on how to remove them: http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide your website's URL? It would help the community take a deeper look - thanks!
Good luck!