Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
-
Given that Chrome and most header checkers (even older ones) are processing the 301s, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both and with bitly links, I see an HTTP Response Header of 301, followed by "Content", but with tiny.cc links I only see the header redirect.
Two links I'm testing:
Bitly response:
Content (0.11 <acronym title="KibiByte = 1024 Byte">KiB</acronym>)
<title></span>bit.ly<span class="tag"></title> <a< span="">href="https://twitter.com/KPLU">moved here</a<>
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403, I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a [relatively] huge increase with indexed shortlinks.
I also see 1000+ results for "mz.cm" which doesn't seem much strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortner which has activity, http://scr.im/ and I only saw the correct pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square 1.
-
One of those 301s to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, down-time, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs ironically enough.
Here are a few examples
<cite class="_md">bit.ly/M5onJO</cite>
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly"
I see almost 600,000 results for "bitly.com"
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google?? Any other ideas?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Flux in Rankings Or Something More Serious
Hi all, Two weeks ago i noticed that one of our pages which normally ranks in the top 5 of search results dropped out of the top 50 results. I checked to make sure there were no Google penalties and checked to make sure the page was crawlable. Everything seemed fine and after a few hours our page went back into the number one position. I assumed it was a Google Flux. This number one ranking lasted about a week, today I see my page has dropped out of the top 50 yet again and hasn't come back up. again there are no penalties and there doesn't seem to be issues with the page. I'm hoping it comes back up to the top by tomorrow. What could be causing such a big dip multiple times?
Intermediate & Advanced SEO | | znotes0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Removing Parameterized URLs from Google Index
We have duplicate eCommerce websites, and we are in the process of implementing cross-domain canonicals. (We can't 301 - both sites are major brands). So far, this is working well - rankings are improving dramatically in most cases. However, what we are seeing in some cases is that Google has indexed a parameterized page for the site being canonicaled (this is the site that is getting the canonical tag - the "from" page). When this happens, both sites are being ranked, and the parameterized page appears to be blocking the canonical. The question is, how do I remove canonicaled pages from Google's index? If Google doesn't crawl the page in question, it never sees the canonical tag, and we still have duplicate content. Example: A. www.domain2.com/productname.cfm%3FclickSource%3DXSELL_PR is ranked at #35, and B. www.domain1.com/productname.cfm is ranked at #12. (yes, I know that upper case is bad. We fixed that too.) Page A has the canonical tag, but page B's rank didn't improve. I know that there are no guarantees that it will improve, but I am seeing a pattern. Page A appears to be preventing Google from passing link juice via canonical. If Google doesn't crawl Page A, it can't see the rel=canonical tag. We likely have thousands of pages like this. Any ideas? Does it make sense to block the "clicksource" parameter in GWT? That kind of scares me.
Intermediate & Advanced SEO | | AMHC0 -
Can internal links from a blog harm the ranking of a page?
Here is the situation: A site was moved from its original domain to its new domain, and at the same time, the external wordpress.com blog was moved to a subdirectory, making it an onsite blog. The two pages that rank the highest on the site have virtually no links from the blog and no external links, while all the other pages are linked extensively from the blog and have backlinks. Their targeted keywords are not so much easier to rank than the other pages for that to be the sole cause. To confuse the matter even more, there was a manual penalty affecting incoming links which was removed last month. The old site, which has many backlinks to the new site, is still in Google's index. The old blog however, has been redirected page by page and is not in Google's index. Most of the blog posts are short 1-paragraph company updates and potentially considered low quality content because of that (?) The common denominator among the two highest ranked pages (I'm talking top 3 in SERP v. page 3 or 4) seems to be either the lack of external backlinks or the lack of internal links from the blog. Could there be an issue with the blog such that internal links from it are detrimental rather than helpful?
Intermediate & Advanced SEO | | kimmiedawn0 -
Google & Bing not indexing a Joomla Site properly....
Can someone explain the following to me please. The background: I launched a new website - new domain with no history. I added the domain to my Bing webmaster tools account, verified the domain and submitted the XML sitemap at the same time. I added the domain to my Google analytics account and link webmaster tools and verified the domain - I was NOT asked to submit the sitemap or anything. The site has only 10 pages. The situation: The site shows up in bing when I search using site:www.domain.com - Pages indexed:- 1 (the home page) The site shows up in google when I search using site:www.domain.com - Pages indexed:- 30 Please note Google found 30 pages - the sitemap and site only has 10 pages - I have found out due to the way the site has been built that there are "hidden" pages i.e. A page displaying half of a page as it is made up using element in Joomla. My questions:- 1. Why does Bing find 1 page and Google find 30 - surely Bing should at least find the 10 pages of the site as it has the sitemap? (I suspect I know the answer but I want other peoples input). 2. Why does Google find these hidden elements - Whats the best way to sort this - controllnig the htaccess or robots.txt OR have the programmer look into how Joomla works more to stop this happening. 3. Any Joomla experts out there had the same experience with "hidden" pages showing when you type site:www.domain.com into Google. I will look forward to your input! 🙂
Intermediate & Advanced SEO | | JohnW-UK0 -
Site rankings dropped from ~15 to 500+ but Google says we were not penalized
2 months ago my site was ranking about 15 for my main KWs in the UK (my main market). On July 28 we dropped suddenly to ~500 or not in the top 500 at all in the UK for the 3 main keywords. However, these same terms are still ranking in other markets and we are doing okay in the UK for other terms targeted to the same pages. I cleaned up my links/content and sent a reconsideration request to Google. I then looked closer at my links and realized Google may be counting my affiliate links as if they were mine, so thought maybe that was the problem. I sent Google another reconsideration request and they wrote back the letter pasted below Any ideas about what happened or how I can get my rankings back? This definitely doesn't seem like it was just a simple algorithm change. There were no major changes done to our site, and we are still ranking in other markets. Dear site owner or webmaster
Intermediate & Advanced SEO | | theLotter
We received a request from a site owner to reconsider your site for compliance with Google's Webmaster Guidelines.
We reviewed your site and found no manual actions by the webspam team that might affect your site's ranking in Google. There's no need to file a reconsideration request for your site, because any ranking issues you may be experiencing are not related to a manual action taken by the webspam team.
Of course, there may be other issues with your site that affect your site's ranking. Google's computers determine the order of our search results using a series of formulas known as algorithms. We make hundreds of changes to our search algorithms each year, and we employ more than 200 different signals when ranking pages. As our algorithms change and as the web (including your site) changes, some fluctuation in ranking can happen as we make updates to present the best results to our users.
If you've experienced a change in ranking which you suspect may be more than a simple algorithm change, there are other things you may want to investigate as possible causes, such as a major change to your site's content, content management system, or server architecture. For example, a site may not rank well if your server stops serving pages to Googlebot, or if you've changed the URLs for a large portion of your site's pages. This article has a list of other potential reasons your site may not be doing well in search.
If you're still unable to resolve your issue, please see our Webmaster Help Forum for support.
Sincerely,
Google Search Quality Team0 -
Nofollow links in Google Webmaster
I've noticed nofollow links showing up in my Google Webmaster tools "links to your site" list. If they are nofollow why are they showing up here? Do nofollow links still count as a backlink and transfer PR and authority?
Intermediate & Advanced SEO | | NoCoGuru1 -
Google.ca vs Google.com Ranking
I have a site I would like to rank high for particular keywords in the Google.ca searches and don't particularly care about the Google.com searches (it's a Canadian service). I have logged into Google Webmaster Tools and targeted Canada. Currently my site is ranking on the third page for my desired keywords on Google.com, but is on the 20th page for Google.ca. Previously this change happened quite quickly -- within 4 weeks -- but it doesn't seem to be taking here (12 weeks out and counting). My optimization seems to be fine since I'm ranking well on Google.com: not sure why it's not translating to Google.ca. Any help or thoughts would be appreciated.
Intermediate & Advanced SEO | | seorm0