Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
-
Given that Chrome and most header checkers (even older ones) are processing the 301s, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both and with bitly links, I see an HTTP Response Header of 301, followed by "Content", but with tiny.cc links I only see the header redirect.
Two links I'm testing:
Bitly response:
Content (0.11 KiB)
<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>
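For anyone who wants to check this difference without a web tool, here is a minimal Python sketch that splits a raw HTTP response into its status code, Location header, and body size. The function name `summarize_response` and both sample responses are made up for illustration (only the "moved here" body is taken from the Bitly output above). As far as I know, a short HTML note with a link to the new URL is a normal thing for a server to send along with a 301, so the body by itself is probably not a misconfiguration.

```python
def summarize_response(raw: bytes) -> dict:
    """Split a raw HTTP response into status code, Location header, and body size."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split()[1])  # "HTTP/1.1 301 Moved Permanently" -> 301
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "status": status,
        "location": headers.get("location"),
        "body_bytes": len(body),
    }


# Hypothetical sample: a 301 that also carries a small HTML body (Bitly-style).
with_body = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://twitter.com/KPLU\r\n"
    b"\r\n"
    b'<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>'
)

# Hypothetical sample: a header-only 301 (what the tiny.cc check showed).
header_only = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://example.com/\r\n"
    b"\r\n"
)

for raw in (with_body, header_only):
    print(summarize_response(raw))
```

Both samples report the same status and a Location header; only `body_bytes` differs, which is the whole difference between the two shorteners in this test.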
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403; I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a [relatively] huge increase in indexed shortlinks.
I also see 1000+ results for "mz.cm", which doesn't seem that strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortener which has activity, http://scr.im/, and I only saw the correct pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square 1.
-
One of those 301s to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, down-time, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs ironically enough.
Here are a few examples:
bit.ly/M5onJO
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly".
I see almost 600,000 results for "bitly.com".
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
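To check chains like this one hop at a time rather than relying on a header-checker tool, something along these lines might help. It is a rough Python sketch using only the standard library. The names `head`, `trace_redirects`, and `describe` are my own, and the User-Agent string is a placeholder. Some servers answer HEAD requests differently from GET, so treat the output as a hint rather than proof of what Googlebot sees.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Placeholder User-Agent; an assumption, not what any crawler sends.
        conn.request("HEAD", path, headers={"User-Agent": "redirect-trace-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=head, max_hops=10):
    """Follow redirects hop by hop; return a list of (url, status) pairs."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # resolves relative Location values
    return hops


def describe(hops):
    """Summarize a chain, e.g. '301 -> 301 -> 200 (2 redirects)'."""
    codes = " -> ".join(str(status) for _, status in hops)
    redirects = sum(1 for _, status in hops if status in REDIRECT_CODES)
    plural = "" if redirects == 1 else "s"
    return f"{codes} ({redirects} redirect{plural})"
```

Calling `describe(trace_redirects("http://bit.ly/M5onJO"))` would show at a glance whether a short link is a single 301 to a 200, a chain of two 301s, or a 301 that ends in an error such as a 403. The `fetch` parameter is there so the chain logic can be exercised with canned responses instead of live requests.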
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google?? Any other ideas?