Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
-
Given that Chrome and most header checkers (even older ones) are processing the 301s, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both. With bitly links I see an HTTP response status of 301 followed by a "Content" section (a short HTML body), but with tiny.cc links I only see the header redirect.
Two links I'm testing:
Bitly response:
Content (0.11 KiB)
<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
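Since different header checkers seem to report different things for the same link, one way to take the tools out of the equation is to trace the hops yourself. Here's a minimal sketch in Python using only the standard library. The names `fetch_hop` and `trace` and the User-Agent string are made up for illustration, and this only shows what a plain client sees, which is not necessarily what Googlebot is served.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_hop(url, timeout=10.0):
    """Make one GET request without following redirects.

    Returns (status, Location header or None, body length in bytes).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        # Hypothetical User-Agent; some servers answer differently per agent.
        conn.request("GET", target,
                     headers={"User-Agent": "redirect-trace-sketch"})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.getheader("Location"), len(body)
    finally:
        conn.close()


def trace(url, fetch=fetch_hop, max_hops=10):
    """Follow redirects one hop at a time and record every response.

    Returns a list of (url, status, body_length) tuples in request order.
    """
    hops = []
    for _ in range(max_hops):
        status, location, body_len = fetch(url)
        hops.append((url, status, body_len))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be a relative URL
    return hops
```

Calling `trace()` on a short link returns one tuple per hop, so a 301 that carries a small "moved here" body shows up as a 301 with a non-zero body length, and a 301 that lands on a 403 shows up as the last entry. The `fetch` parameter is only there so the tracing logic can be exercised without touching the network.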
-
I wonder why you're seeing a 403. I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a [relatively] huge increase in indexed shortlinks.
I also see 1000+ results for "mz.cm", which doesn't seem that strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortener which has activity, http://scr.im/, and I only saw the correct pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square 1.
-
One of those 301s to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, down-time, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
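The patterns that have come up in this thread (a single 301 to a 200, two chained 301s, a 301 that ends on a 403) are easy to tell apart mechanically if you're spot-checking a batch of short links. A rough sketch follows; `describe_chain` is a hypothetical helper, and the labels are just my shorthand, not anything Google publishes.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}


def describe_chain(statuses):
    """Summarise a redirect chain from its status codes in request order.

    `statuses` is e.g. [301, 200]: one redirect, then the final page.
    Returns a list of human-readable notes.
    """
    if not statuses:
        raise ValueError("need at least one status code")
    *redirects, final = statuses
    notes = []
    if not redirects:
        notes.append("no redirect")
    elif len(redirects) > 1:
        notes.append(f"chained redirects ({len(redirects)} in a row)")
    if any(code != 301 for code in redirects):
        notes.append("non-301 redirect in chain")
    if final in REDIRECT_CODES:
        notes.append("still redirecting at last hop")
    elif final != 200:
        notes.append(f"final response is {final}, not 200")
    return notes or ["single 301 to a 200"]
```

For example, `describe_chain([301, 301, 200])` flags the double hop and `describe_chain([301, 403])` flags the bad destination. It can't tell you why a URL is indexed, only which links are worth a closer look.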
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs ironically enough.
Here are a few examples:
bit.ly/M5onJO
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly"
I see almost 600,000 results for "bitly.com"
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google?? Any other ideas?