Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
-
Given that Chrome and most header checkers (even older ones) are processing the 301s, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both. With bitly links, I see an HTTP response header of 301 followed by a "Content" body, but with tiny.cc links I only see the header redirect.
Two links I'm testing:
Bitly response:
Content (0.11 KiB)
<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>
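For anyone who wants to repeat this check without a web tool, here is a minimal Python sketch. It requests a URL once without following the redirect and reports the status, the Location header, and whether a body came back with the 301. The function names `fetch_once` and `summarize` and the User-Agent string are my own assumptions, not anything bitly or web-sniffer provides.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=10):
    """Request `url` once; http.client never follows redirects itself.

    Returns (status, location_header_or_None, body_bytes).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Hypothetical User-Agent, chosen only for this sketch.
        conn.request("GET", path, headers={"User-Agent": "header-check-sketch"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def summarize(status, location, body):
    """Describe one response roughly the way a header checker would."""
    return {
        "status": status,
        "kind": "redirect" if status in REDIRECT_CODES else "final",
        "location": location,
        "has_body": len(body) > 0,
        "body_kib": round(len(body) / 1024, 2),
    }
```

Calling `summarize(*fetch_once(url))` on a short link from each service should show whether the 301 carries a small HTML body (as the bitly response above does) or only headers.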
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403; I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
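A full audit is out of reach, but a small sample can at least be tallied by status pattern. The sketch below is only an illustration: `tally_patterns` is a hypothetical helper, and the sample chains are made-up placeholders, not measurements of bit.ly.

```python
from collections import Counter


def tally_patterns(status_chains):
    """Count how often each status pattern (e.g. '301 -> 200') occurs."""
    return Counter(" -> ".join(str(s) for s in chain) for chain in status_chains)


# Made-up sample: each inner list is the status codes seen for one short link.
sample = [
    [301, 200],
    [301, 403],
    [301, 200],
]

for pattern, count in tally_patterns(sample).most_common():
    print(f"{pattern}: {count}")
```

Even a few hundred spot checks counted this way would say more than three anecdotes, though nowhere near enough to explain 70M indexed URLs.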
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a [relatively] huge increase in indexed shortlinks.
I also see 1000+ results for "mz.cm", which doesn't seem very strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortener which has activity, http://scr.im/, and I only saw the correct pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square 1.
-
One of those 301s goes to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, downtime, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
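To see whether a short link is a single 301 or a chain, the hops can be followed one at a time. This is a rough sketch, not how Google's crawler works: `trace_redirects` and `describe_chain` are hypothetical names, the `fetch` callable is assumed to return `(status, location)` without following redirects, and the example hosts are placeholders.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow redirects hop by hop.

    `fetch(url)` must return (status, location_or_None) and must not
    follow redirects itself. Returns a list of (url, status) hops.
    """
    hops = []
    seen = set()
    while len(hops) < max_hops and url not in seen:
        seen.add(url)  # guards against redirect loops
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


def describe_chain(hops):
    """Render the status codes of a chain, e.g. '301 -> 301 -> 200'."""
    return " -> ".join(str(status) for _, status in hops)


# Placeholder data standing in for real requests (hosts are hypothetical).
fake_responses = {
    "http://short.example/abc": (301, "http://old.example/page"),
    "http://old.example/page": (301, "/new"),
    "http://old.example/new": (200, None),
}

print(describe_chain(trace_redirects("http://short.example/abc", fake_responses.get)))
# prints: 301 -> 301 -> 200
```

Swapping `fake_responses.get` for a real fetcher makes the same loop work against live URLs; anything longer than "301 -> 200" is a chain worth a closer look.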
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs ironically enough.
Here are a few examples:
bit.ly/M5onJO
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly".
I see almost 600,000 results for "bitly.com".
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
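Once the URLs in a chain are known, spotting an intermediary like this is simple. The sketch below assumes a list of hop URLs is already available; `intermediary_hosts` is a hypothetical helper, and the short-link and destination URLs in the example are placeholders (only the feedproxy.google.com host comes from the observation above).

```python
from urllib.parse import urlsplit


def intermediary_hosts(chain_urls):
    """Return hostnames that sit between the first and last URL of a chain."""
    hosts = [urlsplit(u).hostname for u in chain_urls]
    if len(hosts) < 3:
        return []  # a single redirect has no intermediary
    first, last = hosts[0], hosts[-1]
    return [h for h in hosts[1:-1] if h not in (first, last)]


# Placeholder chain: short link -> feed proxy -> destination.
chain = [
    "http://bit.ly/short",
    "http://feedproxy.google.com/~r/feed/item",
    "http://www.example.com/post",
]
print(intermediary_hosts(chain))
# prints: ['feedproxy.google.com']
```

Running this over a sample of indexed short links would show whether the indexed ones pass through a third-party hop more often than the ones that aren't indexed.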
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google?? Any other ideas?