Can Anybody Understand This ?
-
Hey guyz,
These days I'm reading the paperwork from sergey brin and larry which is the first paper of Google.
And I dont get the Ranking part which is:"Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.
"
-
I can't say I have a complete understanding of what this is explaining, but here's a link to the original paper on Stanford's website if anyone else is interested. http://infolab.stanford.edu/~backrub/google.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can Google index the text content in a PDF?
I really really thought the answer was always no. There's plenty of other things you can do to improve search visibility for a PDF, but I thought the nature of the file type made the content itself not-parsable by search engine crawlers... But now, my client's competitor is ranking for my client's brand name with a PDF that contains comparison content. Thing is, my client's brand isn't in the title, the alt-text, the url... it's only in the actual text of the PDF. Did I miss a major update? Did I always have this wrong?
Technical SEO | | LindsayDayton0 -
Videos in Youtube can we add Advertisements?
I have few hundred videos in Youtube and i have a new order to advertise a real estate company. Can i insert the 1 min AD into my video and still be able to Monetize them in Youtube?
Technical SEO | | bsharath0 -
Can Googlebot crawl the content on this page?
Hi all, I've read the posts in Google about Ajax and javascript (https://support.google.com/webmasters/answer/174992?hl=en) and also this post: http://moz.com/ugc/can-google-really-access-content-in-javascript-really. I am trying to evaluate if the content on this page, http://www.vwarcher.com/CustomerReviews, is crawlable by Googlebot? It appears not to be. I perused the sitemap and don't see any ugly Ajax URLs included as Google suggests doing. Also, the page is definitely indexed, but appears the content is only indexed via its original source (Yahoo!, Citysearch, Google+, etc.). I understand why they are using this dynamic content, because it looks nice to an end-user and requires little to no maintenance. But, is it providing them any SEO benefit? It appears to me that it would be far better to take these reviews and simply build them into HTML. Thoughts?
Technical SEO | | danatanseo0 -
Can you redirect from a 410 server error? I see many 410s that should be directed to an existing page.
We have 150,000 410 server errors. Many of them should be redirected to an existing url. This is a result of a complete website redesign, including new navigation and new web platform. I believe IT may have inadvertently marked many 404s as 410s. Can I fix this or is a 410 error permanent? Thank you for your help.
Technical SEO | | sxsoule0 -
Understanding Duplicate Titles in Wordpress
I have duplicate title errors in Wordpress and I cannot pinpoint the problem. I have my blog set up so that the home page of the blog has the most recent posts. In my campaign report somehow the page directory is being found and I can't find any links on my blog to those pages. the errors are on pages that are like the following www.example.com/blog/page/13/ www.example.com/blog/page/14/ I am using Yoast and I thought I had it set up correctly. The other pages have the correct title and canonical tags, but he urls ending with page do not, but the page directory is duplicating the home page title. How or where can i fix this issue?
Technical SEO | | hfranz0 -
Rel canonical = can it hurt your SEO
I have a site that has been developed to default to the non-www version. However each page has a rel canonical to the non-www version too. Could having this in place on all pages hurt the site in terms of search engines? thanks Steve
Technical SEO | | stevecounsell0 -
On a dedicated server with multiple IP addresses, how can one address group be slow/time out and all other IP addresses OK?
We utilize a dedicated server to host roughly 60 sites on. The server is with a company that utilizes a lady who drives race cars.... About 4 months ago we realized we had a group of sites down thanks to monitoring alerts and checked it out. All were on the same IP address and the sites on the other IP address were still up and functioning well. When we contacted the support at first we were stonewalled, but eventually they said there was a problem and it was resolved within about 2 hours. Up until recently we had no problems. As a part of our ongoing SEO we check page load speed for our clients. A few days ago a client who has their site hosted by the same company was running very slow (about 8 seconds to load without cache). We ran every check we could and could not find a reason on our end. The client called the host and were told they needed to be on some other type of server (with the host) at a fee increase of roughly $10 per month. Yesterday, we noticed one group of sites on our server was down and, again, it was one IP address with about 8 sites on it. On chat with support, they kept saying it was our ISP. (We speed tested on multiple computers and were 22MB down and 9MB up +/-2MB). We ran a trace on the IP address and it went through without a problem on three occassions over about ten minutes. After about 30 minutes the sites were back up. Here's the twist: we had a couple of people in the building who were on other ISP's try and the sites came up and loaded on their machines. Does anyone have any idea as to what the issue is?
Technical SEO | | RobertFisher0 -
We have a ton of legacy links that include /?ref=tracking-goes-here. We need concile this, can the conical tag be used to fix this? How?
www.firehost.com/?ref=pressrelease example - http://cl.ly/2O1d1x2m3b1b3K1K0h2J
Technical SEO | | FirePowered0