Can Anybody Understand This ?
-
Hey guyz,
These days I'm reading the paperwork from sergey brin and larry which is the first paper of Google.
And I dont get the Ranking part which is:"Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.
"
-
I can't say I have a complete understanding of what this is explaining, but here's a link to the original paper on Stanford's website if anyone else is interested. http://infolab.stanford.edu/~backrub/google.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I hope someone can help me with page indexing problem
I have a problem with all video pages on www.tadibrothers.com.
Technical SEO | | TadiBrothers
I can not understand why google do not index all the video pages?
I never blocked them with the robots.txt file, there are no noindex/nofollow tags on the pages. The only video page that I found in search results is the main video category page: https://www.tadibrothers.com/videos and 1 video page out of 150 videos: https://www.tadibrothers.com/video/front-side-rear-view-cameras-for-backup-camera-systems I hope someone can point me to the right way0 -
We just can't figure out the right anchor text to use
We have been trying everything we can with anchor text. We have read here that we should try naturalistic language. Our competitors who are above us in Google search results don't do any of this. They only use their names or a single term like "austin web design". Is what we are doing hurting our listings? We don't have any black hat links. Here's what we are doing now. We are going crazy trying to figure this out. We are afraid to do anything in fear it will damage our position. Bob | pallasart web design | 31 | 1,730 |
Technical SEO | | pallasart
| website by pallasart a texas web design company in austin | 15 | 1,526 |
| website by the austin design company pallasart | 14 | 1,525 |
| created by pallasart a web design company in austin texas | 13 | 1,528 |
| created by an austin web design company pallasart | 12 | 1,499 |
| website by pallasart web design an austin web design company | 12 | 1,389 |
| website by pallasart an austin web design company | 11 | 1,463 |
| pallasart austin web design | 9 | 2,717 |
| website created by pallasart a web design company in austin texas | 9 | 1,369 |
| website by pallasart | 8 | 910 |
| austin web design | 5 | 63 |
| pallasart website design austin |0 -
Can Google Read schema.org markup within Ajax?
Hi All, as a local business directory, we also display Openinghours on a business listing page. ex. http://www.goudengids.be/napoli-kontich-2550/
Technical SEO | | TruvoDirectories
At the same time I also have schema.org markup for Openinghours implemented.
But, for technical reasons (performance), the openinghours (and the markup alongside) are displayed using AJAX. I'm wondering if google is able to read the markup. The rich snippet tool and markup plugings like Semantic Inspector can't "see" the markup for openinghours. Any advice here?0 -
How can you tell if a 301 has been set up correctly?
How can you tell if a 301 has been set up correctly. We had a .co.uk site that has moved over to a .com However the .co.uk we have hasd problems with the old webmaster giving control, but he has said he has implemented a 301 redirect to the new .com site, and if you click on .co.uk it does transfer you to the .com site How can we check that it is a 301 set up and not another sort of redirect?
Technical SEO | | ocelot0 -
Can I format my H1 to be smaller than H2's and H3's on the same page?
I would like to create a web design with 12px H1 and for sub headings on the page to be more like 24px. Will search engines see this and dislike it? The reason for doing it is that I want to put a generic page title in the banner, and more poetic headings above the main body. Example: Small H1: Wholesale coffee, online coffee shop and London roastery Large h2: Respect the bean... Thanks
Technical SEO | | Crumpled_Dog
Scott0 -
Canonical - how can you tell if page is appearing duplicate in Google?
Our home page file is www.ides.com/default.asp and appears in Google as www.ides.com. Would it be a good thing for us to include the following tag in the head section of our website homepage?
Technical SEO | | Prospector-Plastics0 -
How can i redirect a url that has % in it?
Google webmaster tools shows a 400 eroor for an old link that contains a 30% off in it. The problem is the % I would like to 301 redirect this link : http://www.geographics.com/Graduation-Stationery,-35%-OFF-Printable-Certificates-Blank-Gift-Certificates/c1353_1354_1359/index.html to http://www.geographics.com/Graduation-Stationery-Printable-Certificates-Blank-Gift-Certificates/c1353_1354_1359/index.html We do not know how to do this in httaccess. Can you please advise? Thanks a lot! Madlena
Technical SEO | | Madlena0 -
I just found something weird I can't explain, so maybe you guys can help me out.
I just found something weird I can't explain, so maybe you guys can help me out. In Google http://www.google.nl/#hl=nl&q=internet. The number 3 result is a big telecom provider in the Netherland called Ziggo. The ranking URL is https://www.ziggo.nl/producten/internet/. However if you click on it you'll be directed to https://www.ziggo.nl/#producten/internet/ HttpFox in FF however is not showing any redirects. Just a 200 status code. The URL https://www.ziggo.nl/#producten/internet/ contains a hash, so the canonical URL should be https://www.ziggo.nl/. I can understand that. But why is Google showing the title and description of https://www.ziggo.nl/producten/internet/, when the canonical URL clearly is https://www.ziggo.nl/? Can anyone confirm my guess that Google is using the bulk SEO value (link juice/authority) of the homepage at https://www.ziggo.nl/ because of the hash, but it's using the relevant content of https://www.ziggo.nl/producten/internet/ resulting in a top position for the keyword "internet".
Technical SEO | | NEWCRAFT0