Can Anybody Understand This ?
-
Hey guyz,
These days I'm reading the paperwork from sergey brin and larry which is the first paper of Google.
And I dont get the Ranking part which is:"Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can all be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.
"
-
I can't say I have a complete understanding of what this is explaining, but here's a link to the original paper on Stanford's website if anyone else is interested. http://infolab.stanford.edu/~backrub/google.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Selling same products under separate brands and can't consolidate sites...duplicate content issues?
I have a client selling home goods online and in-store under two different brand names in separate regions of the country. Currently, the websites are completely identical aside from branding. It is unlikely that they would have the capacity to write unique titles and page content for each website (~25,000 pages each), and the business would never consolidate the sites. Would it make sense to use canonical tags pointing to the higher-performing website on category and product pages? This way we could continue to capture branded search to the lesser brand while consolidating authority on the better performing website. What would you do?
Technical SEO | | jluke.fusion0 -
How can I tell Google not to index a portion of a webpage?
I'm working with an ecommerce site that has many product descriptions for various brands that are important to have but are all straight duplicates. I'm looking for some type of tag tht can be implemented to prevent Google from seeing these as duplicates while still allowing the page to rank in the index. I thought I had found it with Googleoff, googleon tag but it appears that this is only used with the google appliance hardware.
Technical SEO | | bradwayland0 -
Can someone that had (or seen) a Manual Action in WMTs tell me.....
This is a repost of this question http://moz.com/community/q/manual-action-found-in-wmts-no-email-no-message-in-wmts But I'm sure there is someone in the moz fourms that have had/seen manual action 🙂 Someone I know said that they were looking though their WMTs and under Manual Actions they found they had a partial penalty. There is no date against it and they never got an email and there are no messages WMTs for it. I haven't personally dealt with a Manual penalty before, but I would have expected there to be a message in WMTs for it ( an email might have been missed because of a spam filter etc). Could it be a very old penalty?
Technical SEO | | PaddyDisplays0 -
Can a CMS affect SEO?
As the title really, I run www.specialistpaintsonline.co.uk and 6 months ago when I first got it it had bad links which google had put a penalty against it so losts it value. However the penalty was lift in Sept, the site corresponds to all guidelines and seo work has been done and constantly monitored. the issue I have is sales and visits have not gone up, we are failing fast and running on 2 or 3 sales a month isn't enough to cover any sort of cost let alone wages. hence my question can the cms have anything to do with it? Im at a loss and go grey any help or advice would be great. thanks in advance.
Technical SEO | | TeamacPaints0 -
Duplicate Page Content error but I can't see it
Hi All We're getting a lot of Duplicate Page Content errors but I can't match it up. For example this page: http://www.daytripfinder.co.uk/attractions/32-antique-cottage It is saying the on page properties as follows: Title DayTripFinder - Things to do reviewed by you - 7,000 attractions <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">Meta Description</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">Read Reviews, Browse Opening Hours and Prices. View Photos, Maps. 7,000 UK Visitor Attractions.</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">But this isn't the page title or meta description.
Technical SEO | | KateWaite85
</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">And it's showing five (many others) example pages that share it. Again the page titles and description are different.</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">http://www.daytripfinder.co.uk/attractions/mckinlay-theatre</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">http://www.daytripfinder.co.uk/attractions/bakers-dolphin</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">http://www.daytripfinder.co.uk/attractions/shipley-park-fishing</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">http://www.daytripfinder.co.uk/attractions/king-johns-lodge-and-gardens</dt> <dt style="color: #5e5e5e; font-family: Helvetica, Arial, sans-serif; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;">http://www.daytripfinder.co.uk/attractions/city-hall
</dt> Any ideas? Not sure if I'm missing something here! Thanks!0 -
If you only want your home page to rank, can you use rel="canonical" on all your other pages?
If you have a lot of pages with 1 or 2 inbound links, what would be the effect of using rel="canonical" to point all those pages to the home page? Would it boost the rankings of the home page? As I understand it, your long-tail keyword traffic would start landing on the home page instead of finding what they were looking for. That would be bad, but might be worth it.
Technical SEO | | watchcases0 -
How can I get a listing of just the URLs that are indexed in Google
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
Technical SEO | | EvergladesDirect0 -
Google can read japanese or only alphabet ?
Hi Actually im running a web shop in several languages: english, french, spanish, italian, russian, german, japanese, korean and japanese ! lol Im trying to optimize my web site for SEO so i changed URL rewriting rules for example French example: From: http://www.test.com/je-suis-francais.html -> http://www.test.com/Je-suis-français.html Japanese example: (i use UTF8 encoding) From: http://www.test.com/watashiwa-nihonnjin-desu.html -> http://www.test.com/私は日本人です.html So i get something like wikipedia (url with accents, ideogramms in several languages) Do you think wikipedia and me are doing wrong ?
Technical SEO | | nipponx0