Site: Query Question
-
Hi All,
I have a question about the site: query you can run on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a high-level view of how many product pages were indexed vs. the total number of pages.
What is interesting is that when I do a site: query for, say, www.newark.com, I get ~748,000 results returned.
When I do a query for www.newark.com "/dp/" I get ~845,000 results returned.
Either I am doing something stupid or these numbers are completely backwards?
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you have done the right thing, it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine. (I've used this on sites with 50 million and 100 million product pages. It gets a bit more complex at that number, but the underlying principle is the same.)
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap because they include additional parameters.
Ideally, of course, you rid the index of parameter-filled URLs, but it's pretty tough to do that.
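One way to gauge how much of the gap comes from parameters is to strip the query strings from a list of known URLs and see how many distinct pages remain. A minimal Python sketch, assuming you already have such a list from a crawl or log export; the example.com URLs and the strip_parameters helper are hypothetical, for illustration only:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical URLs, not taken from any real site.
indexed_urls = [
    "http://www.example.com/widget/dp/12345",
    "http://www.example.com/widget/dp/12345?ref=search&sort=price",
    "http://www.example.com/gadget/dp/67890?sessionid=abc",
]

# Distinct pages once parameters are removed.
canonical = {strip_parameters(u) for u in indexed_urls}

# URLs that carry parameters and so would not appear in a generated sitemap.
parameterised = [u for u in indexed_urls if urlsplit(u).query]

print(len(indexed_urls), "URLs,", len(canonical), "distinct pages,",
      len(parameterised), "with parameters")
```

This only tells you which URLs differ by parameters; whether two such URLs really serve the same content still has to be checked against the site itself.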
-
Best bet is to make sure all your URLs are in your sitemap, and then you get an exact count of submitted vs. indexed URLs.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), to quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful for finding errors in a specific section or when you are working on indexing a certain type of page.
S
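As a rough illustration of the per-subfolder approach, here is a minimal Python sketch that groups URLs by their first path segment, builds one sitemap document per group, and computes a percent-indexed figure. The function names, the example.com URLs, and the submitted/indexed counts are all hypothetical; the real counts would come from whatever sitemap report you use.

```python
from collections import defaultdict
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def section_of(url: str) -> str:
    """Return the first path segment, e.g. 'news' for /news/story-1."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "root"


def group_by_section(urls):
    """Map each section name to the list of URLs under it."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    return dict(groups)


def sitemap_xml(urls) -> str:
    """Build a sitemap document (sitemaps.org 0.9 schema) for the given URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


def percent_indexed(submitted: int, indexed: int) -> float:
    """Share of submitted URLs reported as indexed, as a percentage."""
    return 100.0 * indexed / submitted if submitted else 0.0


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/news/story-1",
    "http://www.example.com/news/story-2",
    "http://www.example.com/profiles/alice",
]

# One sitemap per section, keyed by a descriptive file name.
sitemaps = {
    f"sitemap-{section}.xml": sitemap_xml(section_urls)
    for section, section_urls in group_by_section(urls).items()
}

# Hypothetical counts for one section.
news_percent = percent_indexed(submitted=200, indexed=150)
```

On very large sites each section would need further splitting, since the sitemap protocol caps a single file at 50,000 URLs.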
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site I have with 25,000 actual pages. They have mass duplicate content issues. When I do a generic site: with the domain, Google shows 50,000-60,000 pages. If I do an inurl: with a specific URL param, I get either 500,000 or over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, if you do a normal site: Google will try its best to provide the content within the site that it shows the world based on "most relevant" content. When you do a refined check, it's naturally going to look for the content that really is most relevant - closest match to that actual parameter.
So if you're seeing more results with the refined process, it means that on any given day, at any given time, when someone does a general search, the Google system will filter out a lot of content that isn't seen as highly valuable for that particular search. So all those extra pages that come up in your refined check - many of them are most likely then evaluated as less than highly valuable / high quality or relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
about 839,000 results.
-
Different data center perhaps - what about if you add in the "dp" query to the string?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining the imprecise nature of the site: operator.