Site: Query Question
-
Hi All,
I have a question about the site: query you can execute on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to try to get a high-level view of how many product pages were indexed vs. the total number of pages.
What is interesting is when I do a site: query for say www.newark.com I get ~748,000 results returned.
When I do a query for www.newark.com "/dp/" I get ~845,000 results returned.
Either I am doing something stupid, or these numbers are completely backwards.
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal; as long as you have done the right thing, it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine (I've used this on sites with 50 million and 100 million product pages; it gets a bit more complex at that scale, but the underlying principle is the same).
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap because they include additional parameters.
Ideally, of course, you'd rid the index of parameter-filled URLs, but it's pretty tough to do that.
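One common way to consolidate parameter-filled URLs (not something spelled out in this thread, just an illustration) is a rel=canonical tag on each parameterized variant pointing back at the clean product URL. The example.com paths below are hypothetical:

```html
<!-- Served in the <head> of a parameterized variant such as
     https://www.example.com/dp/widget-123?sort=price&ref=nav
     (hypothetical URLs, for illustration only) -->
<link rel="canonical" href="https://www.example.com/dp/widget-123" />
```

Google treats this as a strong hint rather than a directive, so parameterized URLs can still linger in the index for a while, which matches the "hang around for a few months" behavior described above.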
-
Best bet is to make sure all your URLs are in your sitemap, and then you get an exact count.
I've found it handy to use a separate sitemap for each subfolder (e.g. /news/ or /profiles/) so I can quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful for finding errors in a specific section or when you are working on indexing a certain type of page.
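As a rough sketch of that per-section bookkeeping (hypothetical URLs; just one way you might split a URL list into per-subfolder sitemap groups):

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_urls_by_section(urls):
    """Group URLs by their first path segment (e.g. /news/, /profiles/),
    so each section can get its own sitemap file and its indexation
    percentage can be tracked separately."""
    sections = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        # URLs with no path segment (the homepage) go in a "root" bucket.
        section = segments[0] if segments else "root"
        sections[section].append(url)
    return dict(sections)

# Hypothetical URL list, standing in for a full site crawl or CMS export.
urls = [
    "https://www.example.com/news/story-1",
    "https://www.example.com/news/story-2",
    "https://www.example.com/profiles/jane",
    "https://www.example.com/",
]
print(group_urls_by_section(urls))
```

Each resulting group would then be written out as its own sitemap file and listed in a sitemap index, so per-section indexed counts can be read straight from Search Console's sitemap reports.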
S
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site I have with 25,000 actual pages. They have mass duplicate content issues. When I do a generic site: with the domain, Google shows 50-60,000 pages. If I do an inurl: with a specific URL parameter, I get either 500,000 or over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, if you do a normal site: Google will try its best to provide the content within the site that it shows the world based on "most relevant" content. When you do a refined check, it's naturally going to look for the content that really is most relevant - closest match to that actual parameter.
So if you're seeing more results with the refined process, it means that on any given day, at any given time, when someone does a general search, the Google system will filter out a lot of content that isn't seen as highly valuable for that particular search. So all those extra pages that come up in your refined check - many of them are most likely then evaluated as less than highly valuable / high quality or relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
about 839,000 results.
-
Different data center, perhaps. What happens if you add the "dp" query to the string?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining the imprecise nature of the site: operator.