Google indexing thousands of crazy search results with %25253
-
In GWT I started seeing very strange pages indexed a few weeks ago, and Google is now reporting over 21,000 pages (blocked by robots.txt) with weird URLs like this:
The current robots.txt looks like this:
User-agent: *
Disallow: /wp-content
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /data
Disallow: /slideshows
Disallow: /page/*/?s=
Disallow: /?s=
Disallow: /search
This website is running an up-to-date WP install with Yoast's Google Analytics and SEO plug-ins. I can't point to anything specific that happened with the site when these URLs started appearing, and they have kept appearing even after I modified the robots.txt.
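For anyone checking which URLs the Disallow rules above actually cover, here is a rough Python sketch. It assumes the pattern semantics documented for the major crawlers (plain prefix match, `*` for any run of characters, a trailing `$` for end of URL) and ignores Allow rules and rule precedence. The function names and sample paths are made up for illustration.

```python
import re

# Disallow rules copied from the robots.txt quoted in the question.
DISALLOW = [
    "/wp-content", "/wp-admin", "/wp-includes", "/data",
    "/slideshows", "/page/*/?s=", "/?s=", "/search",
]

def rule_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt matching: prefix match, '*' wildcard, optional trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

def blocked_by(path: str) -> list:
    """Return every Disallow rule that matches the given path plus query string."""
    return [rule for rule in DISALLOW if rule_matches(rule, path)]

# Hypothetical paths, for illustration only.
for sample in ["/?s=wedding", "/page/2/?s=wedding", "/blog/?s=wedding"]:
    print(sample, "->", blocked_by(sample))
```

In this sketch a search URL under any other path (the hypothetical `/blog/?s=wedding`) matches none of the rules, so it is worth testing the real URLs Google reports against the rule list.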
What can be done to try and stop Google from creating and indexing these goofy URLs?
I see lots of sites having this issue when I search in Google, but no one seems to have a solution.
-
As it turns out, the problem is with Yoast's Google Analytics plug-in, per Yoast. However, he has not yet released a fix or given a date for one. So one either needs to live with it until it is fixed or switch plug-ins.
-
Hi Sha,
Well, that is a new possible lead, but unfortunately Pictage is basically worthless when it comes to any technological issues.
Hmm, is there some way I could add "noindex" tags to any link that appears on the Proof page, as they are dynamic in appearance?
Thanks,
Joe
-
Hi again Joe,
After a more detailed look at your site (which has no obvious search box available to users) I was curious as to why all of the things that you are doing on the site seem to have no effect upon the issues you are trying to resolve...and why your site is generating thousands of search queries without a search box!
This says to me "do you have control of all of the content?" ... and it appears that you are using an external service called Pictage to upload and display client portfolios.
So, are you pulling content into your site from Pictage? Is it some kind of white label add-on to your site?
If the pages from Pictage are being generated externally, then the Yoast plugin cannot add the "noindex" tag to those pages. If this is the case, then I would say you need to contact the Pictage help people and advise them that there is a problem they need to attend to.
Hope that helps,
Sha
-
Hi Egol,
Hmm, I have never heard of that possibility.
How can I change the resultant search URL with a Wordpress install?
Thanks.
-
Hi Sha,
I made the changes weeks ago, but more pages keep appearing which tells me Google is still trying to index them?
There is already an "s" parameter set in GWT, but I don't really see many options in this screen - are there some settings I'm missing?
There are also page URLs like this one, can they be blocked as well?
-
In addition to the suggestions already given... if this was my site I would change the URL of the search results page. Someone might have a robot that is tossing crap queries into your search box.
-
Hi Joe,
A couple of things:
- If you have made the change to noindex search results recently, it may take some time for the errors to disappear from GWT. If the number of pages continues to grow, then clearly the noindex is not implemented as you expect.
- You could try using the parameter handling feature in GWT to tell googlebot to ignore all pages with the parameter in question. In your search string, the ? says "here come some parameters" and the "s" is the parameter that you want to ignore.
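To make the "?" and "s" point concrete, here is a small Python sketch using only the standard library. The URL is a made-up example, not one from the site in question.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url: str) -> dict:
    """Return the query-string parameters of a URL as a dict of lists."""
    # Everything after the '?' is the query string; parse_qs splits it into
    # name -> values. Blank values are kept so an empty '?s=' still shows up.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

def has_param(url: str, name: str) -> bool:
    """True if the URL carries the named parameter, even with an empty value."""
    return name in query_params(url)

# Hypothetical WordPress-style search URL.
example = "https://example.com/?s=wedding+photos"
print(query_params(example))    # {'s': ['wedding photos']}
print(has_param(example, "s"))  # True
```

Running something like this over the URLs reported in GWT would show whether they all really carry the `s` parameter or whether some other parameter is involved.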
Incidentally, there is definitely something funky happening with the generation of those search strings which should be investigated and resolved as well.
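One thing that may help with that investigation: `%25` is the percent-encoded form of `%` itself, so a run like `%25253` suggests that something is encoding an already-encoded string over and over. A minimal Python sketch that counts the layers, using a made-up value (not a URL from the site):

```python
from urllib.parse import unquote

def decode_layers(value: str, max_rounds: int = 10):
    """Percent-decode repeatedly until the text stops changing.

    Returns (decoded_text, rounds_needed). max_rounds is only a safety cap.
    """
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
        rounds += 1
    return value, rounds

# Hypothetical example: "=" encoded three times over.
# "="  ->  "%3D"  ->  "%253D"  ->  "%25253D"
print(decode_layers("%25253D"))  # ('=', 3)
```

If the reported URLs need more than one round to decode, something in the chain (a plug-in, a script, or a bot) is re-encoding links it has already encoded.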
Hope that helps,
Sha
-
Yoast's WordPress SEO plug-in automatically does the following:
- RSS feeds are now always noindex, follow. No search engine should ever list an RSS feed as a result in the result pages.
- Admin, login and registration pages are always noindexed now for the same reason.
- Search result pages are now always noindex, follow.
-
This is in your own website's search, right?
I've always heard that you should set the on-page robots meta tag to:
noindex, follow
That way all of the links on the page can be followed, but Google will not index the page itself.
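If you want to confirm that a search results page really carries that tag, here is a rough Python sketch using the standard library's HTML parser. The class and function names and the sample HTML are made up for illustration; in practice you would feed it the HTML of one of your own search result pages.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into its individual directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical page source.
sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(robots_directives(sample))  # ['noindex', 'follow']
```

Keep in mind that a crawler can only read this tag on pages it is allowed to fetch, so a URL that is blocked in robots.txt never gets its noindex seen.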