Google Indexing Pages with Made Up URL
-
Hi all,
Google is indexing a URL on my site that doesn't exist and never has. The URL is completely made up. Does anyone know why this is happening and, more importantly, how to get rid of it?
Thanks
-
Hi Brian
Dan (Moz Associate) here. Bernadette and Excal pretty much nailed it. Just wanted to add that OSE, Search Console and other link tools may not display every single link that exists out there on the web. OSE in particular is the most 'filtered' index: it shows mostly quality, relevant links and filters out most spam.
Regardless, the best course of action is indeed to make sure your broken pages return a proper 404 status code, and Google will handle the rest.
-
Agree with Bernadette that this is most likely a hacker or spammer taking advantage of a configuration issue with your website. If you're using a CMS (WordPress, Joomla, Drupal etc.), make sure that it has been properly configured, or have your website developer do it.
I had a similar case with a website I inherited a few years back. A configuration issue in the CMS let individuals set themselves up as users, and a blogging extension with a faulty out-of-the-box configuration let anyone create blog posts. Although the blogging tool was set to require admin approval before an article went live and became visible on the site, once an article was created it could somehow still be indexed by Google, which created one hell of a mess.
Fixing the issue in the CMS and blogging extension was quite simple, but the cleanup took a long while. Over a period of months I had to disavow a continuing stream of junk links, and I spent a lot of time writing to other webmasters about the issue with their sites so they could remove the links. Nearly 3 years down the line a few of these still pop up from time to time, because other sites obviously have not plugged the gap and updated their blogging tool, and so still carry this massive list of dodgy links from link spammers.
If you are using a CMS, I would recommend that you or your webmaster check the list of authorised users and block any that you do not recognise or did not create. Then immediately review your CMS security settings to ensure that all new users require an admin to approve or activate them before they can do anything.
Unfortunately, once exploits like this are discovered they are quickly disseminated across the internet, and every link spammer (and his dog) tends to jump on board, so the quicker you can plug the leak and start remediation, the better. Good luck.
-
Brian, that's definitely an issue. If it's not delivering a 404 error when you go to a non-existent page on your site, that's the problem. I could theoretically go to yourdomain.com/aslksjdltkjlkjalskdj.html, make a link to it, and Google would index the page.
Check with your web developer to see how you can make sure that your error pages (page not found) deliver a 404 status in the server header.
There are lots of ways that Google will discover new URLs (even someone browsing with Google Chrome might allow Google to discover a new URL and then crawl it). So, you'll want to make sure that you have this fixed on your site.
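To make "delivers a 404 error in the server header" concrete, here is a minimal sketch using only Python's standard library. The PAGES table, the resolve helper and the port number are made up for illustration; on a real site the same rule would normally live in the web server or CMS configuration rather than in a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical site content for illustration: path -> HTML body.
PAGES = {
    "/": "<h1>Home</h1>",
    "/about.html": "<h1>About</h1>",
}

NOT_FOUND_BODY = "<h1>Page not found</h1>"


def resolve(path: str) -> Tuple[int, str]:
    """Return (status, body): 200 for known pages, 404 for everything else."""
    clean = path.split("?", 1)[0]  # ignore any query string
    if clean in PAGES:
        return 200, PAGES[clean]
    # A friendly error page is fine, but it must be sent with a 404 status.
    return 404, NOT_FOUND_BODY


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = resolve(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)  # this is the "server header" status
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run(port: int = 8000) -> None:
    """Serve the hypothetical site locally until interrupted."""
    HTTPServer(("127.0.0.1", port), SiteHandler).serve_forever()
```

With run() started, a request for /aslksjdltkjlkjalskdj.html still shows a readable "Page not found" message, but the response status is 404 rather than 200.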
-
Hi Bernadette,
Thanks for your response. I checked OSE and Search Console and can't find any links pointing to the URL. I did the server header check and it's delivering a 200 OK response.
-
Brian, when this happens, there is typically one reason: somewhere there is a link with that URL in it. What we've seen before is that oftentimes those links are created by hackers or spammers that then try to create content on your site with that URL. For example, when a site is hacked, they will create a page on your site and then link to it.
Without the URL (or the page name without your domain name), it's tough for me to see what might be causing this. But, there has to be a link somewhere to it in order for Google to want to index it.
What I would do is use a server header check tool (such as http://www.rexswain.com/httpview.html) to see if the page has a "200 OK" server response or a 404 error. Google typically doesn't index pages that deliver 404 errors. It could be that the server is set up to show a "page not found" message on your site but sends a "200 OK" in the server header, so Google indexes the page.
Check your site to see if there is a link to the page. If the link exists, then fix it. Then, look at Majestic.com or Open Site Explorer to see if they show any links from other sites to the page. If those links exist, see if you can get rid of those links.
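If you would rather script the server header check than use an online tool, something like the following rough sketch should work with Python's standard library. The function names fetch_status and probe_soft_404 are my own, and the random ".html" path is just an assumption about what a made-up URL might look like. Note that urllib follows redirects, so the code reported is that of the final response.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def probe_soft_404(base_url: str,
                   fetch: Callable[[str], int] = fetch_status) -> bool:
    """Request a random, made-up page under `base_url`.

    Returns True if the server answers 200 for it, meaning "page not found"
    content is being served with a "200 OK" header.
    """
    bogus_url = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.html"
    return fetch(bogus_url) == 200
```

Calling probe_soft_404 with your own domain returns True when a non-existent page comes back as "200 OK", which is the situation described above. The fetch parameter exists only so the check can be tried without network access.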