Removing indexed pages
-
Hi all, this is my first post, so be kind. I have a one-page WordPress site with the Yoast plugin installed. Unfortunately, when I first submitted the site's XML sitemap to Google Search Console, I didn't check the Yoast settings, and the sitemap included some example files from a theme demo I was using. These got indexed, which is a pain, so now I am trying to remove them. Originally I set up a bunch of 301s, but they didn't remove the pages from the index (at least not after about a month), so I have now set up 410s. These also don't seem to be working, and I am wondering: since I re-submitted the sitemap with only the index page on it (as it is a single-page site), could that have stopped Google from re-crawling the original URLs and actually seeing the 410s?
Thanks in advance for any suggestions. -
Thanks for all the responses!
At the moment I am serving the 410s via the .htaccess file, as I removed the actual pages a while ago. The pages don't show in most searches; however, two of them still appear in some cases as sitelinks, which is the main pain. I manually asked for them to be removed using the 'Remove URLs' tool, but that only lasts a couple of months and they are now back.
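For anyone else in the same position, here is a minimal sketch of how 410s can be served from .htaccess, assuming Apache with mod_alias (and optionally mod_rewrite) enabled; the paths below are placeholders, not the actual demo URLs:

```apache
# Return "410 Gone" for the old theme-demo URLs (paths are examples)
Redirect gone /demo-page-1/
Redirect gone /demo-page-2/

# Or, with mod_rewrite, match several old URLs at once (G = Gone):
# RewriteEngine On
# RewriteRule ^demo-(page-1|page-2)/?$ - [G,L]
```

You can confirm the status code a URL actually returns with your browser's network tab or any HTTP header checker before expecting Google to act on it.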
So I guess the best way is to recreate the pages and insert a noindex?
Thanks again for everyone's time, it's much appreciated.
-
I agree with ViviCa1's methods, so go with that.
One thing I did want to bring up, though: unless people are actually visiting those pages you don't want indexed, or they do some kind of brand damage, this doesn't really need to be a priority.
Just because they're indexed doesn't mean they're showing up for any searches - and most likely they aren't - so people will realistically never see them. And if you only have a one-page site, you're not wasting much crawl budget on those.
I just bring this up since sometimes we (I'm guilty of it too) can get bogged down by small distractions in SEO that don't really help much, when we should be creating and producing new things!
"These also don't seem to be working, and I am wondering: since I re-submitted the sitemap with only the index page on it (as it is a single-page site), could that have stopped Google from re-crawling the original URLs and actually seeing the 410s?"
There was a good related response from Google employee Susan Moskwa:
“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”
It's a bit older, but it shows how Google discovers URLs through the sitemap. Take a look at the rest of that thread as well.
-
I'd suggest adding a noindex robots meta tag to the affected pages (see how to do this here: https://support.google.com/webmasters/answer/93710?hl=en), and until Google recrawls them, use the Remove URLs tool (see how to use it here: https://support.google.com/webmasters/answer/1663419?hl=en).
If you use the noindex robots meta tag, don't also disallow the pages in your robots.txt, or Google won't even see the tag. Disallowing Google from crawling a page only stops the crawling; it doesn't prevent the page from being indexed, and it won't get an already-indexed page removed from the index.
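For reference, the tag itself is a single line in the <head> of each page you want dropped from the index (how you add it depends on your theme; Yoast also has a per-page setting for this, so you may not need to edit templates by hand):

```html
<!-- Place in the <head> of each page to be dropped from the index -->
<meta name="robots" content="noindex">
```

The same directive can alternatively be sent as an X-Robots-Tag HTTP header, which is handy for non-HTML files.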
-
A couple of ideas spring to mind:
- Use the robots.txt file
- Demote the sitelink in Google Search Console (see https://support.google.com/webmasters/answer/47334)
Example robots.txt file:
User-agent: *
Disallow: /the-link/you-dont/want-to-show.html
Disallow: /the-link/you-dont/want-to-show2.html
Don't include the domain, just the path to the page. There are plenty of tutorials out there; it's worth having a look at http://www.robotstxt.org