Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How is my preproduction website getting indexed in Google?
-
Hi team,
Can anybody please help me find out how my preproduction website and URLs are getting indexed in Google?
-
As Eric hinted, the best way to prevent any pages from being indexed is to put .htaccess password protection on your development site. It's fairly easy to implement. You can find instructions here: http://www.htaccesstools.com/articles/password-protection/
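Once the protection is set up, one way to confirm it is working is to look at what the server returns to a visitor without credentials. Below is a minimal sketch in Python, assuming the development site uses HTTP Basic authentication (the usual .htaccess setup) and that you already have the status code and response headers from an unauthenticated request. The function name is hypothetical.

```python
def looks_password_protected(status_code, headers):
    """Return True if a response looks like an HTTP authentication challenge.

    status_code: integer HTTP status returned to an unauthenticated request.
    headers: mapping of response header names to values.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    challenge = normalised.get("www-authenticate", "")
    # A protected site answers 401 and names an auth scheme such as "Basic".
    return status_code == 401 and challenge.strip() != ""
```

If this returns False for your preproduction URLs, crawlers can fetch them just like any other visitor.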
-
Hi Anoop! Have everyone's answers helped? Do you still have any questions?
-
Anoop, when a 'development' or 'preproduction' website or subdomain is getting indexed, that means that you haven't stopped the search engines from crawling it. The search engines, especially Google, are very aggressive at crawling, and they will crawl just about any URL that they find. It seems as though all you have to do is visit that page and it's going to get crawled.
The best way to stop Google from crawling a website is to disallow it in the robots.txt file. Keep in mind, though, that robots.txt only blocks crawling: even if you tell them to stay out of it using the robots.txt file, they can still index those URLs if they find links to them.
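To check what a given robots.txt actually tells Googlebot, Python's standard urllib.robotparser module can evaluate the rules offline. This is only a sketch: the robots.txt content and the staging.example.com URL are made-up examples, and the helper name is hypothetical. It tests crawl permission only and says nothing about whether a URL is already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a preproduction host: block every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example URL is an assumption, not taken from the thread.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://staging.example.com/page"))
```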
The only reliable way to keep Google out entirely is to password protect the website, or to make it available only on a private server or via VPN.
-
In addition to noindexing the pages using the meta tag, if you have WMT / Search Console set up, you can request Google remove those URLs from their index for the time being. I've found that this may take up to a couple of hours from the removal request to the time of actual removal.
As to how they were found, there's a good chance that Google crawled a link to a preproduction webpage and went from there.
-
Hi,
To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page: <meta name="robots" content="noindex">
To prevent only Google web crawlers from indexing a page: <meta name="googlebot" content="noindex">
You should be aware that some search engine web crawlers might interpret the noindex directive differently. As a result, it is possible that your page might still appear in results from other search engines. Here is the complete guide: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1
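A quick way to check whether a page actually carries a noindex robots meta tag is to parse its HTML. The sketch below uses Python's standard html.parser module; the class and function names are hypothetical, and it only looks at meta tags, not at an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots-style meta tags."""

    def __init__(self):
        super().__init__()
        # Maps a meta name ("robots", "googlebot") to its set of directives.
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        if name in ("robots", "googlebot"):
            # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
            values = {part.strip().lower() for part in attributes.get("content", "").split(",")}
            self.directives.setdefault(name, set()).update(values)


def has_noindex(html, crawler="robots"):
    """Return True if the HTML has a noindex directive for the given meta name."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives.get(crawler, set())
```

Note that a crawler can only see this tag if it is allowed to fetch the page, so a page that is also disallowed in robots.txt will not have its noindex read.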
-
Hi,
Have you noindexed & nofollowed the site and pages? I would also suggest you block all crawlers by disallowing access in the robots.txt file.
Do you know if this has all been done?
-Andy