Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Using the Google Remove URL Tool to remove https pages
-
I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week.
I have noticed when I put URL's starting with https:// in to the removal tool, it adds the http:// main URL at the front.
For example, I add to the removal tool:-
https://www.mydomain.com/blah.html?search_garbage_url_addition
On the confirmation page, the URL actually shows as:-
http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition
I don't want to accidentally remove my main URL or cause problems. Is this the right way this should look?
AND PART 2 OF MY QUESTION
If you see the search description in Google for a page you want removed that says the following in the SERP results, should I still go to the trouble of putting in the removal request?
www.domain.com/url.html?xsearch_...
A description for this result is not available because of this site's robots.txt – learn more.
-
Thanks so much for taking the time to respond.
I think I will add the https to WMT and remove them that way.
I will take a look through the .htaccess file and the creation of the ssl robots file. A while back, it seemed that Google was indexing a lot of my site as https and then the dropped it and went mainly back to http. I will get that sorted to make it clear.
-
Hi there
I'll start with question 2 first as it's a bit easier to answer. Robots.txt blocks the crawling of a page, but not necessarily indexing. Of course, if the page cannot be crawled it will be deindexed eventually anyway, but if you're getting that description for one of your URLs, Google has not been able to access it and will stop trying to. So that is usually enough, although if you want to remove it as well, you can by all means.
For question 1 - GWT is a bit awkward in the sense that it treats http and https versions of your site as different webmaster properties. Furthermore, if you want to remove a URL on your site, it will always prefix it with the http/https version of your site, no matter how you enter it.
If you added another WMT property that was https://www.yourdomain.com - you would be able to manage that domain as well and thus you would be able to remove any URLs under that prefix.
Incidentally, if you want to block all HTTPS pages from being accessed, you can do that with a special instruction in your htaccess file and robots txt. You can instruct the Googlebot and other bots to read a specific robots.txt file if they visit an HTTPS URL. To do that, you would first add this to your htaccess file:
RewriteCond %{HTTPS} ^on$
RewriteCond %{REQUEST_URI} ^/robots.txt$
RewriteRule ^(.*)$ /robots_ssl.txt [L]This command basically says "if the URL has https, read the robots_ssl.txt file". You then upload a file called robots_ssl.txt to your root domain. In the txt file you just add:
User-agent: *
Disallow: /So now, if a bot reaches an https URL, it has to read the robots_ssl.txt file and upon reading that, they are denied access. That would prevent all of your https URLs from being indexed.
That might be useful to you, but if you go ahead and use it please take care to backup all your files in case anything goes wrong - your htaccess file is very important!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Audit Tools Not Picking Up Content Nor Does Google Cache
Hi Guys, Got a site I am working with on the Wix platform. However site audit tools such as Screaming Frog, Ryte and even Moz's onpage crawler show the pages having no content, despite them having 200 words+. Fetching the site as Google clearly shows the rendered page with content, however when I look at the Google cached pages, they also show just blank pages. I have had issues with nofollow, noindex on here, but it shows the meta tags correct, just 0 content. What would you look to diagnose? I am guessing some rogue JS but why wasn't this picked up on the "fetch as Google".
Technical SEO | | nezona0 -
If I'm using a compressed sitemap (sitemap.xml.gz) that's the URL that gets submitted to webmaster tools, correct?
I just want to verify that if a compressed sitemap file is being used, then the URL that gets submitted to Google, Bing, etc and the URL that's used in the robots.txt indicates that it's a compressed file. For example, "sitemap.xml.gz" -- thanks!
Technical SEO | | jgresalfi0 -
Create Page Titles from H1 using Yoast?
I'm working on a site that has 280 blog posts that have either been migrated from an old CMS site or created on the Dev version of the new WordPress site. We've written 280 unique meta descriptions so they don't truncate but it there a quick way I can export the current H1s and then import them into Yoast so they are set as the Page Titles? I've written unique Page Titles and meta descriptions for all the Service and Products page and just want a way to speed up the blog posts as their H1s are really good and what I would use as Page Titles anyway. Any help, greatly appreciated!
Technical SEO | | Marketing_Today0 -
Removing Redirected URLs from XML Sitemap
If I'm updating a URL and 301 redirecting the old URL to the new URL, Google recommends I remove the old URL from our XML sitemap and add the new URL. That makes sense. However, can anyone speak to how Google transfers the ranking value (link value) from the old URL to the new URL? My suspicion is this happens outside the sitemap. If Google already has the old URL indexed, the next time it crawls that URL, Googlebot discovers the 301 redirect and that starts the process of URL value transfer. I guess my question revolves around whether removing the old URL (or the timing of the removal) from the sitemap can impact Googlebot's transfer of the old URL value to the new URL.
Technical SEO | | RyanOD0 -
Blank pages in Google's webcache
Hello all, Is anybody experiencing blanck page's in Google's 'Cached' view? I'm seeing just the page background and none of the content for a couple of my pages but when I click 'View Text Only' all of teh content is there. Strange! I'd love to hear if anyone else is experiencing the same. Perhaps this is something to do with the roll out of Google's updates last week?! Thanks,
Technical SEO | | A_Q
Elias0 -
Redirect non-www if using canonical url?
I have setup my website to use canonical urls on each page to point to the page i wish Google to refer to. At the moment, my non-www domain name is not redirected to www domain. Is this required if i have setup the canonical urls? This is the tag i have on my index.php page rel="canonical" href="http://www.mydomain.com.au" /> If i browse to http://mydomain.com.au should the link juice pass to http://www.armourbackups.com.au? Will this solve duplicate content problems? Thanks
Technical SEO | | blakadz0 -
Trailing Slashes In Url use Canonical Url or 301 Redirect?
I was thinking of using 301 redirects for trailing slahes to no trailing slashes for my urls. EG: www.url.com/page1/ 301 redirect to www.url.com/page1 Already got a redirect for non-www to www already. Just wondering in my case would it be best to continue using htacces for the trailing slash redirect or just go with Canonical URLs?
Technical SEO | | upick-1623910 -
How to handle sitemap with pages using query strings?
Hi, I'm working to optimize a site that currently has about 5K pages listed in the sitemap. There are not in face this many pages. Part of the problem is that one of the pages is a tool where each sort and filter button produces a query string URL. It seems to me inefficient to have so many items listed that are all really the same page. Not to mention wanting to avoid any duplicate content or low quality issues. How have you found it best to handle this? Should I just noindex each of the links? Canonical links? Should I manually remove the pages from the sitemap? Should I continue as is? Thanks a ton for any input you have!
Technical SEO | | 5225Marketing0