Attack of the dummy URLs -- what to do?
-
It occurs to me that a malicious program could set up thousands of links to dummy pages on a website:
www.mysite.com/dynamicpage/dummy123
www.mysite.com/dynamicpage/dummy456
etc.
How is this normally handled? Does a developer have to check all the parameters to see if they are valid and, if not, automatically return a 301 redirect or a 404 Not Found? That requires a table lookup of acceptable URL parameters for every new visitor.
I was thinking that bad URL names would be rare, so it would be OK to just stop the program with a message, until I realized someone could intentionally set up links to nonexistent pages on a site.
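A note on the "stop the program with a message" idea: unless the code also sets the status, a page like that is usually served with 200 OK, which is exactly what lets a dummy URL get indexed. A minimal sketch of the table-lookup approach, assuming a Python WSGI app and a hypothetical in-memory set `VALID_SLUGS` standing in for the site's database: anything under `/dynamicpage/` that isn't in the table gets a genuine 404 status. A set lookup per request is cheap, so checking every visitor is not a significant cost.

```python
from http import HTTPStatus

# Hypothetical lookup table of slugs that really exist under /dynamicpage/.
# On a real site this would be a database query or a cached set of ids.
VALID_SLUGS = {"widgets", "gadgets", "gizmos"}
PREFIX = "/dynamicpage/"


def status_for_path(path: str, valid_slugs: set[str] = VALID_SLUGS) -> HTTPStatus:
    """Return 200 only for paths that map to a real page, 404 for everything else."""
    if not path.startswith(PREFIX):
        return HTTPStatus.NOT_FOUND
    slug = path[len(PREFIX):].strip("/")
    # Empty slugs, nested paths, and unknown slugs all count as "not found".
    if slug and "/" not in slug and slug in valid_slugs:
        return HTTPStatus.OK
    return HTTPStatus.NOT_FOUND


def app(environ, start_response):
    """Minimal WSGI app that sends a real 404 status instead of a 200 error page."""
    path = environ.get("PATH_INFO", "")
    status = status_for_path(path)
    if status == HTTPStatus.OK:
        body = f"Content for {path}".encode("utf-8")
    else:
        body = b"Sorry, that page does not exist."
    start_response(
        f"{status.value} {status.phrase}",
        [("Content-Type", "text/plain; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it; a request for /dynamicpage/dummy123 should then come back as 404 rather than 200.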
-
Hello,
I am also having this issue with hundreds of dummy URLs that never existed as part of our website's blog. Do I go into parameters and specify each of the dummy URLs to avoid this?
Thanks in advance for any help! (And sorry to piggyback on this question, Theodore; hope you don't mind!)
-
Thanks Ray. Appreciate the advice!
-
It's great that you've identified issues like this. If you know certain parameters are generated often and don't need to be indexed, I also suggest going into your Google Webmaster Tools account > Crawl > URL Parameters and proactively setting those parameters to crawl 'No URLs' where appropriate. I do this with certain custom parameters for sites that are prone to having these extra URLs indexed by mistake.
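The Webmaster Tools setting only tells Google what to do. A complementary, site-side technique (a different one from the setting described above, swapped in here for illustration) is to point each parameterized URL at its clean version with a rel="canonical" tag. A minimal Python sketch, assuming a hypothetical set `NON_INDEXED_PARAMS` of parameter names that never change page content:

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that never change what the page shows
# (tracking, session ids). Replace with your own site's list.
NON_INDEXED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str, drop: set[str] = NON_INDEXED_PARAMS) -> str:
    """Return the URL without the non-indexed parameters (and without a fragment)."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in drop]
    # An empty list encodes to "", so urlunsplit emits no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    # html.escape turns "&" into "&amp;", which is correct inside an HTML attribute.
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}">'
```

The remaining parameters keep their original order; whether to also sort them is a site-specific choice, since some applications treat parameter order as significant.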
-
Hi Ray-pp,
Thanks for your answer. I'm not seeing anything significant, but occasionally a bot will come with extra stuff added to the parameter names, which got me thinking that a malicious program or nasty competitor might want to do that to cause havoc. My understanding is that 404s don't hurt SEO ranking in Google. But the way things are set up now, no one would get a 404, and in fact Google would index the 'bad' pages, so maybe I need to do something proactively to 404 or 301 such pages so they never get put into an index at all.
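One way to do this proactively, sketched in Python under stated assumptions: the dynamic page accepts only a hypothetical whitelist of parameter names (`ALLOWED_PARAMS`) and looks the `id` up in a hypothetical table (`VALID_IDS`). A request for an id that doesn't exist, or with a mangled parameter name, gets a 404; a real page with junk parameters appended gets a 301 to the clean URL, so only one version can end up in an index.

```python
from http import HTTPStatus
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only parameter names this dynamic page understands.
ALLOWED_PARAMS = {"id", "page"}
# Hypothetical lookup table of ids that correspond to real pages.
VALID_IDS = {"123", "456"}


def check_request(url: str) -> tuple[HTTPStatus, Optional[str]]:
    """Return (status, redirect_location) for a request to the dynamic page."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    known = [(name, value) for name, value in params if name in ALLOWED_PARAMS]
    ids = [value for name, value in known if name == "id"]

    # No single, recognised id: the page does not exist, so answer 404.
    if len(ids) != 1 or ids[0] not in VALID_IDS:
        return HTTPStatus.NOT_FOUND, None

    # Real page, but junk parameters were appended: 301 to the clean URL.
    if len(known) != len(params):
        clean_query = urlencode(known)
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, clean_query, ""))
        return HTTPStatus.MOVED_PERMANENTLY, clean

    return HTTPStatus.OK, None
```

`keep_blank_values=True` matters here: without it, `parse_qsl` silently drops a bare junk parameter such as `&xyz`, and the request would be served as-is instead of being redirected.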
Since my site has lots of dynamically generated pages, I've had my share of surprises, and am just trying to avoid any new ones!
-
Hi Theodore - You pose an interesting problem. Are you currently experiencing this issue? I don't see why someone would create a bunch of random links to nonexistent pages on your site, but if they did (and the pages were receiving low-quality traffic), I would proactively disavow the domains that created the links. That should be enough to prevent the penalties you're afraid of receiving.
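For reference, Google's disavow file is plain text with one entry per line: `domain:example.com` disavows every link from that domain, and lines starting with `#` are comments. A minimal Python sketch that turns a list of offending domains into that format, assuming the list has already been exported from a backlink report (the domain names in any example are placeholders):

```python
from typing import Iterable


def build_disavow_file(domains: Iterable[str],
                       comment: str = "Domains pointing links at non-existent dummy URLs") -> str:
    """Return disavow-file text: a comment line, then one 'domain:' line per unique domain."""
    lines = [f"# {comment}"]
    seen = set()
    for raw in domains:
        domain = raw.strip().lower()
        # Skip blanks and duplicates, but otherwise keep the input order.
        if domain and domain not in seen:
            seen.add(domain)
            lines.append(f"domain:{domain}")
    return "\n".join(lines) + "\n"
```

The returned text would then be saved as a UTF-8 `.txt` file and uploaded through Google's disavow links tool.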
If, however, you notice that specific 404 pages are receiving quality traffic (maybe an old page was removed but good traffic is still sent to it), then you would want to 301 that page to its closest related page, the one that deserves the traffic and authority.
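A 301 map like that can be as simple as a lookup table consulted only after a request fails to match a live page. A minimal Python sketch; the `REDIRECTS` entries are hypothetical examples, and every unmapped path still gets a plain 404:

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical map: removed pages that still get good traffic -> closest related live page.
REDIRECTS = {
    "/dynamicpage/old-widget-guide": "/dynamicpage/widgets",
    "/blog/2012-gadget-roundup": "/blog/gadget-roundup",
}


def resolve_removed_page(path: str,
                         redirects: dict[str, str] = REDIRECTS) -> tuple[HTTPStatus, Optional[str]]:
    """For a path with no live page: 301 to the mapped target if there is one, else 404."""
    # Treat "/foo" and "/foo/" the same; keep the root path as "/".
    normalized = path.rstrip("/") or "/"
    target = redirects.get(normalized)
    if target:
        return HTTPStatus.MOVED_PERMANENTLY, target
    return HTTPStatus.NOT_FOUND, None
```

Keeping the map explicit means only pages someone has deliberately chosen to redirect get a 301; a flood of dummy URLs still falls through to 404.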
Does that help? Maybe a little more information about your specific problem would allow me to tailor the advice better.
Related Questions
-
URL indexed but not submitted in sitemap, even though the URL is in the sitemap
Dear Community, I have the following problem, and it would be super helpful if you could help. Cheers. Symptoms: In Search Console, Google says that some of our old URLs are indexed but not submitted in sitemap. However, those URLs are in the sitemap. Also, the sitemap has been successfully submitted, with no error message. Potential explanation: We have an automatic cache-clearing process within the company once a day, and in the sitemap we use this as the last modification date. Let's imagine the URL www.example.com/hello was last modified in 2017. Because the cache is cleared daily, the sitemap will show last modified: yesterday, even if the content of the page has not changed since 2017. We have a Z after the sitemap time; could it be that the bot does not understand the time format? We have only HTTP URLs in the sitemap, and our HTTPS URLs are not in the sitemap. What do you think?
Intermediate & Advanced SEO | | ZozoMe0 -
Do you know if there is a tool that can tell you if a URL has backlinks?
Hi, do you know if there is a tool that can check backlinks for thousands of URLs? Thanks, Roy
Intermediate & Advanced SEO | | kadut0 -
When the site's entire URL structure changed, should we update the inbound links built pointing to the old URLs?
We're changing our website's URL structure, which means all our site URLs will change. After this is done, do we need to update the old inbound external links to point to the new URLs? Yes, the old URLs will also be 301 redirected to the new URLs. Many thanks!
Intermediate & Advanced SEO | | Jade1 -
Many new URLs at once
Hi, I have about 5,000 new URLs to publish. For SEO/Google, should I publish them gradually, or is all at once fine? By the way, all these URLs were already indexed in the past but were then redirected. Cheers,
Intermediate & Advanced SEO | | viatrading10 -
Internal Links - Different URLs
Hey, so on my product page, I have recommended products at the bottom. The issue is that those recommended products have long parameters, such as sitename.com/product-xy-z/https%3A%2F%2Fwww.google.co&srcType=dp_recs. The reason for the long parameter is tracking (used internally by the dev and UX teams). My question is: should I replace it with the clean URL, or, as long as the page has a canonical tag, is it okay to have such a long parameter? I would think a clean URL would help with internal links and whatnot, but if the page already has a canonical tag, would it help? Another issue is that the URL itself is different, not just the parameter. For instance, the canonical URL is sitename.com/productname-xyz/, while the internal link used on the product page (the same exact page, just a different URL with the parameter) is sitename.com/xyz/https%3A%2F%2Fwww.google.co&srcType=dp_recs (missing the product name), BUT it still has the canonical tag!
Intermediate & Advanced SEO | | ggpaul5620 -
How and When Should I Use Canonical URL Tags?
I'm pretty new to the SEO universe, and I have not used any canonical tags, simply because there is no definitive source explaining exactly when and why you should use them. Am I the only one who feels this way?
Intermediate & Advanced SEO | | greenrushdaily0 -
Removing Parameterized URLs from Google Index
We have duplicate eCommerce websites, and we are in the process of implementing cross-domain canonicals. (We can't 301 - both sites are major brands). So far, this is working well - rankings are improving dramatically in most cases. However, what we are seeing in some cases is that Google has indexed a parameterized page for the site being canonicaled (this is the site that is getting the canonical tag - the "from" page). When this happens, both sites are being ranked, and the parameterized page appears to be blocking the canonical. The question is, how do I remove canonicaled pages from Google's index? If Google doesn't crawl the page in question, it never sees the canonical tag, and we still have duplicate content. Example: A. www.domain2.com/productname.cfm%3FclickSource%3DXSELL_PR is ranked at #35, and B. www.domain1.com/productname.cfm is ranked at #12. (yes, I know that upper case is bad. We fixed that too.) Page A has the canonical tag, but page B's rank didn't improve. I know that there are no guarantees that it will improve, but I am seeing a pattern. Page A appears to be preventing Google from passing link juice via canonical. If Google doesn't crawl Page A, it can't see the rel=canonical tag. We likely have thousands of pages like this. Any ideas? Does it make sense to block the "clicksource" parameter in GWT? That kind of scares me.
Intermediate & Advanced SEO | | AMHC0 -
URL Question and Advice on Site Architecture
Good morning one and all. I have a specific question about the URL structure for my domain migration. I have a computer repair business that I am rebranding, and my question is centrally focused on how best to handle my URL naming structure so that it suits the search engines and my customers' UX without looking SPAMMY. I am a web developer and SEO, and I am building a silo site architecture in WordPress using Pages (not Posts), so no discussion is needed on the permalink structure. I am attaching several screenshots of the new site I have designed so that you can see the silo architecture layout in action, for the most part. OK, here we go. Looking at the silo masthead, the following main menu items each represent a specific silo theme:
Silo Theme #1 - COMPUTER REPAIR
Silo Theme #2 - VIRUS REMOVAL
Silo Theme #3 - PHONE REPAIR
Silo Theme #4 - NETWORKING
Silo Theme #5 - DATA RECOVERY
My specific question is: if /computer-repair/ is a main silo theme (WP parent page) and /laptop-repair/ is a child page of Computer Repair, is the following example (the actual URL string) going to 'trigger' a spam signal to the user, to Google, or to both?
URL string: http://www.pcmedicsoncall.com/computer-repair/laptop-repair/
Here's another example with the VIRUS REMOVAL silo: http://www.pcmedicsoncall.com/virus-removal/malware-removal/
Computer repair is the main silo theme. It could be changed in the URL structure, but I won't change it, because COMPUTER REPAIR is the single largest keyword phrase people use when they are looking for computer repair. Secondly, LAPTOP REPAIR is also a keyword phrase with a high number of searches that I am trying to rank for, and ideally that should not change either! How do I deal with this situation? Or am I being overly paranoid about this?
I currently have the site restricted to my own IP address, so I'm afraid the screenshots are all I can offer in lieu of actually visiting the site. Currently, my URL structure has Wilmington NC immediately following the targeted keyword phrase for the silo theme, like this: http://www.pcmedicsoncall.com/virus-removal-wilmington-nc/malware-removal/
Including the location after the keyword phrase, as in the example above, looks much more attractive and breaks the URL up so it does not read as SPAMMY, and it will help with SEO. However, using the location after the keyword phrase creates another problem, which I explain below. On top of doing a complete rebranding and domain change, I am going to be relocating myself and my business to Charlotte, NC at the end of the summer. So I have serious doubts about whether using Wilmington NC in the URL structure is wise: an internal 301 redirect on a newly migrated site 2-3 months after the initial migration and setup may have a negative impact, confuse Google, and compound the situation further, even though the location would immediately help me bounce back in the rankings after the migration. Thoughts and suggestions on both scenarios, please? I have asked this specific question once already, but obviously people do not read my very detailed and well-thought-out questions. It can also be viewed here: http://www.seomoz.org/q/need-very-urgent-advice-on-wedsite-migration-questions-please#reply_150847
Thank you. Sincerely, Marshall Thompson
Intermediate & Advanced SEO | | MarshallThompson310