Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website (https://thedoctorwithin.com) that was revamped about 3 months ago. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When checking the reported source of these irrelevant links, I notice they’re all attributed to various pages in the site, as well as to pages allegedly in the site that have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs actually in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
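As a quick sanity check on a rule like that, the sketch below uses Python's standard `urllib.robotparser` to confirm which URLs a `Disallow: /search/` line would block. The rule text is an assumption (the actual robots.txt isn't shown in this thread), and `is_allowed` is a hypothetical helper name used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set; the site's real robots.txt is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the assumed robots.txt rules permit crawling `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/",
        "https://thedoctorwithin.com/feedback-and-testimonials/",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Keep in mind that robots.txt only asks compliant crawlers not to fetch those paths. It doesn't stop the URLs from being discovered, and it won't remove ones already reported.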
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders, as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with the other legitimate spiders / search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
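One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if a menu, widget, or template emits a relative href (no leading slash), a crawler resolves it against whatever URL it is currently on. On WordPress archive-style URLs that respond even though no such page was created, each crawl can then spawn a deeper path. The sketch below uses Python's standard `urllib.parse.urljoin` to show the resolution. `resolve_link` is a hypothetical helper, and the href values are illustrative, not taken from the site's markup.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve an href the way a crawler would, relative to the page it was found on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    page = ("https://thedoctorwithin.com/category/seminars/newsletters/"
            "page/7/newsletters/page/3/")

    # Relative href (illustrative): appended to the current path.
    print(resolve_link(page, "feedback-and-testimonials/"))

    # Root-relative href (illustrative): always resolves to the same page.
    print(resolve_link(page, "/feedback-and-testimonials/"))
```

With the relative href, the result is the same "not found" URL quoted in the example above. With the root-relative href, it is simply https://thedoctorwithin.com/feedback-and-testimonials/. If that pattern matches, it may be worth searching theme templates, menus, and post content for hrefs that lack a leading slash or full domain.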
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot (usually called crawl budget). Guess it's an entirely new topic of discussion: whether limiting the spurious crawling from good spiders means those spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP super cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted because of the WP Super Cache plugin. You may try emptying your cache and installing the 'all 404 redirect' plugin and a 301 management plugin.
I hope this helps.
Regards,
Vijay