Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When checking the reported source of these irrelevant links, I notice they’re all listed as linked from various pages in the site, as well as from pages allegedly in the site that have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
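One possible explanation for a pair like the one above (an assumption, since nothing in this thread confirms it) is a link whose href lacks a leading slash. A crawler resolves such a link against the URL of the page it is on, so each hop appends another segment. The minimal Python sketch below uses the standard `urljoin` to show the effect; the two href values are hypothetical, because the site's actual markup is not shown here.

```python
from urllib.parse import urljoin

# The referring URL reported by Search Console (from the example above).
referrer = (
    "https://thedoctorwithin.com/category/seminars/"
    "newsletters/page/7/newsletters/page/3/"
)

# Hypothetical hrefs: the real markup of the site is not shown in this thread.
relative_href = "feedback-and-testimonials/"        # no leading slash
root_relative_href = "/feedback-and-testimonials/"  # leading slash

# A relative href is appended to the current path, compounding the URL.
compounded = urljoin(referrer, relative_href)

# A root-relative href always resolves to the same flat URL.
clean = urljoin(referrer, root_relative_href)

print(compounded)
print(clean)
```

If this is what is happening, the first printed URL matches the 'Not found' URL from the example, and switching the offending links to root-relative or absolute hrefs would stop new compounded paths from being discovered.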
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs in the sitemap are valid URLs.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories is valid.
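A quick way to confirm that a rule like this covers the deep /search/ URLs without touching real pages is to run it through Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt content below is assumed (the actual file is not quoted in this thread), and `is_blocked` is a helper name made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file is not quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url: str, agent: str = "Googlebot") -> bool:
    """Return True when the parsed rules forbid `agent` from fetching `url`."""
    return not parser.can_fetch(agent, url)


deep_search_url = (
    "https://thedoctorwithin.com/search/newsletters/page/2/"
    "feedback-and-testimonials/"
)
real_page_url = "https://thedoctorwithin.com/feedback-and-testimonials/"

print(is_blocked(deep_search_url))
print(is_blocked(real_page_url))
```

Note that robots.txt only asks well-behaved crawlers not to request these paths; it does not remove URLs that are already known to them.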
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with other legitimate spiders / search engines. Perhaps it would be better to use Mod Rewrite to lead spiders to pages that actually do exist.
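Before writing any rewrite rules, the redirect decision can be prototyped in Python. The sketch below rests on two assumptions that would need checking against the real site: that the last path segment of a compounded URL is the page the spider was after, and that the list of real slugs is known (here a hypothetical `VALID_SLUGS` set; in practice it could be built from the sitemap). `redirect_target` is a made-up helper name, not part of WordPress or Apache.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical slugs for illustration; a real list could come from the sitemap.
VALID_SLUGS = {
    "feedback-and-testimonials",
    "online-continuing-education",
    "consultations",
}


def redirect_target(url: str) -> Optional[str]:
    """Return the flat path to 301 to, or None to let the request 404."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # A path with fewer than two segments is already flat: nothing to redirect.
    if len(segments) < 2:
        return None
    last = segments[-1]
    # Redirect only when the final segment is a known real page.
    return f"/{last}/" if last in VALID_SLUGS else None


goofy = (
    "https://thedoctorwithin.com/category/seminars/newsletters/"
    "page/7/newsletters/page/3/feedback-and-testimonials/"
)
print(redirect_target(goofy))
```

Anything that does not end in a known slug returns None and keeps returning a 404, which avoids redirecting every stray request to some arbitrary page.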
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot. Guess it's an entirely new topic of discussion: whether limiting the spurious crawling (from good spiders) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP Super Cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which also might not be getting deleted because of the WP Super Cache plugin. You may try emptying your cache and installing the 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay