Mod rewrite question
-
Sorry in advance if this isn't the best place to ask this question.
Google Webmaster Tools has recently identified a ton of "Not Found" pages, which are actual pages with some digits appended at the end.
For example, suppose an actual page on my blog is:
(A) http://www.example.com/blog/2012/09/my-post-title/
This page works just fine.
However, GWT has identified the following page as a "not found" page:
(B) http://www.example.com/blog/2012/09/my-post-title/9157586677/1846732913010
This appears to be happening to hundreds of posts on my site. In each case, the "9157586677" portion of the URL is identical, but the remaining 13 digits change from page to page.
I haven't been able to determine exactly what is causing this to happen - it's probably a social plug-in for Wordpress, or perhaps Disqus, but I'm not sure which one. I'll go through a process of elimination to narrow it down over the coming week.
As a quick fix, I'd like to create a ModRewrite rule so that requests for (B) get 301 redirected to (A). Since there are hundreds of posts, I need to do this in a way that works regardless of what's in the "/2012/09/my-post-title/" part of the URL.
Unfortunately, mod-rewrite is outside of my area of expertise. Can somebody please suggest how I can handle this? Thanks in advance.
PS - As for tracking down the cause, I've looked at the source of the pages in the "Linked From" area of GWT and the Not Found link is nowhere to be found. That is why I assume the bad link is being generated by some javascript that is a part of one of my plug-ins.
Update: It seems like Disqus is the source of these phantom links. There's considerable discussion here. I'll continue searching for a long-term solution. Meanwhile, I'd still appreciate help with the mod-rewrite question above. Thanks again.
-
I've found a solution and am posting it here in case anybody else is having the same problem:
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /blog/$1/$2/$3/ [L,R=301]
-
I hadnt seen the update over Disquss at the end of the post.
Please, post all your advances on this topic Ahirai
Best regards!
-
Hi ahirai,
I was gonna say you should check the linked from tab in GWT but since you actually did it, for me its pretty sure that a plugin that drives content is creating this issue from scratch.
Since i´m neither an apache expert, i can´t give you a method to do the dirty work, but i can tell you the problem is created by some 3rd party plugin driving content of site.
Please, post your advances in the topic!
Good luck!!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
PDF Optimization Question: Does URL Structure Matter?
Hi Mozzers: I am optimizing a bunch of PDF brochures within a client's website. Besides the typical optimization tactics I'm applying, (like these) I have a question regarding the file/url structure of the PDFs themselves. By default, the client is locating PDFs in an 'uploads' folder of their Wordpress site. So, a typical PDF might have a URL such as: https://www.Xyzinsurance.com/xyz-content/uploads/2015/06/Brochure-XYZ-Connect.pdf My question: is there any advantage in eliminating all these sub-directories and moving the files into a main folder, simply titled '/brochures' ?? Any insights or conjecture would be welcome!
Technical SEO | | Daaveey0 -
Questions About The Right Hosting
Hi All, I have a few questions about the right type of hosting that I should be using. I understand that many people say we should be using the best hosting that we can afford. However, when I have a website with just 650 pages / posts is it really worth worrying too much about where I am hosting. I am UK based so at the moment I am using a UK host along with a CDN. I have a unique IP address and on a server that has a limited amount of websites on it. The main question is there really any need to be looking at anything else. The truth is I have used cloud hosting before and the website loaded slower around the world with that than it does with my current setup. Thanks
Technical SEO | | TTGUK0 -
Question about breaking out content from one site onto many
We have a website and domain -- which is well-established (since 1998) -- that we are considering breaking apart for business reasons. This is a content site that hosts articles from a few of our brands in portal fashion. These brands are represented in print with their own magazines so it's important to keep their presence separate. All of the content on the site is related to a general industry, with each brand covering a unique segment in the industry. For example, think of a toy industry site that hosts content from it's brands covering stuffed animals, electronics and board games. The current thinking is to break out the content from a couple brands to their own sites and domains. The business case for this branding purposes. I'm of the opinion that this is a bad idea as we would likely see a noticeable decline in search traffic across the board, which we rely on for impressions for our advertisers. If we take the appropriate steps to carefully redirect pages to the new domains what kind of hit should we expect to take from this transition? Would it make much difference if we were transition from 1 to 2 sites vs 1 to 4? Should this move be avoided all together? Any advise would be appreciated.
Technical SEO | | accessintel0 -
Google Enterprise Search Questions
Hi Everybody, A client has asked me to take a look at Google Enterprise Search for them. It has been a few years since I last fooled around with implementing a Google search box on a website, and that was the free version which included off-site results in the results. This appears to be the main page describing the paid product: http://www.google.com/enterprise/search/ I have three questions: The search testing function on the above page doesn't seem to be working. I'm typing in a URL and search term, as prompted, and the page is simply refreshing. It never provides me an example set of results. Is it working for you? This client has a moderately large e-commerce site (about 200 products). Have you implemented Google enterprise search on such a site and are you happy with its performance? The goal here is to let users search for a topic and be returned both product and informational pages. How well does this tool do this? Am I going to need to know any special types of coding (beyond html/css) to implement this? If so, what are they? If you have experience with this product, I would surely appreciate your feedback. Thank you!
Technical SEO | | MiriamEllis0 -
Basic Multi-Site Question
Newb question. We run a site in multiple cities under the same domain. Often times one city will provide content that is "syndicated" to other cites. For example, here is the master post: http://www.styleblueprint.com/food-and-entertaining/kale-salad-quick-healthy/ The content will also show up in the following domains: http://atlanta.styleblueprint.com/food-and-entertaining/kale-salad-quick-healthy/ http://birmingham.styleblueprint.com/food-and-entertaining/recipes/kale-salad-quick-healthy/ Should I be marketing the posts in Atlanta and Birmingham as "no index, no follow" for SEO purposes? Thanks in advance, Jay
Technical SEO | | SSBCI0 -
Canonical tags/wordpress permalink question
Need help: Do canonical tags do the exact same thing that wordpress already does with it’s permalink function? Or are these 2 separate things? thank you.
Technical SEO | | bonnierSEO1 -
Using hyphenated sub-domains or non-hyphenated sub-domains? What is the question! I Any takers?
For our corporate business level domain, we are exploring using a hyphenated sub-domain foir a project. Something like www.go-figure.extreme.com I thought from a user perspective it seems cluttered. The domain length might also be an issue with the new Algorithm big G has launched in recent past. I know with past experience, hyphenated domains usually take longer to index, as they are used by spammers more frequently and can take longer to get out of the supplementary index. Our company site has over 90 million viewers / year, so our brand is well established and traffic isn't an issue. This is for a corporate level project and I didn't have the answer! Will this work? anyone have any experience testing this. Any thoughts will help! Thanks, Rob
Technical SEO | | RobMay0 -
Pages not ranking - Linkbuilding Question
It has been about 3 months since we made some new pages, with new, unique copy, but alot of pages (even though they have been indexed) are not ranking in the SERPS I tested it by taking a long snippet of the unique copy form the page and searching for it on Google. Also I checked the ranking using http://arizonawebdevelopment.com/google-page-rank
Technical SEO | | Impact-201555
Which may no be accurate, I know, but would give some indication. The interesting thing was that for the unique copy snippets, sometimes a different page of our site, many times the home page, shows up in the SERP'sSo my questions are: Is there some issue / penalty / sandbox deal with the pages that are not indexed? How can we check that? Or has it just not been enough time? Could there be any duplicate copy issue going on? Shouldn't be, as they are all well written, completely unique copy. How can we check that? Flickr image details - Some of the pages display the same set of images from flickr. The details (filenames, alt info, titles) are getting pulled form flickr and can be seen on the source code. Its a pretty large block of words, which is the same on multiple pages, and uses alot of keywords. Could this be an issue considered duplication or keyword stuffing, causing this. If you think so , we will remove it right away. And then when do we do to improve re-indexing? The reason I started this was because we have a few good opportunities right now for links, and I was wondering what pages we should link to and try to build rankings for. I was thinking about pointing one to /cast-bronze-plaques, but the page is not ranking. The home page, obviously is the oldest page, and ranked the best. The cast bronze plaques page is very new. Would linking to pages that are not ranking well be a good idea? Would it help them to get indexed / ranking? Or would it be better to link to the pages that are already indexed / ranking? If you link to a page that does not seem to be indexed, will it help the domains link profile? Will the link juice still flow through the site0