Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTack Website Copier and PageNest
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm that's a little strange. Two things;
- is WordPress updated? Do you get the normal URLs when viewing in your browser?
- have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages
Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP to non -wp but I understand that the issue you're having is that the spider is returning index.htmlfiles for urls like domain/page/.
IT's normal, any spider you will use you'll always have and index.html file. Every directory has it's index.html which is the default file to show if you're not establishing something different with rewrite rules.
If you write /page/ the browser will read the index.html file. What you have to be sure is that you'll set up a 301 redirect to avoid any index.html url to show and have it redirected to the main / page (with wildcards is a one line rule) and that your internal links are pointing all to / pages and not to index.html version of it. You can jsut find and replace the /index.html" string into the html code with the /" text (dreamweaver or any html editor will do that in bulk.
Only one commentary on you idea is that you may consider useful to build a php driven site, using includes for header, footer and nav/sidebar, jsut because thinking ahead if you're willing to make changes to a portion of the page repeating throughout the site you'll have to make changes in all pages and uplaod them all which is quite huge to do and also let space for many human/machine errors.
Hope that helped you out!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Yoast and wordpress duplicate meta
I'm using the Yoast plugin with wordpress and have noticed in my HTML I have duplicate meta data. For example my header starts with
Technical SEO | | simonatkinsphoto
<title>(title) </title<span><<br /><meta </span><span class="html-attribute-name">property</span><span>="</span><span class="html-attribute-value">og:site_name</span><span>" </span><span class="html-attribute-name">content</span><span>=<br /><span><meta </span><span class="html-attribute-name">property</span><span>="</span><span class="html-attribute-value">og:description</span><span>" </span><span class="html-attribute-name">content</span><span>=<br /><br /></span></span>Then I have the 'This site is optimised by Yoast" tagline followed by the same meta -<br /> <span><meta </span><span class="html-attribute-name">name</span><span>="</span><span class="html-attribute-value">description</span><span>" </span><span class="html-attribute-name">content=<br /><span> <meta </span><span class="html-attribute-name">property</span><span>="</span><span class="html-attribute-value">og:title</span><span>" content=<br /><span> <meta </span><span class="html-attribute-name">property</span><span>="</span><span class="html-attribute-value">og:description</span><span>" </span><span class="html-attribute-name">content=<br /><span> <meta </span><span class="html-attribute-name">property</span><span>="</span><span class="html-attribute-value">og:site_name</span><span>" </span><span class="html-attribute-name">content</span><span>=<br /><br /></span></span></span></span>Is this likely to cause problems with Google and is there a way to stop both wordpress and Yoast adding meta to the header. </p></title>0 -
Help Setting Up 301 Redirects from Coldfusion Site to Wordpress Site.
I have created a new website and need to redirect all of the previous pages to the new one. The old website was built in coldfusion and the new site is built in wordpress. One of the pages I'm trying to redirect is www.norriseal.com/products.cfm to http://norrisealwellmark.com/products/. This is what I have in my .htaccess file <ifmodule mod_rewrite.c="">Options +FollowSymlinks
Technical SEO | | MarketHubb
RewriteEngine On
RewriteBase /
Redirect 301 /products.cfm http://norrisealwellmark.com/products/</ifmodule> The result of this redirect is http://norrisealwellmark.com/products.cfm How do I prevent the .cfm from appending to the destination URL?1 -
Mobile site ranking instead of/as well as desktop site in desktop SERPS
I have just noticed that the mobile version of my site is sometimes ranking in the desktop serps either instead of as well as the desktop site. It is not something that I have noticed in the past as it doesn't happen with the keywords that I track, which are highly competitive. It is happening for results that include our brand name, e.g '[brand name][search term]'. The mobile site is served with mobile optimised content from another URL. e.g wwww.domain.com/productpage redirects to m.domain.com/productpage for mobile. Sometimes I am only seen the mobile URL in the desktop SERPS, other times I am seeing both the desktop and mobile URL for the same product. My understanding is that the mobile URL should not be ranking at all in desktop SERPS, could we be being penalised for either bad redirects or duplicate content? Any ideas as to how I could further diagnose and solve the problem if you do believe that it could be harming rankings?
Technical SEO | | pugh0 -
Where does Wordpress store the 301 redirects?
Hi, I've just created a campaign for my new wordpress blog and found 11 301 redirects which I was not aware of. It looks like wordpress has created them automatically. Does any one know how wordpress handles this issues or where are they stored so I can delete them? They are of no use for me. 9 of these redirects point to the same url with an added '/' and are in pages 1 is on a post. I've been changing the permalink and some urls several times and maybe one of these times the Wordpress has automatically created the 301 redirect. But why? I do not want to keep the old url. the last redirect is very strange it goes from http://www.mydomain.com/folder to http://www.mydomain.com where folder is the folder where I installed wordpress. But again, I want no one to type the url with the folder name or even know this folder exists. Any comment on this would be greatly appreciated. Thanks a lot, David
Technical SEO | | dballari0 -
Sitefinity vs Wordpress
We're looking for a new CMS and out development company suggested Sitefinity. I've had great success with Wordpress. Is either system better. I love worpdress but have had no experience with Sitefinity. Thanks!
Technical SEO | | StandUpCubicles0 -
Google.ca is showing our US site instead of our Canada Site
When our Canadian users who search on google.ca for our brand (e.g. Travelocity, Travelocity hotels, etc.), the first few results our from our US site (travelocity.com) rather than our Canadian site (travelocity.ca). In Google Webmaster Tools, we've adjusted the geotargeting settings to focus on the appropriate locale, but the wrong country TLD is still coming up at the top via google.ca. What's the best way to ensure our Canadian site comes up instead of the US site on google.ca? Thanks, Tory Smith
Technical SEO | | travelocitysearch
Travelocity0 -
What are the pros and cons of moving one site onto a subdomain of another site?
Two sites. One has weaker sales. What would the benefits and problems for SEO of moving the weak site from its own domain to a subdomain of the stronger site?
Technical SEO | | GriffinHansen0 -
Index forum sites
Hi Moz Team, somehow the last question i raised a few days ago not only wasnt answered up until now, it was also completely deleted and the credit was not "refunded" - obviously there was some data loss involved with your restructuring. Can you check whether you still find the last question and answer it quickly? I need the answer 🙂 Here is one more question: I bought a website that has a huge forum, loads of pages with user generated content. Overall around 500.000 Threads with 9 Million comments. The complete forum is noindex/nofollow when i bought the site, now i am thinking about what is the best way to unleash the potential. The current system is vBulletin 3.6.10. a) Shall i first do an update of vbulletin to version 4 and use the vSEO tool to make the URLs clean, more user and search engine friendly before i switch to index/follow? b) would you recommend to have the forum in the folder structure or on a subdomain? As far as i know subdomain does take lesser strenght from the TLD, however, it is safer because the subdomain is seen as a separate entity from the regular TLD. Having it in he folder makes it easiert to pass strenght from the TLD to the forum, however, it puts my TLD at risk c) Would you release all forum sites at once or section by section? I think section by section looks rather unnatural not only to search engines but also to users, however, i am afraid of blasting more than a millionpages into the index at once. d) Would you index the first page of a threat or all pages of a threat? I fear duplicate content as the different pages of the threat contain different body content but the same Title and possibly the same h1. Looking forward to hear from you soon! Best Fabian
Technical SEO | | fabiank0