Way to spider a WordPress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, the URLs change. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing the site in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML of the pages, it could perhaps solve the URL issue.
This BlackHatWorld thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always get an index.html file. Every directory has its own index.html, which is the default file the server shows unless you set up something different with rewrite rules.
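To make that mapping concrete, here is a minimal Python sketch of how a static web server typically turns a requested URL path into a file on disk. The function name `resolve_to_file` and the `index_name` default are my own illustrative assumptions; real servers make the index file name configurable.

```python
def resolve_to_file(url_path, index_name="index.html"):
    """Return the relative file a static server would typically serve for url_path.

    Hypothetical helper for illustration only; it ignores query strings,
    rewrite rules and anything else a real server might apply.
    """
    relative = url_path.lstrip("/")
    # A path ending in "/" names a directory, so the directory's index file is served.
    if url_path.endswith("/"):
        return relative + index_name
    # Any other path is taken to name a file directly.
    return relative
```

So a request for /page/ is answered from page/index.html, which is why the spider saves the page under that file name even though the public URL stays /page/.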
If you request /page/, the server will serve that directory's index.html file. What you have to make sure of is that you set up a 301 redirect so that any index.html URL is redirected to its / version (with wildcards this is a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the string /index.html" in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
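If you'd rather script it than use an editor, the two steps above could look roughly like this in Python. This is only a sketch under assumptions: the function names are hypothetical, the mirror is assumed to be a local folder of .html files, and `redirect_target` only illustrates the logic of the redirect (the real 301 would live in your server configuration). Run it on a copy of the spidered files, not the only copy.

```python
from pathlib import Path


def redirect_target(path):
    """Return the URL an index.html request should 301 to, or None if no redirect is needed."""
    if path.endswith("/index.html"):
        # Drop the file name and keep the trailing slash: /page/index.html -> /page/
        return path[: -len("index.html")]
    return None


def strip_index_links(html):
    """Replace links ending in /index.html with the bare directory URL.

    Works on bytes so the file's text encoding does not matter.
    Only matches a link that ends right before a closing quote.
    """
    html = html.replace(b'/index.html"', b'/"')
    return html.replace(b"/index.html'", b"/'")


def rewrite_mirror(root):
    """Rewrite every .html file under root in place; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_bytes()
        updated = strip_index_links(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

You would call something like `rewrite_mirror("mirror")`, where "mirror" is a placeholder for wherever your spider saved the site. Note that it does not touch relative links written as plain index.html with no leading slash, so check a few pages by hand afterwards.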
One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout the site, with static files you'll have to make the change in every page and upload them all, which is a huge job and leaves room for many human/machine errors.
Hope that helped you out!