Way to spider a WordPress site
-
I have an old WordPress site and I want to move it to a new server and take it off WordPress (too many hacks). I am trying to spider the site so as to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html. Those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects one by one.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML on the pages, it could perhaps solve the URL issue.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always end up with an index.html file. Every directory has its own index.html, which is the default file the server shows unless you set up something different with rewrite rules.
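To make that mapping concrete, here is a minimal Python sketch of how a mirrored file path corresponds to the URL it would be served at. It assumes the server's directory index is index.html (common, but an assumption about your new server); the function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def public_url_path(mirrored_file: str, index_name: str = "index.html") -> str:
    """Return the URL path a web server would serve a mirrored file at.

    Assumes the server uses `index_name` as its directory index, so
    page/index.html is reachable as /page/.
    """
    path = PurePosixPath(mirrored_file.lstrip("/"))
    if path.name == index_name:
        parent = path.parent.as_posix()
        # A parent of "." means the file sits at the site root.
        return "/" if parent == "." else f"/{parent}/"
    # Anything that is not a directory index keeps its own file name.
    return f"/{path.as_posix()}"


if __name__ == "__main__":
    for name in ["page/index.html", "index.html", "style.css"]:
        print(name, "->", public_url_path(name))
```

So the spidered page/index.html and the indexed URL /page/ should be the same page once the files are uploaded with that directory layout.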
If you request /page/, the server will serve the index.html file in that directory. What you need to make sure of is that you set up a 301 redirect so that no index.html URL is ever shown and each one is redirected to its trailing-slash / version (with wildcards, that's a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
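If you'd rather script the find-and-replace than use an editor, something like the following Python sketch would do it. It assumes the mirrored pages are UTF-8 .html files and that links use double-quoted attributes, exactly like the /index.html" string above; the function names and the "mirror" folder are hypothetical. Run it on a copy of the files first.

```python
from pathlib import Path


def strip_index_links(html: str) -> str:
    """Replace links ending in /index.html" with the bare directory form /"."""
    return html.replace('/index.html"', '/"')


def rewrite_site(root: str) -> int:
    """Rewrite every .html file under `root` in place.

    Returns the number of files that were actually changed.
    Assumes the files are UTF-8 encoded.
    """
    changed = 0
    for page in Path(root).rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        updated = strip_index_links(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    # "mirror" is a placeholder for the folder the spider wrote to.
    print(rewrite_site("mirror"), "files updated")
```

Like the editor approach, this only catches links that end in /index.html followed by a double quote, so single-quoted links or a bare href="index.html" would need their own pass.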
One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer and nav/sidebar. Thinking ahead, if you ever want to change a part of the page that repeats throughout the site, with static pages you'd have to edit every page and upload them all, which is a huge job and leaves room for many human and machine errors.
Hope that helped you out!