Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTack Website Copier and PageNest
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm that's a little strange. Two things;
- is WordPress updated? Do you get the normal URLs when viewing in your browser?
- have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages
Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP to non -wp but I understand that the issue you're having is that the spider is returning index.htmlfiles for urls like domain/page/.
IT's normal, any spider you will use you'll always have and index.html file. Every directory has it's index.html which is the default file to show if you're not establishing something different with rewrite rules.
If you write /page/ the browser will read the index.html file. What you have to be sure is that you'll set up a 301 redirect to avoid any index.html url to show and have it redirected to the main / page (with wildcards is a one line rule) and that your internal links are pointing all to / pages and not to index.html version of it. You can jsut find and replace the /index.html" string into the html code with the /" text (dreamweaver or any html editor will do that in bulk.
Only one commentary on you idea is that you may consider useful to build a php driven site, using includes for header, footer and nav/sidebar, jsut because thinking ahead if you're willing to make changes to a portion of the page repeating throughout the site you'll have to make changes in all pages and uplaod them all which is quite huge to do and also let space for many human/machine errors.
Hope that helped you out!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why is Google not indexing my site?
I'm a bit confused as to why my site just isn't indexing on Google. Even if I type in my brand name, my social channels rank and there's no evidence of my website. I've followed all of the advice I've read and gone into webmaster tools and got the Wordpress yoast plug-in but nothing seems to be making a difference!One thing I've noticed, in Google Webmaster Tools it says "Couldn’t communicate with the DNS server." in site errors. I've called GoDaddy and they said that everything is fine. A bit frustrating. Trying to work out what my next steps should be but feeling a bit lost to be honest! Any help GREATLY appreciated!
Technical SEO | | j1066s0 -
Site Map
For a long time our site map used to be http://www.efurniturehouse.com/sitemap.xml recently our hosting company changed the site map to: http://www.efurniturehouse.com/xml-sitemap.ashx I went ahead and submitted the new site maps to both Google Webmaster and Bing. I submitted the Google one on Monday and it states PENDING. ( A day later this pending) I just submitted the map to Bing. I now have 2 site maps on each. 1)Is having 2 a problem Will they ignore the old site map or can we delete and if so when can we delete I appreciate your input Regards Tony www.eFurnitureHouse.com
Technical SEO | | OCFurniture0 -
How do you handle Wordpress sitemaps within your site?
I have a regular site map on my site and I also have a Wordpress site installed within it that we use for blog/news content. I currently have an auto-sitemap generator installed in Wordpress which automatically updates the sitemap and submits it to the search engines each time the blog is updated. The question I have (which I think I know the answer to but I just want to confirm) is do I have to include all of the articles within the blog in the main site's sitemap despite the Wordpress sitemap having them in there already? If I do include the articles in the main website's sitemap, they would also be in the Wordpress sitemap as well, which is redundant. Redundancy is not good, so I just want to make sure.
Technical SEO | | iresqkeith0 -
How can you get the right site links for your site?
Hello all, I have been trying to get Google to list relevant site links for my site when you type in our brand name, Loco2 or for when Loco2 comes up in a search result. Different things come up when you search Loco2 and Loco 2. We would like site links to look like how they do when you search Loco 2. However Loco2 is our brand name, NOT Loco 2. Does anyone know why Google is doing this and whether we can influence results? We have done as much as possible via Google webmaster, in terms of specifying the links we DO NOT want Google to list for Loco2. However, when you search "Loco2", results only show simple site links. Ideally what we want is: Loco2 to be recognised as the brand NOT Loco 2 The same results (substantial, identical) for Loco2 as for Loco 2 (think o2 and o 2) For the site links to reflect the main pages of our site (Times & Tickets, Engine Room forum etc.) Many thanks in advance! Anila
Technical SEO | | anilababla0 -
Merged old wordpress site to new theme and have crazy amount of 4xx and duplicate content that wasn't there before?
URL is awardrealty.com We have a new website that we merged into a new wordpress theme. I just crawled the site with my seomoz crawl tool and it is showing a ridiculous amount of 4xx pages (200+) and we cant find the 4xx pages in the sitemap or within wordpress. Need some help? Am i missing something easy?
Technical SEO | | Mark_Jay_Apsey_Jr.0 -
Does 301 redirecting a site multiple times keep the value of the original site?
Hi, All! If I 301 redirect site www.abc.com to www.def.com, it should pass (almost) all linkjuice, rank, trust, etc. What happens if I then redirect site www.def.com to www.ghi.com? Does the value of the original site pass indefinitely as long as you do the redirects correctly? Or does it start to be devalued at some point? If anyone's had experience redirecting a site more than once and they've seen reportable good/bad/neutral results, that would be very helpful. Thanks in advance! -Aviva B
Technical SEO | | debi_zyx0 -
UK and USA site versions
We have a UK site selling our product and we are due to appoint a reseller in the USA, they require a .com domain, which makes sense and they also would like to see American spellings etc and currency. also we feature heavily in pubs and they want this referred to as "bars" so there are a few tweaks here and there but mainly just slight variations on spelling and terminology. These are only minor adjustments to our current site, what is the best way of achieving this without falling foul of duplicate content issues.
Technical SEO | | IPIM0