SEO-Friendly Method to Load XML Content onto Page
-
I have a client who has about 100 portfolio entries, each with its own HTML page.
Those pages aren't getting indexed because of the way the main portfolio menu page works: it uses JavaScript to load the list of portfolio entries, along with metadata about each entry, from an XML file. Because the content is loaded client-side, crawlers see nothing on the portfolio menu page.
Here's a sample of the JavaScript used; this is one line of many more:

// load project xml
try {
    var req = new Request({
        method: 'get',
        url: '/data/projects.xml',
Normally I'd have them just manually add entries to the portfolio menu page, but part of the metadata being loaded is a set of project characteristics used to filter which portfolio entries are shown on the page, such as client type (government, education, industrial, residential, etc.) and project type (based on the service that was provided). It's similar to the filtering you'd see on an e-commerce site. This has to stay, so the page needs to remain dynamic.
I'm trying to summarize the alternative methods they could use to load that content onto the page instead of JavaScript (I assume server-side solutions are the only ones I'd want, unless there's another option I'm unaware of). I'm aware that PHP could probably load all of their portfolio entries from the XML file on the server side. I'd like to get some recommendations on other possible solutions. Please feel free to ask any clarifying questions.
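To make the server-side idea concrete, here's a rough sketch of what it could look like. It's shown in Python only for brevity; the same logic maps directly onto PHP's simplexml or any other server-side parser. The XML structure below is an assumption, not the client's actual projects.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for /data/projects.xml; the real schema will differ.
SAMPLE_XML = """
<projects>
  <project client-type="government" project-type="design">
    <title>City Hall Renovation</title>
    <url>/portfolio/city-hall.html</url>
  </project>
  <project client-type="education" project-type="planning">
    <title>Campus Master Plan</title>
    <url>/portfolio/campus-plan.html</url>
  </project>
</projects>
"""

def render_portfolio(xml_text):
    """Build an HTML list of portfolio links on the server, so crawlers
    see real anchor tags instead of an empty JavaScript container."""
    root = ET.fromstring(xml_text)
    items = []
    for project in root.findall("project"):
        title = project.findtext("title")
        url = project.findtext("url")
        items.append('<li><a href="%s">%s</a></li>' % (url, title))
    return "<ul>\n%s\n</ul>" % "\n".join(items)

print(render_portfolio(SAMPLE_XML))
```

Since the HTML arrives fully rendered, the existing JavaScript can still run on top of it for filtering and effects without hiding the links from crawlers.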
Thanks!
-
As a response to my own question, I received some other good suggestions to this issue via Twitter:
- @__jasonmulligan__ suggested XSLT
- @__KevinMSpence__ suggested "...easiest solution would be to use simplexml --it's a PHP parser for lightweight XML" & "Just keep in mind that simplexml loads the doc into memory, so there can be performance issues with large docs."
- Someone suggested creating a feed from the XML, but I don't think that adds much benefit beyond another step, since you'd still need a way to pull that content onto the page.
- There were also a few suggestions for converting the XML to another format, like JSON, on the page, but those were really outside the scope of what we were looking to do.
Final recommendation to the client was to manually add text links beneath all of the JavaScript content, since they were only adding a few portfolio entries per year and it would look good in the theme. A hack, perhaps, but much faster and more cost-effective. Otherwise, I would have recommended they go with PHP plus the simplexml suggestion from above.
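The filtering requirement doesn't rule out the server-side route, either: the filter value can arrive as a query-string parameter and the XML can be filtered before the page is rendered, so every filtered view becomes a real, crawlable URL (e.g. something like /portfolio.php?client=government). A minimal sketch of that idea, in Python for brevity, with element and attribute names that are assumptions rather than the client's actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the client's projects.xml.
SAMPLE_XML = """
<projects>
  <project client-type="government"><title>City Hall Renovation</title></project>
  <project client-type="education"><title>Campus Master Plan</title></project>
  <project client-type="government"><title>Courthouse Addition</title></project>
</projects>
"""

def filter_projects(xml_text, client_type=None):
    """Return project titles, optionally restricted to one client type,
    the same way the existing JavaScript filters the loaded XML."""
    root = ET.fromstring(xml_text)
    return [p.findtext("title")
            for p in root.findall("project")
            if client_type is None or p.get("client-type") == client_type]

print(filter_projects(SAMPLE_XML, "government"))
```

JavaScript can still enhance the page (instant filtering, animation), but the unfiltered server-rendered list is what crawlers would index.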
-
I think you need to find a developer who understands progressive enhancement, so that the page degrades gracefully. You'll need to deliver the page using something server-side (PHP?) and then add the bells and whistles on top.
I'm guessing the budget won't cover moving the entire site/content onto a database/cms platform.
How does the page look in Google Webmaster Tools (Labs > Instant Preview)? That might give you a nice visual way to explain the problem to the client.
-
Site was done a year or two ago by a branding agency. To their credit, they produced clean and reasonably well-documented code, and they do excellent design work. However, they relied too heavily on Flash and JavaScript to load content throughout the site, and the site has suffered as a result.
Site is entirely HTML, CSS, and JavaScript, and uses Dreamweaver template files to produce the portfolio entry pages, which then propagate into the XML files, which then get loaded by the rest of the site.
I wouldn't call it AJAX. I think it loads the entire XML file up front and then uses the filters to display the appropriate content, so there are no subsequent calls to the server for more data.
User interface is great, and makes it easy to filter and sort by relevant portfolio items. It's just not indexable.
-
What's the reason it was implemented this way in the first place? Is the data being exported from another system in a particular way?
What's the site running on - is there a CMS platform?
Is it JavaScript because it's doing some funky AJAX-driven "experience", or are they just using JavaScript and the XML file to enable filtering/sorting on different facets?
Final silly question - how's the visitor expected to interact with them?
-
Try creating an XML sitemap with all the entries, spin that into an HTML sitemap version and also a portfolio page with a list of entries by type. It's a bit of work, but will probably work best.
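Since the portfolio URLs are already collected in one XML file, spinning them into an XML sitemap could be mostly mechanical. A rough sketch, in Python, where the input structure and domain are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the existing projects.xml.
SAMPLE_XML = """
<projects>
  <project><url>/portfolio/city-hall.html</url></project>
  <project><url>/portfolio/campus-plan.html</url></project>
</projects>
"""

def build_sitemap(xml_text, base="http://www.example.com"):
    """Emit a minimal sitemap.xml listing every portfolio entry URL,
    following the sitemaps.org protocol."""
    root = ET.fromstring(xml_text)
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for project in root.findall("project"):
        loc = base + project.findtext("url")
        lines.append("  <url><loc>%s</loc></url>" % loc)
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(SAMPLE_XML))
```

The same loop could emit the HTML sitemap and the by-type portfolio page from one data source.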
-
Thanks Doug,
I forgot to mention it above, but I am definitely mentioning other workaround methods of getting the content indexed, specifically:
- XML Sitemap
- Cross-linking - there are plenty of other opportunities to link throughout the site that haven't been used yet, so that's high on the list.
- Off-site deep link opportunities are also large and will be addressed.
- The projects aren't totally linear, so we can't use next/previous in this example, but that's a good idea as well.
Those aside, there is a fundamental issue with the way the data is working now and I want to address the ideal solution, since it's within the client's budget to have that content redesigned properly.
-
While helpfully not answering the question: could you generate an XML sitemap (I take it the portfolio data is being generated from something?) to help Google find and index the pages?
Is there any cross linking between the individual portfolio pages or at least a next/previous?
(My first thought would have been the php route.)