Are the CSV downloads malformed when a comma appears in a URL?
-
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having, however, is with a CSV exported from our crawl diagnostics summary that I've downloaded.
The CSV contains all the data fine; however, I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download, and I can't parse it properly because URLs sometimes contain commas and aren't quoted the way other fields, such as meta_description_tag, are.
Is there something simple I'm missing or is it something that can be fixed?
Looking forward to learning more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked into it as I have - there are many factors involved if I were to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEOmoz developers, who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers."
I should know - I am a programmer, and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code, and your stuff isn't going to work unless you fix it manually.
What you need to do to fix this is add a quote after the 10th comma and another at the end of each line.
Unfortunately, even that is a problem.
The problem is that there are other fields that may not be quoted, some of which can start with http://.
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes and other characters are escaped with doubled quotes.
Titles and link text can also contain commas, so it is very complex.
Some of the fields are a bigger mess because it depends on the link text: if the link text contains an image, you'll have quotes, equals signs, commas, and all kinds of stuff. You can also have upper-ASCII characters and multibyte characters.
They did actually quote the first URL if it contains commas.
They really should have quoted every field.
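Given all those caveats, a pragmatic repair (rather than hand-inserting quotes) is to let a CSV parser handle the correctly quoted fields and then re-join any extra trailing columns back into the final URL field. A minimal sketch in Python; the expected column count is whatever your export actually has (11, if the "10th comma" above is right), and this won't survive unquoted line breaks:

```python
import csv

def repair_rows(lines, expected_cols):
    """Yield rows with the right column count, re-joining extra trailing
    fields caused by unquoted commas in the final URL column."""
    for row in csv.reader(lines):
        if len(row) > expected_cols:
            # the extra columns came from commas inside the unquoted URL
            row = row[:expected_cols - 1] + [",".join(row[expected_cols - 1:])]
        yield row

# A 3-column demo: title, description, URL (URL unquoted and contains a comma)
line = 'Page Title,"A description, quoted",http://example.com/foo,bar'
print(list(repair_rows([line], expected_cols=3)))
# [['Page Title', 'A description, quoted', 'http://example.com/foo,bar']]
```

Because `csv.reader` does the first pass, quoted fields that legitimately contain commas (like meta_description_tag) stay intact; only the overflow from the unquoted final field gets merged back.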
Related Questions
-
No english Url = No sense symbols?
Hey there, I have a Greek content website and some of the URLs are Greek (I did this for a better SEO score).
Moz Pro | tsalatzi
When I am using the analyze page issues tool and type in the Greek URL, it doesn't find it (for example, if I type "www.euroulakia.com/πως-να-βγαλω-λεφτα" it displays "Sorry! We weren't able to find that page when we crawled your site"), BUT when I just copy-paste it from the address bar, Moz finds it. However, when I copy-paste, the URL changes the Greek characters to nonsense symbols (for example, the same URL becomes: http://www.euroulakia.com/πως-να-βγαλω-λεφτα). As you can see, the URL is written with nonsense symbols. My question is: does Google see these nonsense symbols as well, instead of the Greek characters? I am using Joomla, and I have Search Engine Friendly URLs and Unicode Aliases set to yes. Can anyone please help me with this, because I have a feeling that something is wrong here. Thanks in advance.
-
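The "nonsense symbols" described above are almost certainly percent-encoding, which is how non-ASCII characters are legally carried in URLs; decoding restores the Greek exactly. A quick sketch using Python's standard library (the path is the one from the question):

```python
from urllib.parse import quote, unquote

path = "/πως-να-βγαλω-λεφτα"    # the Greek path from the question
encoded = quote(path)            # percent-encodes the UTF-8 bytes
print(encoded)                   # something like /%CF%80%CF%89%CF%82-...
assert unquote(encoded) == path  # decoding restores the original exactly
```

Search engines handle this round trip routinely, so the encoded form in the address bar and the Greek form are the same URL.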
URL, Subdomain and Root Domain Structure
Various URL Structure
Moz Pro | Mark_Ch
mydomain.co.uk
www.mydomain.co.uk
http://www.mydomain.co.uk
http://mydomain.co.uk
mydomain.co.uk/index.html
www.mydomain.co.uk/index.html
http://www.mydomain.co.uk/index.html
http://mydomain.co.uk/index.html

HTACCESS File Index Rewrite
RewriteRule ^index\.(htm|html|php) http://www.mydomain.co.uk/ [R=301,L]
RewriteRule ^(.*)/index\.(htm|html|php) http://www.mydomain.co.uk/$1/ [R=301,L]
RewriteCond %{HTTP_HOST} ^mydomain.co.uk
RewriteRule ^(.*)$ http://www.mydomain.co.uk/$1 [R=301,L]

Google WMT Setting: Configuration | Settings
Preferred domain: radio check on "don't set a preferred domain"

SEOMoz Open Site Explorer
mydomain.co.uk - (301 Redirect) [No Data] PA38 DA30
http://www.mydomain.co.uk/index.html - (301 Redirect) [No Data] PA23 DA30

Majestic Site Explorer
Number of Referring Domains & External Backlinks vary between the following instances:
URL: http://www.mydomain.co.uk
SUBDOMAIN: www.mydomain.co.uk
ROOT DOMAIN: mydomain.co.uk
Question
I have set up my htaccess file to rewrite the "Various URL Structure" list to www.mydomain.co.uk. However, when I view metrics in Majestic SEO, the URL / subdomain / root domain all differ. Why is this happening?
Is this harming my site?
What is common practice when defining URL structure? Any other quality advice and implementation guidance would be much appreciated. Regards, Mark
-
Does SeoMoz realize about duplicated url blocked in robot.txt?
Hi there: Just a newbie question... I found some duplicated URLs in the "SEOmoz Crawl diagnostic reports" that should not be there. They are intended to be blocked by the site's robots.txt file. Here is an example URL (Joomla + VirtueMart structure): http://www.domain.com/component/users/?view=registration and here is the blocking content in the robots.txt file: User-agent: * Disallow: /components/ Question is: Will this kind of duplicated URL error be removed from the error list automatically in the future? Should I remember which errors should not really be in the error list? What is the best way to handle this kind of error? Thanks and best regards, Franky
Moz Pro | Viada
-
What software can I use on my Mac to open and read a SEOMoz CSV exported file?
I do not want to buy Excel or Pages just to read the CSV from SEOmoz. So I bought an app on the App Store... and this app is unable to read the CSV from SEOmoz. Since I already wasted $2, I'd rather avoid wasting more (and help others avoid that too!). What software is recommended to open these CSV files? Also, I tried Google Docs, but I bumped into their 400K-cell limit 😞
Moz Pro | jgenesto
-
URLs getting re-directed to double http:// URLs
The "Notices" section under "Crawl Diagnostics" shows that there are 435 issues on my website. I checked a few URLs to verify this issue and found that most of these pages are working perfectly. For instance, the above-mentioned report shows that http://policycomplaints.com/about redirects to http://http://policycomplaints.com/about/ . Then, http://policycomplaints.com/aegon-religare/mis-selling-of-policy-by-aegon-religare/ redirects to http://http://policycomplaints.com/aegon-religare/mis-selling-of-policy-by-aegon-religare/ . However, when I open these pages, they seem to work perfectly; I didn't find them getting redirected anywhere else. So, as per the report, it seems that all 435 of these http:// URLs are getting redirected to http://http:// versions, which in reality is not true, because all the http:// URLs are working perfectly. So, is this a problem with SEOmoz software? If not, what is the reason for these issues and how can I address them? Do notify me if any further information is required. Thanks. bNiEm.png
Moz Pro | unknownID1
-
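For what it's worth, the doubled prefix in those report entries (http://http://...) is easy to spot and strip when post-processing an export. A small sketch; the function is my own, not part of any Moz tool:

```python
import re

def fix_double_scheme(url):
    """Collapse a doubled scheme like 'http://http://host/...' to a single one;
    URLs with a single scheme pass through unchanged."""
    return re.sub(r"^(https?://)(?:https?://)+", r"\1", url)

print(fix_double_scheme("http://http://policycomplaints.com/about"))
# http://policycomplaints.com/about
```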
Can overly dynamic URLs be overcome with canonical meta tags?
I tried searching for questions regarding dynamic URLs and canonical tags, but I couldn't find anything, so hopefully this hasn't been covered. There are a large number of overly dynamic URLs reported in our site crawl (>7,000). I haven't looked at each of these, but most either have a canonical meta tag or are indicated as FOLLOW, NOINDEX pages. Will these be enough to overcome any negative SEO impact that may come from overly dynamic URLs? We are down to almost 0 critical errors, and this is now the biggest problem reported by the site crawl, after too many on-page links.
Moz Pro | afmaury
-
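As background on the "overly dynamic" flag: it is generally triggered by query strings carrying many parameters. A rough, hypothetical check; the two-parameter threshold is my assumption, not Moz's published rule:

```python
from urllib.parse import urlparse, parse_qs

def is_overly_dynamic(url, max_params=2):
    """Flag URLs whose query string carries more than max_params parameters.
    The threshold is a rule of thumb, not Moz's exact definition."""
    return len(parse_qs(urlparse(url).query)) > max_params

print(is_overly_dynamic("http://example.com/p?id=1&sort=asc&page=2&ref=x"))  # True
print(is_overly_dynamic("http://example.com/p?id=1"))                        # False
```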
What is the quickest way to get OSE data for many URLs all at once?
I have over 400 URLs in a spreadsheet and I would like to get Open Site Explorer data (domain/page authority, trust, etc.) for each URL. Would I use the Linkscape API to do this quickly (i.e., not manually entering every single site into OSE)? Or is there something in OSE, or a tool, I am overlooking? And whatever the best process is, can you give a brief overview? Thanks!! -Dan
Moz Pro | evolvingSEO
-
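The batching half of that workflow is simple regardless of which API you end up calling. A sketch; the CSV column name, batch size, and the batched url-metrics call are assumptions, so check the Linkscape/Mozscape API docs for the actual endpoint, auth signature, and rate limits:

```python
import csv

def chunked(items, size):
    """Split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Read the URL column from the spreadsheet, then send one request per
# batch instead of 400 single lookups:
#
#   with open("urls.csv") as f:
#       urls = [row["URL"] for row in csv.DictReader(f)]
#   for batch in chunked(urls, 10):
#       ...POST batch to the API's batch url-metrics call...

print(chunked(list(range(5)), 2))  # [[0, 1], [2, 3], [4]]
```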
Problems with OSE downloads
Ordered 5 reports in the last 24 hours; none received. Anyone else with this problem? I do expect better from an expensive subscription. C'mon Moz, fix this new OSE report system please.
Moz Pro | blocker0408