A suggestion to help with linkscape crawling and data processing
-
Since you guys are understandably struggling with crawling and processing the sheer number of URLs and links, I came up with this idea:
In a similar way to how SETI@Home (is that still a thing? Google says yes: http://setiathome.ssl.berkeley.edu/) works, could SEOmoz use distributed computing amongst SEO moz users to help with the data processing? Would people be happy to offer up their idle processor time and (optionally) internet connections to get more accurate, broader data?
Are there enough users of the data to make distributed computing worthwhile?
Perhaps those who crunched the most data each month could receive moz points or a free month of Pro.
I have submitted this as a suggestion here:
http://seomoz.zendesk.com/entries/20458998-crowd-source-linkscape-data-processing-and-crawling-in-a-similar-way-to-seti-home -
Sean - I share Rand' sentiments, thanks so much for the suggestion!
We have considered distributed crawling in the past (or even distributed rank checking because then it would be in that user's locale) but there are a whole different set of challenges. For example, you have to handle all the edge cases: what if a user's computer isn't on, or loses connectivity, what if we crawl too fast and the user gets blocked from a site, how do you write all that data securely?
Of course all of these concerns can be overcome, but right now we feel like we have a good handle on the problems, and it will be much faster for us to just fix what we have
Although, I know all of us are so appreciative of the ideas and support, and we will have something really great soon!
-
Thanks a ton Sean! We have considered distributed computing as a way to help crawl, index, process, etc. It's so flattering and humbling to hear that you'd be willing to help out and that the community would, too
For now, we believe we can get to the index size/quality/freshness using our hosted system, but the engineering team will certainly be encouraged to hear that folks in our community might contribute to this. Distributed systems present their own challenges, and we'd have to write that code from scratch, but if we find that we can't do what we want with our existing network, we might reach out.
BTW - I wanted to let folks know that the team here does feel very confident that come December/January, we're going to be producing indices that reach exceptional quality bars. The problems we face are largely known, and we now have the team and the solutions to tackle it, so we're pretty excited.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Site Crawl Stalled and Can't Restart
In my GreenSeed campaign, the site crawl continues to say "in progress." I can't figure out how to stop it or how to restart the site crawl. Can you please help?
Moz Pro | | Winger1 -
MOZ Starter Crawl Not Working
Hello, I just added a new subdomain as one of my campaigns on MOZ. The starter crawl report keeps coming back to me with just one page crawled (it should crawl up to 250 pages). I've deleted and added this subdomain three times and it continues to present me with this problem.I've even waited a week for the full crawl report but that also showed just one page crawled. Does anybody know why this is happening? Thanks!
Moz Pro | | jampaper0 -
Linkscape API Sort calls
I'm using Linkscape API plugin from SEO Gadget. It's all fine but I can't find any documentation on other 'sort' calls. The only request that I knows that works is "domains_linking_page". Any idea where this information is? I was hoping for a list of all the available requests I can and more particularly I want to sort it by Internal Linking Domains.
Moz Pro | | iProspect-397560 -
How often does SEOMoz refresh their link analysis data?
I know that my link profile has changed within the last month, yet both SEOMoz campaign link analysis and Open Site Explorer have shown the exact same number of links for more than a month. Majestic SEO fluctuates as I would expect, but Moz data is unchanged.
Moz Pro | | Aggie0 -
How to remove URLS from from crawl diagnostics blocked by robots.txt
I suddenly have a huge jump in the number of errors in crawl diagnostics and it all seems to be down to a load of URLs that should be blocked by robots.txt. These have never appeared before, how do I remove them or stop them appearing again?
Moz Pro | | SimonBond0 -
Crawl went from a few errors to thousands when I added Blog
I am new here. I recently got the errors from SEOmoz crawl on my site down to just a handful from a couple hundred. So I took the leap and moved my blog to www.mysitename.com/blog (which I see recommended here) and now my errors are in the thousands. My blog which was a separate url has pages back to 2007. I am not sure if it is appropriate to post my site url in a question here? One error that really stands out is this: Description <dd>Using rel=canonical suggests to search engines which URL should be seen as canonical.</dd> On my root page I have: rel="canonical" href="http://www.mysitename.com"/> Thanks for any help...
Moz Pro | | CMCD0 -
Crawl completed but no report available for download?
I put 2 crawls in the same day. One came back and delivered a report that I could download. The other one is completed (says so on the page) but there's no way for me to download the report. How do I get a hold of it? Thanks!
Moz Pro | | LiliArancibia0 -
Linkscape Update July 7 or 11 or both?
The Linkscape Update Calendar lists two updates in July (7th and 11th). http://apiwiki.seomoz.org/w/page/25141119/Linkscape-Schedule For some internal calendaring, it would be helpful to know which date the update will roll. And as always, thank you very much for the great tools.
Moz Pro | | Gyi0