1. HDFS is not at par with other distributed filesystems. Real distributed filesystems don’t require data to be explicitly copied in or out, but allow it to be modified in place – efficiently, even at single byte granularity. Real distributed filesystems are not allowed to report writes as complete when they’ve only been buffered at the client. I could go on, but there’d be little point. HDFS does one thing, and one could argue that it does that one thing very well, but it’s not even in the same category as a true general-purpose filesystem.

    Disclaimer: I’m a developer for GlusterFS, which is a true distributed filesystem even if it’s not a perfect one.


Please enter your comment!
Please enter your name here