Yes, it matters a lot, for the following main reasons.
By choosing the right file format for your use case, you can achieve the following:
1. Less storage:
If we select a suitable file format together with a compatible compression technique, the data requires less storage.
2. Faster processing of data:
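Depending on the use case, selecting the right kind of file format (row-based or column-based) can give much better performance when processing the data. For example, a column-based format suits analytical queries that read only a few columns, while a row-based format suits workloads that read or write whole records.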
3. Reduce disk I/O cost:
If processing is efficient and the compression method is well chosen, less data is read from and written to disk, so the I/O cost is reduced as well.
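The three points above can be illustrated with a small sketch that uses only the Python standard library. It is not a real big data file format: the sample records, the field names (id, country, amount) and the helper functions to_row_layout and to_column_layout are assumptions made up for this illustration, and zlib stands in for whatever compression codec a real format would use. It stores the same records in a row-based and a column-based layout, compresses both, and compares how many bytes must be read to sum a single field.

```python
import random
import zlib

# Hypothetical sample data: 10,000 records with three fields.
random.seed(0)
COUNTRIES = ["IN", "US", "DE", "BR", "JP"]
records = [
    (i, random.choice(COUNTRIES), random.randint(1, 5000))
    for i in range(10_000)
]


def to_row_layout(rows):
    """Row-based layout: all fields of one record are stored together."""
    return "\n".join(f"{i},{c},{a}" for i, c, a in rows).encode()


def to_column_layout(rows):
    """Column-based layout: all values of one field are stored together."""
    ids, countries, amounts = zip(*rows)
    return {
        "id": "\n".join(map(str, ids)).encode(),
        "country": "\n".join(countries).encode(),
        "amount": "\n".join(map(str, amounts)).encode(),
    }


row_bytes = to_row_layout(records)
col_bytes = to_column_layout(records)

# 1. Storage: compress both layouts with the same codec.
row_compressed = len(zlib.compress(row_bytes))
col_compressed = sum(len(zlib.compress(b)) for b in col_bytes.values())

# 2 and 3. Processing and disk I/O: summing one field needs every byte of
# the row layout, but only the "amount" column of the column layout.
total_from_rows = sum(
    int(line.split(b",")[2]) for line in row_bytes.split(b"\n")
)
total_from_cols = sum(int(v) for v in col_bytes["amount"].split(b"\n"))

print("raw row bytes:", len(row_bytes))
print("compressed row bytes:", row_compressed)
print("compressed column bytes:", col_compressed)
print("bytes read for sum(amount), row layout:", len(row_bytes))
print("bytes read for sum(amount), column layout:", len(col_bytes["amount"]))
```

Both layouts give the same sum, but the column layout reads only a fraction of the bytes to get it. How much each layout shrinks under compression depends on the data, so the printed sizes are best treated as an experiment on your own data rather than a general result.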
There are also several factors to consider when selecting a file format for a use case:
• whether the file is splittable
• schema evolution support
• predicate pushdown (filter pushdown)
• compression technique
• row based or column based
• support for serialization/deserialization
• support for metadata
• whether file format is supported by source and target system
• support for column types
• ingestion and latency requirements
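Two of the factors above, metadata support and predicate pushdown, work together, and a minimal sketch may make that clearer. The idea is that a format keeps summary statistics (here, a minimum and maximum) for each chunk of rows, so a reader can skip chunks that cannot match a filter. Columnar formats such as Parquet and ORC keep statistics of this kind, but the code below is only a toy model: RowGroup, write_row_groups and scan_greater_than are hypothetical names invented for this example, not any library's API.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A chunk of values plus the min/max metadata kept alongside it."""
    values: list
    min_value: int
    max_value: int


def write_row_groups(values, group_size):
    """Split values into fixed-size groups and record min/max per group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups, threshold):
    """Return values > threshold and the number of groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # skipped using metadata alone, no data is read
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


values = list(range(1000))              # sorted sample data (assumption)
groups = write_row_groups(values, 100)  # 10 groups of 100 values each
matches, groups_read = scan_greater_than(groups, 899)

print("groups written:", len(groups))
print("groups read:", groups_read)
print("matching values:", len(matches))
```

With sorted data, only one of the ten groups has to be read for this filter. On unsorted data the min/max ranges overlap more and fewer groups can be skipped, which is why the way data is laid out at ingestion time also affects how much pushdown helps.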