Monthly Archives: July 2021


Do you think the file format matters in Big Data technology?

Category: Bigdata

Yes, it matters a lot. By choosing the right file format for your use case, you can achieve the following:

1. Less storage:
If we select a suitable file format with a good, compatible compression technique, the data requires less storage.

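2. Faster processing of data:
If we select the right file format for our use case (for example, a row-based or a column-based format), we can achieve high performance when processing the data. For example, an analytical query that reads only a few columns runs faster on a column-based format, because it does not have to read the other columns.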

3. Reduced disk I/O cost:
When the data is stored with an efficient compression method and processed efficiently, fewer bytes have to be read from and written to disk, so the I/O cost is reduced as well.
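The three benefits above can be illustrated with a minimal, standard-library-only Python sketch. It is not how real formats such as Parquet or ORC are implemented; it assumes a made-up table of sales records (`id`, `country`, `amount`) and a simplified "columnar" layout in which each column is stored as its own zlib-compressed chunk. The function names are hypothetical. It compares stored bytes for a row-based layout against the columnar layout, and the bytes that must be read to sum a single column.

```python
import zlib

# Hypothetical sample data: 10,000 sales records (id, country, amount).
ROWS = [(i, ["US", "IN", "DE"][i % 3], i % 100) for i in range(10_000)]

def row_layout(rows):
    """Row-based layout: one CSV-style line per record, all fields together."""
    return "\n".join(f"{i},{c},{a}" for i, c, a in rows).encode()

def columnar_layout(rows):
    """Simplified column-based layout: each column serialized and compressed separately."""
    columns = {
        "id": [str(i) for i, _, _ in rows],
        "country": [c for _, c, _ in rows],
        "amount": [str(a) for _, _, a in rows],
    }
    return {name: zlib.compress("\n".join(vals).encode()) for name, vals in columns.items()}

raw_rows = row_layout(ROWS)
compressed_rows = zlib.compress(raw_rows)
chunks = columnar_layout(ROWS)

# 1. Storage: compressing this repetitive data takes fewer bytes than the raw text.
print("raw row layout bytes:       ", len(raw_rows))
print("compressed row layout bytes:", len(compressed_rows))
print("columnar (all chunks) bytes:", sum(len(c) for c in chunks.values()))

# 2 and 3. Processing and I/O: to compute sum(amount), the row layout must read
# and decompress every field of every record, while the columnar layout
# only reads the 'amount' chunk.
def sum_amount_row(blob):
    text = zlib.decompress(blob).decode()
    return sum(int(line.split(",")[2]) for line in text.split("\n"))

def sum_amount_columnar(column_chunks):
    text = zlib.decompress(column_chunks["amount"]).decode()
    return sum(int(v) for v in text.split("\n"))

print("bytes read, row layout:     ", len(compressed_rows))
print("bytes read, columnar layout:", len(chunks["amount"]))
assert sum_amount_row(compressed_rows) == sum_amount_columnar(chunks)
```

Both layouts give the same answer; the difference is how many bytes have to be stored and read to get it. The exact byte counts depend on the data, so they are not listed here.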

There are also multiple factors to consider when selecting a file format for your use case:
• whether the file is splittable
• schema evolution support
• predicate pushdown / filter pushdown support
• compression technique
• row-based or column-based layout
• support for serialization/deserialization
• support for metadata
• whether the file format is supported by the source and target systems
• support for column (data) types
• ingestion and latency requirements
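Predicate pushdown from the list above is easier to see in code. Columnar formats such as Parquet and ORC store statistics (for example, the minimum and maximum value of a column) for each block of rows, so a reader can skip blocks that cannot match a filter. The sketch below imitates that idea in plain Python; the `RowGroup` class, the group size, and the sample data are hypothetical and do not reflect any real file layout or library API.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    """Hypothetical block of rows with min/max statistics for one column."""
    values: list
    min_value: int
    max_value: int

def write_row_groups(values, group_size):
    """Split values into fixed-size groups and record each group's min/max."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups

def scan_greater_than(groups, threshold):
    """Return matching values and how many groups were actually read.

    Groups whose max_value <= threshold cannot contain a match, so they are
    skipped using only their statistics (the "pushed-down" predicate).
    """
    matches, groups_read = [], 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # skipped without touching group.values
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read

# Sorted sample data (e.g. event timestamps) makes the statistics selective.
data = list(range(1_000))
groups = write_row_groups(data, group_size=100)
matches, groups_read = scan_greater_than(groups, threshold=949)
print(len(matches), "matches, read", groups_read, "of", len(groups), "row groups")
# prints: 50 matches, read 1 of 10 row groups
```

The benefit depends on how the data is laid out: if the same values were shuffled, every group's min/max range would overlap the filter and no group could be skipped.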