Design a database with terabytes of data that supports efficient range queries ~ KodeKnight

Thursday, April 3, 2014

Design a database with terabytes of data that supports efficient range queries

April 03, 2014 CrackingTheCodingInterview, Database, kodeknight, limited memory, System Design And Memory Limits No comments

Problem

You have to design a database that can store terabytes of data. It should support efficient range queries. How would you do it?

Solution
This rings couple of points in the head. I believe the interviewer was expecting a data structure like hashtable ,B-tree or B+ trees. If we switch to distributed computing based solution, esp when "100 billion records" are involved, I would suggest you to look into Distributed Hash Table and map-reduce (for parallel query processing)

Points on B+ tree implementation
Construct an index for each field that requires range queries. Use a B+ tree to implement the index. A B+ tree organizes sorted data for efficient insertion, retrieval and removal of records. Each record is identified by a key (for this problem, it is the field value).

Since it is a dynamic, multilevel index, finding the beginning of the range depends only on the height of the tree, which is usually quite small. Record references are stored in the leaves, sorted by the key. Additional records can be found by following a next block reference. Records will be sequentially available until the key value reaches the maximum value specified in the query. Thus, run-times will be dominated by the number of elements in a range.

Avoid using trees that store data at interior nodes, as traversing the tree will be expensive since it won’t be resident in memory.

References

KodeKnight

Like Us

Followers

Topics

Books for You

Contributors

Search This Blog

Labels

Report Abuse

Blog Archive

Blogger news

Blogroll

ADs

Blogger news

Thursday, April 3, 2014

Design a database with terabytes of data that supports efficient range queries

0 comments:

Post a Comment

Labels

Likeus