How Databases Actually Store Data (And Why You Should Care)

I am studying for my EECS 484 DBMS final and decided to get some extra value out of studying by writing this blog post. I will explain, in my own words, every concept we have learned since the midterm, which includes the database internals portion of the course. I will try to be as helpful as possible for those who want to learn this material on their own instead of enduring the pain of taking the full class.

Physical Latency

First, we will establish the ground rules of physics. A DBMS assumes that the database lives on non-volatile disk (SSD/HDD), but it has to move that data into volatile memory (DRAM) before it can be put to use. The job of the DBMS is to make this data transfer as efficient as possible. As you get further away from the CPU (CPU -> cache -> DRAM -> disk), the time it takes to read data skyrockets. Because random access on disk is so slow, the DBMS tries to access data sequentially: it reads and writes contiguous blocks rather than jumping around.

Who manages memory?

Some of you might be asking: why doesn't the database just use mmap and let the operating system manage memory? The OS turns out to be a terrible option:

Transactional Safety: The DBMS needs to flush dirty pages (pages with changes not yet written to disk) in a specific order so that data is not lost during a crash. The OS doesn't care about transactional logs. It…
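
To make the sequential-versus-random point from the Physical Latency section concrete, here is a toy cost model. It is a minimal sketch, not a real disk model: the seek and transfer costs are hypothetical numbers picked only to show the shape of the effect, and `io_cost_ms` is a made-up helper. It charges a "seek" every time the next page is not physically adjacent to the previous one.

```python
# Hypothetical cost constants, chosen only to show the shape of the effect.
SEEK_COST_MS = 10.0      # cost of repositioning before a non-contiguous read
TRANSFER_COST_MS = 0.1   # cost of reading one page once positioned


def io_cost_ms(page_ids):
    """Estimate total I/O time for reading pages in the given order.

    A seek is charged whenever the next page is not physically adjacent
    to the previous one; the very first read always needs a seek.
    """
    total = 0.0
    prev = None
    for pid in page_ids:
        if prev is None or pid != prev + 1:
            total += SEEK_COST_MS
        total += TRANSFER_COST_MS
        prev = pid
    return total


if __name__ == "__main__":
    import random

    pages = list(range(1000))
    # Same 1000 pages, visited in a shuffled order (fixed seed for repeatability).
    shuffled = random.Random(42).sample(pages, len(pages))
    print(f"sequential: {io_cost_ms(pages):.1f} ms")
    print(f"random:     {io_cost_ms(shuffled):.1f} ms")
```

Reading the pages in order pays for a single seek, while the shuffled order pays for a seek on nearly every page, so the transfer cost ends up being a rounding error next to the repositioning cost.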
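
The "specific order" in the Transactional Safety point is usually the write-ahead logging (WAL) rule: before a dirty page is written back to disk, the log records describing its changes must already be durable. The sketch below is a toy illustration of that rule under simplifying assumptions, not how any particular DBMS implements it. The class names, the `page_lsn` and `flushed_lsn` fields, and the dictionary standing in for the disk are all hypothetical. An OS paging out an mmap'd file has no reason to perform this check before writing a page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    data: str = ""
    dirty: bool = False
    page_lsn: int = 0  # LSN of the most recent log record that changed this page


class LogManager:
    def __init__(self):
        self.records = []      # in-memory log tail
        self.flushed_lsn = 0   # highest LSN assumed to be on durable storage

    def append(self, description):
        """Buffer a log record in memory and return its LSN (LSNs start at 1)."""
        self.records.append(description)
        return len(self.records)

    def flush(self, up_to_lsn):
        """Pretend to force log records up to `up_to_lsn` to disk."""
        self.flushed_lsn = max(self.flushed_lsn, min(up_to_lsn, len(self.records)))


class BufferPool:
    def __init__(self, log):
        self.log = log
        self.frames = {}  # page_id -> Page held in memory
        self.disk = {}    # page_id -> data; stands in for the on-disk copy

    def update(self, page_id, data):
        """Modify a page in memory, logging the change first."""
        page = self.frames.setdefault(page_id, Page(page_id))
        lsn = self.log.append(f"set page {page_id} to {data!r}")
        page.data, page.dirty, page.page_lsn = data, True, lsn

    def flush_page(self, page_id):
        """Write a dirty page back, enforcing log-before-data ordering."""
        page = self.frames[page_id]
        if not page.dirty:
            return
        # WAL rule: the log record describing this change must be durable
        # before the changed page itself is allowed to reach disk.
        if page.page_lsn > self.log.flushed_lsn:
            self.log.flush(page.page_lsn)
        self.disk[page_id] = page.data
        page.dirty = False


if __name__ == "__main__":
    pool = BufferPool(LogManager())
    pool.update(7, "hello")
    print("flushed LSN before page write:", pool.log.flushed_lsn)
    pool.flush_page(7)
    print("flushed LSN after page write:", pool.log.flushed_lsn)
```

The key design choice is that `flush_page` never trusts the page on its own: it compares the page's LSN against how far the log has been made durable, and forces the log first if it is behind. If the system crashes after the page write, the log still describes the change, so recovery can make sense of what is on disk.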

