News & Blog back

Subscribe

No UPDATE updates

Working with databases one can’t help but wonder what happens with data stored on the disk.
For example you need to know this to be able to save at least some of the information on damaged data blocks with help of hexedit.
Today we’ll try to find out how data types such as integer are kept on disk. To do that we will create a testing table, review its contents using hexedit and change data on disk. For the sake of this exercise I will be using PostgreSQL 14.

Let’s create a testing table:

create table t_int (n int);
insert into t_int values (351);
insert into t_int values (1000);
insert into t_int values (-351);

then extract the data:

select * from t_int;
n
------
351
1000
-351

That’s right, in this very order we insert the data in the first place.
Let’s find out where our table is stored:

SELECT pg_relation_filepath('t_int');
pg_relation_filepath
----------------------
base/14486/32905
show data_directory;
data_directory
------------------------
/var/lib/pgsql/14/data

Now let’s check what our file contains:

hexedit /var/lib/pgsql/14/data/base/14486/32905


Why does this happen? The problem is that dirty data pages haven’t been removed to the disk.

Let’s run checkpoint command:

postgres=# checkpoint;
CHECKPOINT

And once again check the contents of the file. This time there appears to be data in there:

Let’s find out what exactly is kept there. To do that I’ll split what we see into following sections:

A7 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  03 00 01 00  00 08 18 00  A1 FE FF FF  00 00 00 00
A6 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  02 00 01 00  00 08 18 00  E8 03 00 00  00 00 00 00
A5 03 00 00  00 00 00 00  00 00 00 00  00 00 00 00  01 00 01 00  00 08 18 00  5F 01 00 00  00 00 00 00

We see that there are repeated elements and a high amount of zeros. For easier read let’s remove them:

A7 03 00 00  03 00 01 00  00 08 18 00  A1 FE FF FF  00 00 00 00
A6 03 00 00  02 00 01 00  00 08 18 00  E8 03 00 00  00 00 00 00
A5 03 00 00  01 00 01 00  00 08 18 00  5F 01 00 00  00 00 00 00

If we take the last row and read it from the end here is what we see:

01 5F (hex) = 351 (dec)

We found where the first value is!
I wonder, what is under it?

03 E8 (hex) = 1000 (dec)

Cool! Now, once we found out the secret of type integer on the disk, let’s review the last value:

FF FF FE A1 = 4 294 966 945

What?! But there had to be -351?!!

Digging deeper.

From the documentation it becomes clear that int values are somewhere in the range of -2147483648 and +2147483647, so including 0 we get 4 294 967 296 values.

How much do you think 4 294 966 945 – 4 294 967 296?

-351

Bingo!
An alternative way to recognise the written value is the performing common operation for presentation of negative MSB (most significant bit).
We need to start with a positive number, write it in a binary system, invert and add 1.

4 294 966 945 = 11111111111111111111111010100001

Inverting and adding 1:

101011110+1 = 1 0101 1111 = 351 (dec)

.
We now understand how data is stored on the disk. Now let’s try, using this knowledge to change the value (of course you shouldn’t do this in a “live” database as you will get a database with corrupted data).

For example, let’s replace 351 with 888
We know that 351 is 01 5F, we replace this value using hexedit, to 888 (dec) = 78 03 (hex)
This way we will get following string (I removed the zeros)

A5 03 00 00 … 01 00 01 00 00 08 18 00 78 03 00 00 00 00 00 00

Let’s extract data from the table:

select * from t_int;
n
------
351
1000
-351

Why does this happen? To get the answer to this question we need EXPLAIN (analyze, buffers):

explain (analyze, buffers) select * from t_int;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on t_int (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.007 rows=3 loops=1)
Buffers: shared hit=1
Planning Time: 0.029 ms
Execution Time: 0.017 ms

As you can see, PostgreSQL reads one block from cache. Data in RAM is not aware that we changed data on disk.
Let’s restart Postgres and clear cache:

#!/bin/sh
# restart PostgreSQL
systemctl stop postgresql-14
systemctl start postgresql-14
sync; echo 3 > /proc/sys/vm/drop_caches

Now we can query our table again:

select * from t_int;
n
------
888
1000
-351

So we changed data on disk, queried the table, which returned outdated data. Seeing “Buffers: shared hit=1” gave us a hint that while performing the query PostgreSQL wasn’t reading data from disk, but rather from cache, which was located in RAM. To clear the data from cache I restarted Postgres and cleared the cache. Following this procedure I queried the data once again – the query returned updated data (since this time the cache was clear).

Hope this exercise was helpful and provided some insight on how the data is stored on the disk and how to go about investigating this storage.
Have you tried exploring this yourself? What have you learned? I’d be happy to hear from you in the comments below!

You may also like: