Preamble
PostgreSQL allows end users to store BLOBs (binary large objects) in the database. Many people use these functions to handle data directly in SQL. Whether or not that is a good idea has been the subject of lengthy debate, and we expect that particular discussion to go on forever. If you are in the “pro BLOB” camp, however, we want to offer some tips on how PostgreSQL can handle binary data as efficiently as possible.
Loading files into the database
To show how BLOBs work, I have created a simple file that we can import into the database later:
iMac:~ hs$ echo abcdefgh > /tmp/file.txt
To import this file, call the lo_import function (as the superuser) and pass it the name of the file you want to load. The process is as follows:
test=# SELECT lo_import('/tmp/file.txt');
 lo_import
-----------
     98425
(1 row)
As you can see, PostgreSQL returns a number (the object ID). If you want to keep track of those values, you have to store them somewhere, because the filename has effectively “vanished.” The way to do that is to create a simple table:
test=# CREATE TABLE t_file (name text, oid_number oid);
CREATE TABLE
test=# INSERT INTO t_file
       VALUES ('/tmp/file.txt', lo_import('/tmp/file.txt'))
       RETURNING *;
     name      | oid_number
---------------+------------
 /tmp/file.txt |      98432
(1 row)
PostgreSQL offers the OID data type to store object IDs. The filename can simply be kept in a text or varchar column in case the path is needed later. Keep in mind that the large object is entirely independent of the file in the filesystem; saving the filename serves only as a reminder of what we imported.
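A side note: lo_import() reads the file from the database server's filesystem, which is why we called it as a superuser. If the file lives on the client machine instead, psql offers a client-side counterpart. A minimal sketch (not part of the session above):

test=# \lo_import '/tmp/file.txt'

psql prints the OID of the newly created large object, which you can then record in the tracking table yourself.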
However, a single file is no big deal. Let’s see what happens if we import it a million times:
test=# INSERT INTO t_file
       SELECT '/tmp/file.txt', lo_import('/tmp/file.txt')
       FROM generate_series(1, 1000000);
INSERT 0 1000000
test=# TABLE t_file LIMIT 10;
     name      | oid_number
---------------+------------
 /tmp/file.txt |      98432
 /tmp/file.txt |      98433
 /tmp/file.txt |      98434
 /tmp/file.txt |      98435
 /tmp/file.txt |      98436
 /tmp/file.txt |      98437
 /tmp/file.txt |      98438
 /tmp/file.txt |      98439
 /tmp/file.txt |      98440
 /tmp/file.txt |      98441
(10 rows)
The file was imported without a hitch, and as the listing shows, every import received its own unique object ID.
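To get a file back out of the database, there is a matching export function. A minimal sketch, assuming superuser privileges and a hypothetical target path /tmp/file_copy.txt; note that lo_export() writes the file on the database server, while psql’s \lo_export would write it on the client:

test=# SELECT lo_export(oid_number, '/tmp/file_copy.txt')
       FROM t_file
       LIMIT 1;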
Behind the scenes: How PostgreSQL stores BLOBs
Now that you have seen how to import data, we can look behind the scenes to find out how it is stored internally. PostgreSQL keeps the information in a system catalog called pg_largeobject:
test=# \d pg_largeobject
       Table "pg_catalog.pg_largeobject"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 loid   | oid     |           | not null |
 pageno | integer |           | not null |
 data   | bytea   |           | not null |
Indexes:
    "pg_largeobject_loid_pn_index" UNIQUE, btree (loid, pageno)
The “loid” column contains the object ID that we stored in our table. The actual data is kept in a bytea (byte array) column. Because large objects can be quite big, PostgreSQL splits them into small chunks (2 kB each by default) and stores one chunk per row, numbered by “pageno”.
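If you want to see that chunking in action, you can count the pages per object directly (by default, reading pg_largeobject requires superuser privileges). A small example query; our tiny demo file fits into a single page, while bigger objects would produce several rows per loid:

test=# SELECT loid, count(*) AS pages, sum(length(data)) AS bytes
       FROM pg_largeobject
       GROUP BY loid
       LIMIT 3;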
A bytea field is easy to display, but in its default output format it is not really human readable (which usually does not matter, since we are dealing with binary data anyway). Because I imported plain text in my example, switching to the “escape” output format makes it a little easier to read:
test=# SET bytea_output TO escape;
SET
test=# SELECT * FROM pg_largeobject
       WHERE loid = 98437
       ORDER BY pageno;
 loid  | pageno |     data
-------+--------+--------------
 98437 |      0 | abcdefgh\012
(1 row)
As this example shows, the large object interface is essentially a thin layer on top of the bytea data type. That is where the real magic happens.
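Incidentally, you do not have to read pg_largeobject directly to get at the contents. Since PostgreSQL 9.4 there is a server-side convenience function called lo_get(), which returns a large object as bytea, optionally only a slice of it. A small example, reusing the object ID from the listing above:

test=# SELECT lo_get(98437);
test=# SELECT lo_get(98437, 0, 4);  -- offset and length: just the first 4 bytes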
Removing BLOBs
Let’s move on and delete BLOBs again. The right way to do that is to call the lo_unlink function, which removes the entry from the system catalog:
test=# SELECT lo_unlink(98432);
 lo_unlink
-----------
         1
(1 row)
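Keep in mind that lo_unlink only removes the large object itself; the corresponding row in our tracking table stays behind. To keep table and catalog in sync, remove the row as well, for example like this (not part of the session above):

test=# DELETE FROM t_file WHERE oid_number = 98432;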
Removing a single entry is easy. But let’s try to answer a different question: what happens when a single transaction removes millions of BLOBs from the system?
Removing millions of BLOBs at a time
Running the deletion is simple, but the outcome is somewhat unexpected:
test=# BEGIN;
BEGIN
test=*# SELECT lo_unlink(oid_number) FROM t_file;
ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.
test=!# ROLLBACK;
ROLLBACK
PostgreSQL cannot execute the transaction because it runs out of shared memory! How is that possible? Every large object has to be locked before it can be deleted, and the lock table in shared memory can only hold a limited number of locks.
Please note that row locks are NOT kept in shared memory; this only applies to object-level locks, such as table locks.
That is why we run out of space. The key question now is: how many locks can we store? The answer lies in three configuration variables found in postgresql.conf:
test=# SHOW max_locks_per_transaction;
 max_locks_per_transaction
---------------------------
 64
(1 row)

test=# SHOW max_connections;
 max_connections
-----------------
 100
(1 row)

test=# SHOW max_prepared_transactions;
 max_prepared_transactions
---------------------------
 0
(1 row)
The number of locks at my disposal is:

number_of_locks = max_locks_per_transaction * (max_connections + max_prepared_transactions)

In my case, that works out to 64 * (100 + 0) = 6,400 locks. In other words, we cannot process these deletions in a single transaction.
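If you prefer not to do the arithmetic by hand, the same number can be computed from the running instance; a small example query:

test=# SELECT current_setting('max_locks_per_transaction')::int
              * (current_setting('max_connections')::int
                 + current_setting('max_prepared_transactions')::int) AS lock_table_size;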
There are essentially two approaches to this issue:
- Change the PostgreSQL configuration (which requires a restart), or
- remove the BLOBs in chunks (see the sketches below).

These are essentially the only ways to get those BLOBs out of the database.
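The first option boils down to raising max_locks_per_transaction. A minimal sketch (the value 10000 is just an illustration; the change only takes effect after a restart):

test=# ALTER SYSTEM SET max_locks_per_transaction = 10000;

The second option is to unlink the objects in batches and commit in between, so that the locks are released along the way. Below is a minimal sketch, assuming PostgreSQL 11 or later (for transaction control in procedures), the t_file table from above, and that every remaining oid_number still points to an existing large object; the helper procedure and the batch size of 1,000 are made up for illustration:

CREATE PROCEDURE unlink_in_chunks(batch_size int DEFAULT 1000)
LANGUAGE plpgsql AS $$
DECLARE
    r       record;
    counter int := 0;
BEGIN
    FOR r IN SELECT ctid, oid_number FROM t_file LOOP
        PERFORM lo_unlink(r.oid_number);         -- drop the large object itself
        DELETE FROM t_file WHERE ctid = r.ctid;  -- drop the tracking row
        counter := counter + 1;
        IF counter % batch_size = 0 THEN
            COMMIT;                              -- release the locks taken so far
        END IF;
    END LOOP;
END;
$$;

CALL unlink_in_chunks(1000);

Because each COMMIT ends the current batch, no single transaction ever holds more than roughly batch_size large-object locks, which keeps us well below the 6,400 limit computed above.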
Dealing with orphan BLOBs
Occasionally, the reference to an OID gets lost, but the large object itself is never unlinked; such orphaned BLOBs linger in the database. Fortunately, there is a command-line tool for this situation:
iMac: hs$ vacuumlo --help
vacuumlo removes unreferenced large objects from databases.

Usage:
  vacuumlo [OPTION]... DBNAME...

Options:
  -l, --limit=LIMIT         commit after removing each of the LIMIT large objects
  -n, --dry-run             don't remove large objects, just show what would be done
  -v, --verbose             write a lot of progress messages
  -V, --version             output version information, then exit
  -?, --help                show this help, then exit

Connection options:
  -h, --host=HOSTNAME       database server host or socket directory
  -p, --port=PORT           database server port
  -U, --username=USERNAME   user name to connect as
  -w, --no-password         never prompt for password
  -W, --password            force password prompt
vacuumlo searches the database for large objects that are no longer referenced and removes them. If your application is buggy or something else goes wrong, this is a good way to clean up a database.
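For example, a dry run against the test database used above could look like this (the options are taken from the help output):

iMac:~ hs$ vacuumlo --dry-run --verbose test

Dropping --dry-run afterwards performs the actual cleanup; --limit commits after every batch of removals, which is exactly the chunking idea from the previous section.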
Finally …
In case you are having trouble with BLOBs, we want to let you know that we also provide PostgreSQL consulting and round-the-clock database support. Our highly competent team will gladly help you with your BLOB-related issues.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.