As mentioned in my last programming post, I'm in the initial stages of building a program designed to help authors (and others with similar needs, maybe Game Masters, etc) organize world building information. I've decided to use PostgreSQL to support the data end of things, using the
psycopg2 module to interact via Python scripts. The next significant decision I find myself facing: how should I store images? There doesn't seem to be a quick, one size fits all answer to this question, so let's explore the options.
Store a reference in the Database, keep the image in the filesystem
There are a couple advantages of this method, which seems to have been the most commonly recommended strategy in the early 2010s. The strongest recommendations for doing it this way revolve around the fact that databases at the time had difficulty storing larger binary objects, or maybe more accurately, difficulty retrieving them in an efficient manner. This also saves on disk space by avoiding double storage - you don't have both the original file, and the database, using storage. Also, the file is still visible in a human-readable format without loading up your database access program.
However, database technology has improved since this was the strongly recommended option. Efficiency doesn't seem to be as significant of an issue with larger files now, and if you store your images separately, they are vulnerable to getting lost or accidentally deleted, or otherwise separated from the database itself. And, if my program is set up to allow remote storage, ie, on a server hosted instance of PostgreSQL, storing images on the filesystem would require either: file write access to the remote server, and a way to save a config for the remote path; or: keeping images in the local filesystem, where they wouldn't be available if the user decided to install on multiple machines accessing the same database.
I seriously considered this option, but ultimately the cons outweigh the pros; I prefer to keep the data together for easy access, retrieval, backups, and so on.
Store the image file in the Database directly
If I'm going to store the image directly, there are still multiple options to sort through. What are the pros and cons of each? What type of table or column do I need to create to work with each?
This has the benefit of being transparent to the end user. PostgreSQL will, by default, break up data entry larger than the fixed page size (typically 8KB) and store it in a TOAST table automatically associated with the directly created table. Undoubtedly any of my
TEXT fields which receive enough user data will use this feature regardless. It is limited to a max size of 1GB per cell, but for ordinary image storage I can't foresee that being a problem. Going this route would mean storing image data in a
Large Object using BLOB/CLOB
The other alternative way to store images in PostgreSQL (or other large objects) is using
BLOB (binary large object) or
CLOB (character large object) types. A large object in current versions of PostgreSQL can reach up to 4TB (older versions were limited to 2TB). If you have really massive pieces of information, or if you require streaming access, this is the way to go. However, the downside is it's more complicated to deal with, the objects are actually stored in a separate system table, and are interacted with separately from their parent table. To handle it correctly you need to set up additional triggers and jump through extra hoops for each transaction.
There's a lot more that could be said about the details and specific ideal use cases for each of these storage solutions, but the points above are what I find relevant to my particular considerations. I'll be using a
BYTEA column and allowing PostgreSQL to TOAST the data automatically as needed, because I don't anticipate the need for files larger than 1GB, and don't see a reason to overcomplicate the transaction. Also, it seems ideal for the possible cloud-storage use case to keep the files stored in the database itself, rather than externally.
Next step? Learn what syntax to use to get
psycopg2 to handle the interactions.