Preamble
Switching from Oracle to PostgreSQL has become a popular move for people who want to escape license fees, high support costs, and technical restrictions on the Oracle side. The same is true for those switching to PostgreSQL from MS SQL and other commercial platforms, but Oracle to PostgreSQL is by far the most common migration path.
Many articles about moving from Oracle to PostgreSQL have been written; today, I’d like to share some of the lessons I’ve learned about actually building a migration tool. Over the years we have experimented with a variety of methods to develop the ideal tool, and we certainly made some mistakes along the way that you won’t want to repeat. I’ve decided to share some of the personal realizations from this multi-year journey. A brief article cannot cover all of them, so I’ve chosen to focus on the strategies that will help you move data more quickly.
Transaction length matters
Transaction length hardly matters if you are only moving one or two gigabytes of data from Oracle to PostgreSQL. If we are talking about 10 or more TB of data, however, things are different. For many clients downtime is not an option, so you must support CDC (change data capture) to achieve close to zero downtime: an initial data snapshot is taken, and the changes that happen in the meantime are applied as PostgreSQL, the target system, catches up to the source database.
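For example, a common way to establish the starting point for CDC is to record the Oracle SCN at which the initial snapshot begins. A minimal sketch (the table name is hypothetical, and access to V$DATABASE is assumed):

-- Run on Oracle: remember the SCN at which the snapshot starts,
-- so that change capture can later resume from exactly this point.
SELECT current_scn FROM v$database;

-- Snapshot reads can then be pinned to that SCN with a flashback query:
SELECT * FROM app_owner.orders AS OF SCN 1234567890;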
In practice, this means that a lot of data from Oracle needs to be copied while the source database is still being changed. You’ll soon encounter a well-known issue on the Oracle side:
ORA-01555: snapshot too old
Many of you who have migrated before will be painfully familiar with this problem. Ideally, you should make sure that Oracle is set up to support long-running read transactions, not just short bursts of reading. It makes sense to warn the person performing the migration ahead of time to watch out for these common show stoppers.
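One thing worth checking up front is the undo configuration on the Oracle side, since the undo retention window determines how long a consistent read can survive. A hedged example (the values are illustrative and must match the size of your undo tablespace):

-- Check the current retention window and undo tablespace settings
SELECT value FROM v$parameter WHERE name = 'undo_retention';
SELECT tablespace_name, retention FROM dba_tablespaces WHERE contents = 'UNDO';

-- Raise the retention window for the duration of the migration
ALTER SYSTEM SET undo_retention = 86400 SCOPE = BOTH;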
We also need to be aware of the following Oracle issue:
ORA-08177: can't serialize access for this transaction
Oracle’s support for higher transaction isolation levels is not as solid as PostgreSQL’s. You can’t just fire up a transaction with isolation level SERIALIZABLE and hope for the best; that won’t work either. Reading the data requires some forward planning, otherwise Oracle’s limitations will become painfully obvious. The bottom line: reading many, many TBs of data is not the same as reading a handful of rows. It makes a huge difference, and you have to prepare for that.
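One strategy that has worked for us, sketched here with hypothetical table and column names, is to avoid a single long-running transaction altogether and instead read large tables in key ranges, with every chunk pinned to the SCN captured at the start of the snapshot:

-- Each chunk is a short statement, yet all chunks see the same point in time.
SELECT *
FROM   app_owner.orders AS OF SCN :snapshot_scn
WHERE  order_id BETWEEN :chunk_start AND :chunk_end;

This keeps individual statements short while still producing a consistent snapshot, as long as the undo retention covers the total runtime of the copy.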
The Oracle catalog is SLOW
Users of PostgreSQL are a little spoiled: the system catalog can typically be queried very quickly. In fact, 99.9% of PostgreSQL users have never even bothered to look at system catalog performance. When building a migration toolchain, the Oracle catalog needs to be queried and inspected carefully, and you will quickly find yourself thinking, “Oh god, how can anything be this slow?” How does that affect migrations? The terrible performance has the following consequences:
- The GUI must avoid reading from the Oracle catalog as much as possible.
- Keep a cached copy of the metadata on hand (either in PostgreSQL or in memory).
If the performance of your graphical user interface depends on how fast the Oracle system catalog responds, the user experience is doomed and the application becomes unusable. We discovered that this is particularly true if there are a lot of objects to move.
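In practice this means the tool reads Oracle’s catalog (ALL_TAB_COLUMNS and friends) once and keeps a local copy. A minimal sketch of such a cache table on the PostgreSQL side (the table name and layout are hypothetical):

-- Filled once from Oracle's catalog; the GUI only ever reads from this copy.
CREATE TABLE migration_catalog_cache (
    owner         text,
    table_name    text,
    column_name   text,
    data_type     text,
    data_length   int,
    nullable      boolean,
    refreshed_at  timestamptz DEFAULT now(),
    PRIMARY KEY (owner, table_name, column_name)
);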
INVALID objects: Be prepared
PostgreSQL stores many objects, including views, in a parsed binary representation rather than as plain strings, which has a few benefits. Consider the following:
test=# CREATE TABLE a (aid int);
CREATE TABLE
test=# CREATE TABLE b (bid int);
CREATE TABLE
test=# CREATE VIEW v AS SELECT * FROM a, b;
CREATE VIEW
I created two tables and a view joining them. So far, everything has been fairly simple and is compatible with Oracle. What happens, though, if we rename “a” to “c”?
test=# ALTER TABLE a RENAME TO c;
ALTER TABLE
In PostgreSQL the view’s name is just a label; internally everything is referenced by a number (an OID). Simply put, a view is stored as a binary representation rather than as a string. That has a couple of advantages:
test=# \d+ v
                            View "public.v"
 Column |  Type   | Collation | Nullable | Default | Storage | Description
--------+---------+-----------+----------+---------+---------+-------------
 aid    | integer |           |          |         | plain   |
 bid    | integer |           |          |         | plain   |
View definition:
 SELECT c.aid,
    b.bid
   FROM c,
    b;
PostgreSQL automatically uses the new name, and the view is still functional. That is NOT the case in Oracle: you end up with an invalid view. In fact, many types of objects in Oracle can be invalid, so your migration tool must always be ready for this. If you want a smooth transition, you need additional logic to mark and exclude those objects. Otherwise the replay process on the PostgreSQL side will fail again and again because of all those outdated and invalid objects, putting your user experience at risk once more.
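A simple way to prepare for this is to ask Oracle for the list of invalid objects before generating any DDL for the target, so the tool can mark or skip them instead of failing during replay. For example:

-- Run on Oracle (DBA privileges assumed)
SELECT owner, object_type, object_name
FROM   dba_objects
WHERE  status = 'INVALID'
ORDER  BY owner, object_type, object_name;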
Null bytes and broken table content
We have performed countless migrations in the past, and we have seen unusable data coming out of Oracle far too frequently. Null bytes and otherwise damaged content are the two most common problems. Null bytes simply have to be excluded. However, there is a catch: in various industries (including but not limited to finance), data changes must be documented so that regulators can track what happened to the information in transit. This basically means that, even if the data is obviously broken, we cannot simply drop it silently and move on.
Those rows must be captured and recorded. In some cases it may also be necessary to define transformation rules. Once more, the goal is to have rules that are revision-safe and actually explain what happened to the data; success and acceptance depend on this. Of course, we can’t keep migrating TBs of data only to discover time and time again that it cannot be loaded. To keep the tooling reasonably usable, you need a “dry run” and a “find me all the broken data” run.
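As an illustration of such a “find me all the broken data” run, the following Oracle-side queries (table and column names are hypothetical) locate rows containing null bytes and show one possible, documented transformation applied during extraction:

-- Detection pass: record these rows before deciding what to do with them
SELECT id, customer_name
FROM   app_owner.customers
WHERE  INSTR(customer_name, CHR(0)) > 0;

-- One possible transformation rule: strip null bytes while extracting
SELECT id, REPLACE(customer_name, CHR(0), '') AS customer_name
FROM   app_owner.customers;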
PostgreSQL’s handling of COPY statements does not help much in error situations: by default, a single bad row aborts the entire statement. You must therefore find a way to ensure that the transaction loading the data ALWAYS commits, even if a large amount of data is loaded at once.
Efficient data loading in PostgreSQL
Every relational database, including Oracle and PostgreSQL, must write a WAL (write-ahead log) while loading data. Not everyone fully understands the repercussions: the data effectively has to be written TWICE, and the additional I/O can account for up to 40% of the total time needed to load data into PostgreSQL. See my post on reducing checkpoint distances for more details on lowering the WAL volume.
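To give an idea of what “reducing checkpoint distances” can look like during a bulk load, here is a hedged example; the values are purely illustrative and have to be sized for your hardware and recovery requirements:

-- Stretch checkpoints for the duration of the load (superuser required)
ALTER SYSTEM SET max_wal_size = '100GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();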
However, there’s more: did you know that, typically, the first read after a write is also a write? Yes, you read that right: a simple SELECT can act as a write operation. Why? Because of hint bits. In other words, just because the data is loaded doesn’t mean you are done writing to disk. Even after the migration is complete, TBs of I/O (especially output) may still be ahead of you, which can cause poor performance right after switching to the new system.
That must be avoided at all costs, so it’s crucial to load carefully. One way to do that is COPY FREEZE, as detailed in one of my earlier posts on the subject. To design the best possible loading process, it also helps to have a general understanding of hint bits.
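Here is a minimal sketch of COPY FREEZE; the table name and file path are examples only. The key point is that the table must be created or truncated in the same transaction that runs the COPY:

BEGIN;
-- Creating or truncating the table in this transaction is a precondition
TRUNCATE TABLE customers;
COPY customers FROM '/tmp/customers.csv' WITH (FORMAT csv, FREEZE true);
COMMIT;

Rows loaded this way arrive already frozen, so the first readers after the migration do not have to rewrite hint bits for TBs of freshly loaded data.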
This shows that tuning performance during a migration may require a certain level of skill, but it can lead to much better results, and the amount of I/O can often be cut dramatically. For example, the loading process can be sped up considerably when PostgreSQL replication is only turned on after the initial load.
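One way to exploit this, sketched here under the assumption that no standbys or WAL archiving are needed while the initial load runs, is to keep wal_level at its minimum during the load and only switch on replication afterwards (both changes require a restart):

ALTER SYSTEM SET wal_level = 'minimal';
ALTER SYSTEM SET max_wal_senders = 0;
-- ... perform the bulk load, ideally with COPY FREEZE as shown above ...
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 10;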
“It’s the data type, stupid!”
The set of data types used in the Oracle world is quite different from what we have on the PostgreSQL side, so data type mapping matters for two reasons. The first is efficiency: not every data type is created equal. For example, integer is much more efficient than numeric, and boolean is much smaller than integer. The second is what you expect to end up with on the target side. The following must be considered when creating tooling:
A migration is a chance to do cleanup
This is of utmost importance. Don’t pass up the chance to switch to more appropriate and efficient data types, fix your data structures, clear out unnecessary objects, and clean up legacy messes.
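To make this concrete, here is a hedged example of what such a cleanup-oriented mapping might look like; all table and column names are hypothetical:

-- Oracle source                        PostgreSQL target
--   NUMBER(1)      (used as a flag)      boolean
--   NUMBER(10,0)                         integer (or bigint if the values demand it)
--   NUMBER(20,4)                         numeric(20,4)
--   VARCHAR2(4000)                       text
--   DATE           (with time of day)    timestamp
CREATE TABLE customers (
    id          bigint PRIMARY KEY,
    is_active   boolean NOT NULL,
    balance     numeric(20,4),
    name        text,
    created_at  timestamp
);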
Finally …
There is a lot to be said about how specific features should be migrated and about the differences between Oracle and PostgreSQL. This time, instead of approaching the subject from the viewpoint of someone actually performing a migration, I wanted to shed some light on it from the perspective of a toolmaker.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.