Preamble
Customers frequently ask me what I think of “entity-attribute-value” (EAV) design, so I decided it would be worthwhile to write down my viewpoint.
The objective is to avoid creating a table for each entity in the application. Instead, you store each attribute as a separate entry in an attribute table:
CREATE TABLE objects (
   objectid bigint PRIMARY KEY
   /* other object-level properties */
);

CREATE TABLE attstring (
   objectid bigint REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname  text NOT NULL,
   attval   text,
   PRIMARY KEY (objectid, attname)
);

CREATE TABLE attint (
   objectid bigint REFERENCES objects ON DELETE CASCADE NOT NULL,
   attname  text NOT NULL,
   attval   integer,
   PRIMARY KEY (objectid, attname)
);

/* more tables for other data types */
The name of the model derives from the “att...” tables, which have three columns: “entity ID”, “attribute name”, and “value”.
The basic theme has many variations, including the following:
- omit the objects table
- add additional tables that define “object types”, so that each type can only have certain attributes
Why would anybody consider an entity-attribute-value design?
The main defense I hear for the EAV design is flexibility: you can create new entity types without first creating a database table. Taken to the extreme, each entity can have its own set of attributes.
I also suspect that people consider such a data model because they are more familiar with key-value stores than with relational databases.
Performance considerations of entity-attribute-value design
In my opinion, EAV is the least performant database design imaginable; with such a data model, you will never achieve good database performance. The only use cases where EAV shines are those where it is essentially used as a key-value store.
INSERT
When you add an entity, it will look like this:
INSERT INTO objects (objectid) VALUES (42);

INSERT INTO attstring (objectid, attname, attval)
VALUES (42, 'name', 'myobject');

INSERT INTO attint (objectid, attname, attval)
VALUES (42, 'start', 100),
       (42, 'end', 1000);
This means that we have added four rows to three tables and modified four index entries. Moreover, the three statements require three client-server round trips. You can reduce the number of round trips by combining the statements into a single one with CTEs, or by using the new pipeline mode of libpq. Even so, it will be considerably more expensive than inserting a single table row.
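As an illustration of the CTE approach, here is a minimal sketch using the tables defined above (the CTE names are made up; since PostgreSQL checks the foreign key on objects at the end of the statement, the combined insert works):

WITH new_object AS (
   /* all three inserts travel in a single round trip */
   INSERT INTO objects (objectid) VALUES (42)
), new_strings AS (
   INSERT INTO attstring (objectid, attname, attval)
   VALUES (42, 'name', 'myobject')
)
INSERT INTO attint (objectid, attname, attval)
VALUES (42, 'start', 100),
       (42, 'end', 1000);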
DELETE
If you use cascading delete, you could do that in a single statement:
DELETE FROM objects WHERE objectid = 42;
Nevertheless, you will still end up deleting four table rows and modifying four index entries. That is much more work than deleting a single table row.
UPDATE
A single-column update could actually be faster with the EAV design, because only one small table row is modified:
UPDATE attint SET attval = 2000 WHERE objectid = 42 AND attname = 'end';
But if you have to modify several columns, you will need to run several UPDATE statements. That will be slower than modifying a single (albeit bigger) table row.
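For example, changing both the name and the end of the object (a hypothetical combination of attributes) already requires two statements, since the attributes live in different tables:

/* two attributes of different types mean two statements */
UPDATE attint
SET attval = 2000
WHERE objectid = 42 AND attname = 'end';

UPDATE attstring
SET attval = 'newname'
WHERE objectid = 42 AND attname = 'name';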
SELECT
A join is necessary when looking up an entity’s attributes:
SELECT s.attval   AS "name",
       ai1.attval AS "start",
       ai2.attval AS "end"
FROM objects AS o
   LEFT JOIN attstring AS s   USING (objectid)
   LEFT JOIN attint    AS ai1 USING (objectid)
   LEFT JOIN attint    AS ai2 USING (objectid)
WHERE objectid = 42
  AND s.attname   = 'name'
  AND ai1.attname = 'start'
  AND ai2.attname = 'end';
Alternatively, you could execute three separate queries, one for each attribute. However you do it, it will be less efficient than a single-row SELECT from a single table.
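The three-query alternative would look like this:

/* one query (and one round trip) per attribute */
SELECT attval FROM attstring WHERE objectid = 42 AND attname = 'name';
SELECT attval FROM attint WHERE objectid = 42 AND attname = 'start';
SELECT attval FROM attint WHERE objectid = 42 AND attname = 'end';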
Single-column aggregates
As an example for a query that might be faster with the EAV model, consider a query that aggregates data from a single column:
SELECT sum(attval)
FROM othertab
   JOIN attint USING (objectid)
WHERE othertab.col = 'x'
  AND attint.attname = 'attendants';
This can be considerably faster than aggregating a column of a bigger table, particularly if there is a covering index on attint (objectid, attname) INCLUDE (attval).
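Written out as a statement, that covering index would be:

/* the index alone can answer the query, enabling index-only scans */
CREATE INDEX ON attint (objectid, attname) INCLUDE (attval);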
More complicated queries
After these examples, it is clear that writing more complicated queries is painful with the EAV design. A simple join would look like this:
SELECT e1a1.attval AS person_name,
       e1a2.attval AS person_id,
       e2a1.attval AS address_street,
       e2a2.attval AS address_city
FROM attint AS e1a2
   JOIN attstring AS e1a1 ON e1a2.objectid = e1a1.objectid
   LEFT JOIN attint    AS e2a0 ON e1a2.attval   = e2a0.attval
   LEFT JOIN attstring AS e2a1 ON e2a0.objectid = e2a1.objectid
   LEFT JOIN attstring AS e2a2 ON e2a0.objectid = e2a2.objectid
WHERE e1a1.attname = 'name'
  AND e1a2.attname = 'persnr'
  AND e2a0.attname = 'persnr'
  AND e2a1.attname = 'street'
  AND e2a2.attname = 'city';
If you think that this query is hard to read, I agree with you. Normally, this operation would appear as follows in a relational data model:
SELECT person.name AS person_name,
       persnr      AS person_id,
       address.street,
       address.city
FROM person
   LEFT JOIN address USING (persnr);
It is obvious which of these two queries will perform better.
But we need an entity-attribute-value design for flexibility!
Relational data models are not known for being particularly adaptable. After all, that is what the NoSQL movement is all about. There are, however, effective methods for handling variable entities.
Creating tables on the fly
Nothing keeps you from running statements like CREATE TABLE and CREATE INDEX from your application. So, if each entity type has a fixed set of attributes, you can easily model even a variable number of entity types with a standard relational design by creating a table for each new type on the fly.
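As a sketch, if the application discovers a new entity type “event” at runtime (the type and its attributes are made up for illustration), it could simply run:

/* a table created on the fly for the new entity type */
CREATE TABLE event (
   objectid bigint PRIMARY KEY,
   name     text NOT NULL,
   start    integer,
   "end"    integer  /* "end" is a reserved word and must be quoted */
);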
There are still some issues:
- A data model that grows on the fly may not end up well designed. But that is no different with the entity-attribute-value design.
- If the application has to create tables, it needs permission to do so. But today, when many applications create their own database tables anyway, few people will worry about that.
For creating tables on the fly to work, each entity type's set of attributes must be clearly defined. If that is not the case, we need a different approach.
Using JSON for a flexible data model
A variable number of attributes can be modeled using PostgreSQL’s extensive JSON support.
For that, you model the important and common attributes as regular table columns. Then you add an additional column of type jsonb with a GIN index on it. This column contains the “rare attributes” of the entity as key-value pairs.
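A minimal sketch of such a hybrid table (the column names are illustrative):

CREATE TABLE objects (
   objectid bigint PRIMARY KEY,
   name     text NOT NULL,  /* common attribute: a regular column */
   atts     jsonb           /* rare attributes as key-value pairs */
);

/* GIN index to speed up searches on the jsonb attributes */
CREATE INDEX ON objects USING gin (atts);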
You should be cautious with a model like this and make sure that attributes

- that you use in join conditions
- on which you need a database constraint
- that you want to use in a WHERE condition with an operator other than “=”

are modeled as regular table columns.
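Rare attributes can still be searched efficiently with the containment operator, which the GIN index supports; a hypothetical example using the atts column sketched above:

/* the containment operator @> can use the GIN index */
SELECT objectid, name
FROM objects
WHERE atts @> '{"color": "red"}';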
Conclusion
In your relational database, avoid using entity-attribute-value designs. EAV hinders performance, and PostgreSQL offers other options for a flexible data model.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.