It seems like a lot of code to generate the tables in the first place and you STILL need to read the output scripts just to ensure the ORM isn't generating some garbage you didn't want.
That seems like a lot of extra effort when a simple migration service (such as liquibase) could do the same work running SQL directly. No question on "which indexes are getting created and why". No deep knowledge of Django interactions with sql. Instead, it's just directly running the SQL you want to run.
Liquibase starts with "Write your database change code in your preferred authoring tool in SQL, YAML, JSON, or XML." So instead of just having my ORM generate that and I just have to read them to ensure correctness, I have to manually write change scripts instead? I don't see how that's is comparable.
Liquibase could certainly come in after I have some SQL scripts generated from my ORM and do whatever it does.
I don’t want to go without an ORM because I’ll end up making one ad-hoc anyway. I’m not going to do work on raw tuples in my application code.
I go the liquibase route for migrations, and just use the mapping portion of any ORM.
"Wow, 1-2 command and my app and database are in sync!"
In reality, migration tools are 100% about data loss prevention.
If you do not care about data loss, updating your schema is trivial, just drop everything and create. Dev environments should be stateless anyways, using separate data "fixtures" when needed.
Data loss itself is a highly nuanced topic. Some data is replaceable, some might be protected in a separate store. So I agree that ORMs should challenge the assumption that automatic migration tools need to be part of their kitchen sink.
My assumption is that people have used orms that automatically add the index for you when you create a relationship so they just conflate them all. Often they’ll say that a foreign key is needed to improve the performance and when you dig into it, their mental model is all wrong. The sense they have is that the other table gets some sort of relationship array structure to make lookups fast.
It’s an interesting phenomenon of the abstraction.
Don’t get me wrong, I love sqlalchemy and alembic but probably because I understand what’s happening underneath so I know the right way to hold it so things are efficient and migrations are safe.
Depending upon the database storage engine, available memory, and table size I could see there being _some_ performance hit if only PKs are used for statistics but I'd think that modern RDBMSes are smart enough to cache appropriately. Am I missing something?
FWIW, I've also asked everyone I've interviewed in the past decade about indexes and FKs. Most folks I've talked to seem to understand FKs. They're often fuzzier on the details of indexes, but I don't recall anyone conflating the two.
Is it? In Postgres, all FK references must be to a column with a PK or unique constraint or part of another index. Additionally, Postgres and Maria (maybe all SQL?) automatically create indexes for PKs and unique constraints. There's a high likelihood that a foreign key is already indexed _in the other table_.
Generally, I agree with your statement. Adding a FK won't magically improve performance or create useful indices. But, the presence of a FK or refactoring to support a FK does (tangentially) point back to that index.
Django automatically creates an index on the referencing table to ensure that joins are fast. The fact that you have the relationship in the ORM means that’s how you’re likely to access the data so it makes perfect sense.
The mental model mismatch I’ve seen is that people appear to think of the relationship as being on the parent object “pointing” at the child table.
- The kind of person to dive into the schema and worry about an unnecessary index
- Smart enough to heed Django's warnings and use `Meta.UniqueConstraint`
- Dumb enough to ignore Django's warnings and not use `Meta.Indexes`
I think it's funny that the kind of dev that 100% relies on the ORM and would benefit from this warning would probably never find themselves in this gritty optimization situation in the first place.
That being said, I enjoyed the article and learned something so maybe I'm the target audience and not them.
This is nice to know if you're using Django, but as important to note is that neither Postgres nor SQLAlchemy / Alembic will do this automatically.
For example, does the FK need to be the first field in a unique together?