010. SQL Models defined in SQLAlchemy/Alembic
Status: proposed
Authors: Josh Holland
Date: 2025-06-09
Context and Problem Statement
As we build more components making use of a PostgreSQL database, where can we store schema definitions so that they are reusable across projects? What’s the best client library to query the database?
Decision Drivers
- Standard tools where possible
- Python compatibility
- Avoid manually running
ALTER TABLE
etc commands by hand - Avoid lock-in: the solution should not be too opinionated so that we can use other languages/tools to access the database
Considered Options
- SQLAlchemy declarative models in centralised repo + Alembic
- SQLAlchemy models defined as part of each application
- SQLAlchemy core with hand-written marshalling to/from model classes in each application + Alembic
- SQLModel from Tiangelo
Decision Outcome
Chosen option: SQLAlchemy declarative models with Alembic in a centralised repo because it is a relatively light-weight solution that still provides many features. There do not appear to be any realistic alternatives to Alembic for managing migrations.
Positive Consequences
- One repo holds the full source of truth for all the database models and configuration
- Database migration is handled by code which can be version-controlled
Negative Consequences
- One more repo to keep track of
- Need to keep central models repo and downstream application repos in sync
Pros and Cons of the Options
SQLAlchemy declarative models in central repo
Define the models in a single repo, then import them into client applications.
# central repo
from datetime import datetime
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class PhenocamTag(Base):
__tablename__ = "phenocam_tags"
id: Mapped[int] = mapped_column(primary_key=True)
last_updated: Mapped[datetime]
image: Mapped[str]
site_id: Mapped[str]
activity: Mapped[str]
new_land_cover: Mapped[str]
# consuming application
from sqlalchemy import select
from dri_database_models import PhenocamTag
def foo(site_id):
with Session(engine) as session:
query = select(PhenocamTag).where(PhenocamTag.site_id == site_id).order_by(PhenocamTag.last_updated)
tags = session.execute(query).all()
- Good, because models which might be used by e.g. both ingestion and processing code can be defined independently of them
- Good, because it’s a natural home for Alembic migrations to manage the database schema
- Good, because we can use the powerful and widely used SQLAlchemy query API
- Bad, because client applications using a version of the models older than the database could lead to issues
- Bad, because it’s another repo to manage
SQLAlchemy declarative models in individual repos
Each subcomponent which interfaces with the database defines its own declarative models.
- Good, because there is no possibility of the application code and database schema getting out of sync
- Good, because we can still use the SQLAlchemy API
- Bad, because it gets complicated when multiple components need to access the same database tables
- Bad, because no one repo “owns” the code so there is no natural place to put migration information
SQLAlchemy core with custom marshalling
Models are defined independently of SQLAlchemy and translated into each component’s classes.
# consuming application
from datetime import datetime
from sqlalchemy import select, Table, Column, String, DateTime, MetaData
metadata_obj = MetaData()
class PhenocamTag:
last_updated: datetime
image: str
site_id: str
activity: str
new_land_cover: str
def foo(site_id):
tag_table = Table(
"phenocam_tags",
metadata_obj,
Column("last_updated", DateTime),
Column("image", String),
# etc
)
with engine.connect() as conn:
query = select(tag_table).where(tag_table.site_id == site_id).order_by(tag_table.last_updated)
tags = [PhenocamTag(...) for ... insession.execute(query).all()]
- Good, because there is no dependency on an external repo for each component
- Bad, because there’s a lot of boilerplate and duplication between the application-specific model and the SQLAlchemy metadata required for the querying code
- Bad, because there is nowhere to put migrations
SQLModel
Use Tiangelo’s SQLModel library which wraps SQLAlchemy and integrates it with Pydantic.
- Good, because it is by the same authors as FastAPI and therefore integrates well
- Bad, because we’d still have to make the same central repo vs distributing ownership across clients
- Bad, because it’s still pre-1.0 and much less mature than SQLAlchemy