Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
1.1.0 (2026-06-04)
- PySpark is no longer a core dependency; install `rowsmyth[spark]` or the `spark` uv dependency group when you need rowsmyth to pull in `pyspark` (e.g. local development). On Databricks and other managed Spark, install `rowsmyth` alone and use the cluster PySpark.
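A minimal sketch of how a project might check which of the two install paths applies in the current environment. It uses only the standard library; the helper name `pyspark_available` is hypothetical and not part of rowsmyth.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if a `pyspark` package is importable in this environment."""
    # find_spec locates the package without importing it, so the check is cheap.
    return importlib.util.find_spec("pyspark") is not None


if pyspark_available():
    print("pyspark found (cluster-provided or installed via the extra)")
else:
    print('pyspark missing; install "rowsmyth[spark]" for local development')
```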
1.0.0 (2026-06-03)
BREAKING CHANGES
- Complete API overhaul - all `0.x` public symbols removed
- Dropped SQLAlchemy / factory-boy dependency; library now targets PySpark exclusively
- `FactoryBuilder`, `Dataset.create()` and `@variant` decorator (old form) all removed
- `Model.factory(n)` signature changed; see new `Factory.count()` fluent API
- Direct `Model` subclassing replaced by explicit scoped bases from `declarative_base()`
- `generate()` removed in favour of `Base.dataset(spark, seed=None)`
- `context.py` removed; dataset-session internals now live in `dataset.py`
- Domain failures now raise rowsmyth-specific errors rooted at `RowsmythError`; catch the named rowsmyth errors instead of builtin `ValueError`, `TypeError`, `KeyError` or `RuntimeError`
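The error-handling migration in the last item can be sketched as follows. The two classes are local stand-ins so that the snippet runs on its own; real code would import `RowsmythError` and `UnknownColumnError` from rowsmyth, whose import path and constructor arguments are not given here. The functions `apply_overrides` and `safe_apply` are hypothetical.

```python
# Stand-in classes for illustration only; real code imports these from rowsmyth.
class RowsmythError(Exception):
    """Assumed root of the domain errors, as named in the changelog."""


class UnknownColumnError(RowsmythError):
    """Assumed subclass; the real class may carry extra attributes."""


def apply_overrides(known: set, overrides: dict) -> dict:
    # Hypothetical function that fails the way a domain failure might.
    for column in overrides:
        if column not in known:
            raise UnknownColumnError(column)
    return overrides


def safe_apply(known: set, overrides: dict) -> str:
    try:
        apply_overrides(known, overrides)
        return "ok"
    except UnknownColumnError:
        # Handle one specific failure by name (previously a builtin error).
        return "unknown column"
    except RowsmythError:
        # Catch-all for any other rowsmyth domain failure.
        return "other rowsmyth error"


print(safe_apply({"id", "name"}, {"email": "a@example.com"}))
```

Ordering the handlers from most specific to the root keeps a named handler from being shadowed by the catch-all.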
Feat
- `declarative_base()` - creates a scoped rowsmyth base with an independent model registry; concrete subclasses declare `__table_name__`, `__definition__` (PySpark `StructType`) and `__primary_key__`; implement `generator(ctx)` to produce one row
- `Base.dataset(spark, seed=None)` context manager - activates a dataset session bound to one declarative base; all factories must run inside this block
- `Dataset` session object - exposes `spark`, `base`, `registry`, `faker`, `random`, `seed`, `dataframes`; provides `next_seq(name)`, `pool(view, col)` and `dataframe(name)`; auto-registers every committed table as a Spark temp view
- `Factory` fluent builder - `Model.factory()` returns a `Factory`; chain `.count(n)`, `.where(**cols)`, `.has(child, via=None)`, `.variant(name)`, then call `.create()` to generate rows and receive root model instances
- `Model.create(**cols)` - create a single row imperatively inside an active dataset, with optional column overrides
- `RowCtx` per-row context passed to `generator()` - exposes `faker`, `random`, `spark`, `index`, `row`; provides `sequence(name)`, `pool(view, col)` and `parent(table, role=None)` for FK resolution
- `WrongDeclarativeBaseError` - raised when a model from one declarative base is created inside a dataset for another base
- Custom error hierarchy - `RowsmythError` roots named domain failures such as `DatasetContextError`, `UnknownColumnError`, `EmptyPoolError`, `CompoundPrimaryKeyError` and `UnknownVariantError`
- `Pool` - wraps a Spark temp view column; `.choice()` returns a deferred `PoolChoice` resolved deterministically in Spark; `.sample(k)` picks distinct values without replacement
- `@variant` decorator - marks a `Model` method as a named partial override; apply with `Factory.variant(name)`
- Unity Catalog support - `__catalog__`, `__schema__`, `__table_tags__`, `__comment__`; `Model.fqn()`, `Model.uc_tag_sql()`, `Model.column_tags()`, `Model.column_comments()`
- `Model.__expectations__` - named `dict[str, str]` of check expressions for data quality frameworks
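The `Pool.sample(k)` and `seed` entries above describe seeded sampling without replacement. The sketch below illustrates that idea in plain Python with the standard library only; it is not rowsmyth's implementation, which resolves pools in Spark. The function name `sample_distinct` and the sample IDs are made up for illustration.

```python
import random


def sample_distinct(values: list, k: int, seed: int) -> list:
    """Pick k items from values without replacement, reproducibly per seed."""
    if k > len(values):
        raise ValueError("k cannot exceed the number of available values")
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    # Results are distinct as long as the input values are unique.
    return rng.sample(values, k)


customer_ids = [101, 102, 103, 104, 105]
first = sample_distinct(customer_ids, 3, seed=42)
second = sample_distinct(customer_ids, 3, seed=42)
print(first == second)  # same seed, same picks
```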
0.1.0 (2026-05-25)
Feat
- `declarative_base()` with rowsmyth capabilities mixed in; accepts `metadata`, `type_annotation_map` and `registry` arguments matching `DeclarativeBase`
- `generators()` classmethod for co-located factory-boy declarations; keys are column attributes or strings, values are any factory-boy declaration
- `@variant` decorator for named model variants; returned dicts override specific generators when applied via `.mix()`
- `Model.factory(n)` / `Model.factory(min, max)` for hierarchical data generation with exact or random-range counts
- `FactoryBuilder.has(*builders, via=None)` to attach child builders; foreign keys resolved automatically from SQLAlchemy relationship metadata; use `via=` to disambiguate multiple relationships to the same parent
- `FactoryBuilder.mix(**proportions)` to distribute generated instances across named variants using proportions that sum to <= 1.0; the remainder receives no variant
- `FactoryBuilder.where(overrides)` to force fixed column values on every generated instance; takes precedence over variants and generators
- `FactoryBuilder.random_seed(value)` to seed `random` and `Faker` for reproducible output
- `FactoryBuilder.create()` to generate and persist instances to an in-memory SQLite database; returns a list of root model instances
- `Base.dataset(*builders)` for flat multi-table generation; rows are created in FK dependency order with foreign keys sampled randomly from the created pool
- `Dataset.random_seed(value)` to seed `random` and `Faker` for reproducible output
- `Dataset.create()` to generate and persist instances; returns `dict[str, list[Model]]` keyed by `__tablename__`
- `Model.__comment__` classproperty for table-level comment from `__table_args__`
- `Model.__table_info__` classproperty for table-level `info` dict from `__table_args__`
- `Model.__column_info__` classproperty for per-column `info` dicts, keyed by column name
- `Model.__expectations__` classproperty for named `CheckConstraint` expressions keyed by constraint name; maps directly to data quality frameworks
- `Model.__spark_schema__` classproperty to convert the model to a PySpark `StructType` preserving nullability and column metadata; requires `rowsmyth[spark]`
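As a worked example of the `FactoryBuilder.mix(**proportions)` entry above, the sketch below splits a batch across variants and leaves the remainder without one. It is plain Python, not the 0.1.0 implementation: rounding each share down is an assumption, and the helper `split_by_proportions` and the variant names are hypothetical.

```python
import math


def split_by_proportions(n: int, **proportions: float) -> dict:
    """Split n instances across named variants; the leftover is keyed by None."""
    if sum(proportions.values()) > 1.0 + 1e-9:
        raise ValueError("proportions must sum to <= 1.0")
    # Assumption: each share is rounded down; the tiny epsilon absorbs float noise.
    counts = {
        name: math.floor(n * share + 1e-9)
        for name, share in proportions.items()
    }
    # Whatever is not assigned to a named variant receives no variant.
    counts[None] = n - sum(counts.values())
    return counts


print(split_by_proportions(10, premium=0.3, trial=0.2))
# {'premium': 3, 'trial': 2, None: 5}
```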