Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

1.1.0 (2026-06-04)

  • PySpark is no longer a core dependency. Install rowsmyth[spark] or the spark uv dependency group when rowsmyth should pull in pyspark (e.g. for local development). On Databricks and other managed Spark platforms, install rowsmyth alone and use the cluster's PySpark.
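Because pyspark may now be absent, code that runs both locally and on a managed cluster can probe for it before importing. The following is a minimal sketch using only the standard library; the helper names pyspark_available and require_pyspark are hypothetical and not part of rowsmyth.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True when a top-level pyspark package is importable here."""
    # find_spec returns None when the module cannot be located.
    return importlib.util.find_spec("pyspark") is not None


def require_pyspark() -> None:
    """Fail early with an actionable message when PySpark is missing."""
    if not pyspark_available():
        raise ImportError(
            "PySpark not found: install rowsmyth[spark] locally, "
            "or run on a cluster that already provides pyspark."
        )
```

Calling require_pyspark() at program start-up surfaces a missing installation before any dataset work begins.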

1.0.0 (2026-06-03)

BREAKING CHANGES

  • Complete API overhaul - all 0.x public symbols removed
  • Dropped the SQLAlchemy and factory-boy dependencies; the library now targets PySpark exclusively
  • FactoryBuilder, Dataset.create() and the old form of the @variant decorator removed
  • Model.factory(n) signature changed; set the row count through the new fluent Factory.count(n) API instead
  • Direct Model subclassing replaced by explicit scoped bases from declarative_base()
  • generate() removed in favour of Base.dataset(spark, seed=None)
  • context.py removed; dataset-session internals now live in dataset.py
  • Domain failures now raise rowsmyth-specific errors rooted at RowsmythError; catch the named rowsmyth errors instead of builtin ValueError, TypeError, KeyError or RuntimeError
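For the last item above, migrating an exception handler might look like the sketch below. The two classes are local stand-ins that only mirror names from this changelog so the example runs on its own; in real code they would be imported from rowsmyth (the import path is not stated here). pick_from_pool and safe_pick are hypothetical helpers, and the assumption that the rowsmyth errors do not also inherit from builtins is inferred from the wording above.

```python
# Stand-ins mirroring names from this changelog (assumption: the real
# classes live in rowsmyth and do not subclass builtin errors).
class RowsmythError(Exception):
    """Root of the domain error hierarchy."""


class EmptyPoolError(RowsmythError):
    """In this sketch: raised when a pool has no values to choose from."""


def pick_from_pool(values):
    # Hypothetical helper standing in for a rowsmyth call that can fail.
    if not values:
        raise EmptyPoolError("pool is empty")
    return values[0]


def safe_pick(values, default=None):
    try:
        return pick_from_pool(values)
    except EmptyPoolError:
        # Named error: handle the one failure we know how to recover from.
        return default
    except RowsmythError:
        # Root error: any other domain failure propagates unchanged.
        raise
```

A 0.x-style handler written as `except ValueError:` would no longer intercept the failure in this sketch, which is why handlers need to name the rowsmyth errors or the RowsmythError root.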

Feat

  • declarative_base() - creates a scoped rowsmyth base with an independent model registry; concrete subclasses declare __table_name__, __definition__ (PySpark StructType) and __primary_key__; implement generator(ctx) to produce one row
  • Base.dataset(spark, seed=None) context manager - activates a dataset session bound to one declarative base; all factories must run inside this block
  • Dataset session object - exposes spark, base, registry, faker, random, seed, dataframes; provides next_seq(name), pool(view, col) and dataframe(name); auto-registers every committed table as a Spark temp view
  • Factory fluent builder - Model.factory() returns a Factory; chain .count(n), .where(**cols), .has(child, via=None), .variant(name), then call .create() to generate rows and receive root model instances
  • Model.create(**cols) - create a single row imperatively inside an active dataset, with optional column overrides
  • RowCtx per-row context passed to generator() - exposes faker, random, spark, index, row; provides sequence(name), pool(view, col) and parent(table, role=None) for FK resolution
  • WrongDeclarativeBaseError - raised when a model from one declarative base is created inside a dataset for another base
  • Custom error hierarchy - RowsmythError roots named domain failures such as DatasetContextError, UnknownColumnError, EmptyPoolError, CompoundPrimaryKeyError and UnknownVariantError
  • Pool - wraps a Spark temp view column; .choice() returns a deferred PoolChoice resolved deterministically in Spark; .sample(k) picks distinct values without replacement
  • @variant decorator - marks a Model method as a named partial override; apply with Factory.variant(name)
  • Unity Catalog support - __catalog__, __schema__, __table_tags__, __comment__; Model.fqn(), Model.uc_tag_sql(), Model.column_tags(), Model.column_comments()
  • Model.__expectations__ - named dict[str, str] of check expressions for data quality frameworks
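Taken together, the 1.0.0 features above suggest a flow along the following lines. This is a hedged sketch, not the library's documented usage: the import path, the string form of __primary_key__, generator() being an instance method that returns a dict, and the value yielded by the dataset context manager are all assumptions. The Customer model and build_customers function are hypothetical. Imports are deferred into the function so the module loads without rowsmyth or PySpark installed.

```python
def build_customers(spark, n=3, seed=42):
    """Sketch: scoped base -> model -> dataset session -> fluent factory."""
    from pyspark.sql.types import LongType, StringType, StructField, StructType
    from rowsmyth import declarative_base  # import path is an assumption

    Base = declarative_base()  # scoped base with its own model registry

    class Customer(Base):
        __table_name__ = "customers"
        __definition__ = StructType([
            StructField("customer_id", LongType(), nullable=False),
            StructField("name", StringType(), nullable=True),
            StructField("country", StringType(), nullable=True),
        ])
        __primary_key__ = "customer_id"  # value format is an assumption

        def generator(self, ctx):
            # Assumption: one row is returned as a dict keyed by column name.
            return {
                "customer_id": ctx.sequence("customer_id"),
                "name": ctx.faker.name(),
                "country": ctx.random.choice(["NL", "DE", "FR"]),
            }

    # All factories must run inside the dataset block.
    with Base.dataset(spark, seed=seed) as ds:  # "as ds" is an assumption
        customers = Customer.factory().count(n).where(country="NL").create()
        return customers, ds.dataframe("customers")
```

Passing the same seed should, per the Base.dataset(spark, seed=None) entry above, give reproducible rows across runs.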

0.1.0 (2026-05-25)

Feat

  • declarative_base() with rowsmyth capabilities mixed in; accepts metadata, type_annotation_map and registry arguments matching DeclarativeBase
  • generators() classmethod for co-located factory-boy declarations; keys are column attributes or strings, values are any factory-boy declaration
  • @variant decorator for named model variants; returned dicts override specific generators when applied via .mix()
  • Model.factory(n) / Model.factory(min, max) for hierarchical data generation with exact or random-range counts
  • FactoryBuilder.has(*builders, via=None) to attach child builders; foreign keys resolved automatically from SQLAlchemy relationship metadata; use via= to disambiguate multiple relationships to the same parent
  • FactoryBuilder.mix(**proportions) to distribute generated instances across named variants using proportions that sum to <= 1.0; the remainder receives no variant
  • FactoryBuilder.where(overrides) to force fixed column values on every generated instance; takes precedence over variants and generators
  • FactoryBuilder.random_seed(value) to seed random and Faker for reproducible output
  • FactoryBuilder.create() to generate and persist instances to an in-memory SQLite database; returns a list of root model instances
  • Base.dataset(*builders) for flat multi-table generation; rows are created in FK dependency order with foreign keys sampled randomly from the created pool
  • Dataset.random_seed(value) to seed random and Faker for reproducible output
  • Dataset.create() to generate and persist instances; returns dict[str, list[Model]] keyed by __tablename__
  • Model.__comment__ classproperty for table-level comment from __table_args__
  • Model.__table_info__ classproperty for table-level info dict from __table_args__
  • Model.__column_info__ classproperty for per-column info dicts, keyed by column name
  • Model.__expectations__ classproperty for named CheckConstraint expressions keyed by constraint name; maps directly to data quality frameworks
  • Model.__spark_schema__ classproperty to convert the model to a PySpark StructType preserving nullability and column metadata; requires rowsmyth[spark]