In this tutorial, we explore property-based testing with Hypothesis and build a rigorous testing pipeline that goes far beyond traditional unit testing. We implement invariants, differential testing, metamorphic testing, targeted exploration, and stateful testing to validate both the functional correctness and the behavioral guarantees of our systems. Instead of manually crafting edge cases, we let Hypothesis generate structured inputs, shrink failures to minimal counterexamples, and systematically uncover hidden bugs. We also demonstrate how modern testing practices can be integrated directly into experimental and research-driven workflows.

```python
import sys, textwrap, subprocess, os, re, math

# Install dependencies inside the notebook environment.
!{sys.executable} -m pip -q install hypothesis pytest

test_code = r'''
import re, math
import pytest
from hypothesis import (
    given, assume, example, settings, note, target, HealthCheck, Phase
)
from hypothesis import strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine, rule, invariant, initialize, precondition
)

def clamp(x: int, lo: int, hi: int) -> int:
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def normalize_whitespace(s: str) -> str:
    return " ".join(s.split())

def is_sorted_non_decreasing(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

def safe_parse_int(s: str):
    t = s.strip()
    if len(t) > 2000:
        return (False, "too_big")
    try:
        return (True, int(t))
    except Exception:
        return (False, "parse_error")

def safe_parse_int_alt(s: str):
    t = s.strip()
    if not t:
        return (False, "not_an_int")
    sign = 1
    if t[0] == "+":
        t = t[1:]
    elif t[0] == "-":
        sign = -1
        t = t[1:]
    if not t or any(ch < "0" or ch > "9" for ch in t):
        return (False, "not_an_int")
    if len(t) > 2000:
        return (False, "too_big")
    val = 0
    for ch in t:
        val = val * 10 + (ord(ch) - 48)
    return (True, sign * val)

# Ordered (lo, hi) bound pairs for the clamp invariants.
bounds = st.tuples(st.integers(-10_000, 10_000), st.integers(-10_000, 10_000)).map(
    lambda t: (t[0], t[1]) if t[0] <= t[1] else (t[1], t[0])
)

@given(st.integers(-100_000, 100_000), bounds)
def test_clamp_invariants(x, b):
    lo, hi = b
    y = clamp(x, lo, hi)
    assert lo <= y <= hi
    if lo <= x <= hi:
        assert y == x

# Differential testing: two independent parsers must agree on digit strings,
# and both must reject over-long inputs with the same tag.
@given(st.text(alphabet="0123456789", min_size=1, max_size=2500))
def test_parsers_agree_on_digit_strings(s):
    assert safe_parse_int(s) == safe_parse_int_alt(s)
    ok, val = safe_parse_int(s)
    if len(s) > 2000:
        assert ok is False and val == "too_big"
    else:
        assert ok is True and isinstance(val, int)

def variance(xs):
    if len(xs) < 2:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Metamorphic test: variance is invariant under a constant shift.
# Integer-valued floats keep the arithmetic exact enough for tight tolerances.
@given(st.lists(st.integers(-1_000, 1_000).map(float), min_size=2, max_size=30))
def test_variance_shift_invariance(xs):
    v = variance(xs)
    target(v, label="variance")  # steer Hypothesis toward high-variance inputs
    assert v >= 0.0
    k = 7
    assert math.isclose(variance([x + k for x in xs]), v, rel_tol=1e-12, abs_tol=1e-12)
```

We extend our validation to parsing robustness and statistical correctness using targeted exploration.
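To see what a single property looks like outside the notebook's test file, here is a small standalone sketch of our own (the function mirrors the `normalize_whitespace` helper above) combining an idempotence invariant with `target()`-driven exploration:

```python
# Our own minimal sketch: target() reports a score that Hypothesis tries to
# maximise, steering generation toward inputs that stress the property
# (here, strings with many whitespace-separated tokens).
from hypothesis import given, target
from hypothesis import strategies as st

def normalize_whitespace(s: str) -> str:
    # Collapse runs of whitespace to single spaces and trim the ends.
    return " ".join(s.split())

@given(st.text())
def test_normalization_is_idempotent(s):
    target(float(len(s.split())), label="token count")  # maximise token count
    once = normalize_whitespace(s)
    # Applying the function again must not change an already-normalized string.
    assert normalize_whitespace(once) == once

test_normalization_is_idempotent()  # @given-decorated tests are directly callable
```

Calling the decorated function with no arguments runs the whole generated-input loop, which is handy for quick experiments before wiring tests into pytest.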

We verify that two independent integer parsers agree on structured inputs and enforce rejection rules on invalid strings. We further implement metamorphic testing by validating the invariance of variance under transformation.

```python
class Bank:
    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt: int):
        if amt <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt: int):
        if amt <= 0:
            raise ValueError("withdrawal must be positive")
        if amt > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amt
        self.ledger.append(("wd", amt))

    def replay_balance(self):
        bal = 0
        for typ, amt in self.ledger:
            bal += amt if typ == "dep" else -amt
        return bal

class BankMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.bank = Bank()

    @initialize()
    def init(self):
        assert self.bank.balance == 0
        assert self.bank.replay_balance() == 0

    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def deposit(self, amt):
        self.bank.deposit(amt)

    @precondition(lambda self: self.bank.balance > 0)
    @rule(amt=st.integers(min_value=1, max_value=10_000))
    def withdraw(self, amt):
        assume(amt <= self.bank.balance)
        self.bank.withdraw(amt)

    @invariant()
    def balance_non_negative(self):
        assert self.bank.balance >= 0

    @invariant()
    def ledger_replay_matches_balance(self):
        assert self.bank.replay_balance() == self.bank.balance

TestBankMachine = BankMachine.TestCase
'''

path = "/tmp/test_hypothesis_advanced.py"
with open(path, "w", encoding="utf-8") as f:
    f.write(test_code)

print("Hypothesis version:", __import__("hypothesis").__version__)
print("\nRunning pytest on:", path, "\n")
res = subprocess.run([sys.executable, "-m", "pytest", "-q", path],
                     capture_output=True, text=True)
print(res.stdout)
if res.returncode != 0:
    print(res.stderr)

if res.returncode == 0:
    print("\nAll Hypothesis tests passed.")
elif res.returncode == 5:
    print("\nPytest collected no tests.")
else:
    print("\nSome tests failed.")
```

We implement a stateful system using Hypothesis's rule-based state machine to simulate a bank account.
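Before letting the state machine drive the account, it helps to see the intended semantics in plain sequential use. The snippet below is our own sanity check, with the class re-declared so it runs standalone (the `"dep"`/`"wd"` ledger tags follow the test file above):

```python
# Standalone sanity check of the Bank semantics used in the state machine.
class Bank:
    def __init__(self):
        self.balance = 0
        self.ledger = []

    def deposit(self, amt):
        if amt <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amt
        self.ledger.append(("dep", amt))

    def withdraw(self, amt):
        if amt <= 0 or amt > self.balance:
            raise ValueError("invalid withdrawal")
        self.balance -= amt
        self.ledger.append(("wd", amt))

    def replay_balance(self):
        # Rebuild the balance purely from the ledger entries.
        return sum(a if t == "dep" else -a for t, a in self.ledger)

b = Bank()
b.deposit(100)
b.withdraw(30)
assert b.balance == 70
# The replay invariant the state machine checks after every step:
assert b.replay_balance() == b.balance
```

The `replay_balance` check is the key invariant: the cached balance must always be derivable from the append-only ledger, which is exactly what the state machine verifies after every generated operation.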

We define rules, preconditions, and invariants to guarantee balance consistency and ledger integrity under arbitrary operation sequences. We then execute the entire test suite via pytest, allowing Hypothesis to automatically discover counterexamples and verify system correctness.

In conclusion, we built a comprehensive property-based testing framework that validates pure functions, parsing logic, statistical behavior, and even stateful systems with invariants.
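The same rule/invariant pattern works on any stateful target. As a compact illustration of our own (a toy counter, not the article's bank), the snippet below shows how a machine's `.TestCase` attribute yields a standard `unittest` test, which is also how pytest discovers `TestBankMachine` above:

```python
# A toy state machine: rules mutate state, invariants run after every step.
import unittest
from hypothesis import settings
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
from hypothesis import strategies as st

class CounterMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.count = 0

    @rule(n=st.integers(min_value=1, max_value=5))
    def add(self, n):
        self.count += n

    @rule()
    def reset(self):
        self.count = 0

    @invariant()
    def never_negative(self):
        # Checked after the initial state and after every rule application.
        assert self.count >= 0

CounterTest = CounterMachine.TestCase
# Keep the demo fast: fewer runs and shorter operation sequences.
CounterTest.settings = settings(max_examples=20, stateful_step_count=10)

suite = unittest.TestLoader().loadTestsFromTestCase(CounterTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```

If an invariant ever failed, Hypothesis would shrink the run to a minimal sequence of rule calls reproducing the failure.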

We leveraged Hypothesis's shrinking, targeted search, and state-machine testing capabilities to move from example-based testing to behavior-driven verification. This approach lets us reason about correctness at a higher level of abstraction while maintaining strong guarantees for edge cases and system consistency. Check out the Full Coding Notebook here.
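The shrinking we rely on throughout can also be invoked directly. This small snippet of ours uses `hypothesis.find`, which returns a minimal value satisfying a predicate, the same minimisation Hypothesis applies to failing examples:

```python
# Demonstrating the shrinker in isolation: find() searches for a value
# satisfying the predicate, then shrinks it toward a minimal witness
# (short lists, small integers).
from hypothesis import find
from hypothesis import strategies as st

smallest = find(st.lists(st.integers()), lambda xs: sum(xs) > 10)
print(smallest)  # a short list of small integers whose sum just exceeds 10
assert sum(smallest) > 10
```

This is why reported counterexamples are readable: a failure first found on a sprawling input is reduced to the simplest input that still triggers it.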
