feat(MachineLearning/PACLearning): add VersionSpace abstraction #592
Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) to the PAC learning module. The version space, i.e. the subset of a concept class consistent with observed labeled data, is the structural substrate every sample complexity theorem operates on (Baby PAC, Sauer-Shelah, PAC lower bounds, NFL).

This PR complements leanprover#492 by providing:
- VersionSpace C S: the consistent subset of C given sample S
- versionSpace_subset, versionSpace_empty_sample: sanity lemmas
- versionSpace_antitone: structural monotonicity (more data → smaller VS), dual to sample complexity monotonicity
- IsConsistent A C: predicate on learners (output always in version space)
- IsConsistent.output_mem_conceptClass, output_consistent: consistent learner properties
- mem_versionSpace_of_realizable: under realizable data, the target concept lies in the version space (the realizable-case bridge)
- versionSpace_nonempty_of_realizable: corollary

No measure theory, no new Mathlib dependencies, ~150 lines, 0 sorry. Together with leanprover#492 these suffice to state the Baby PAC theorem, Sauer-Shelah sample complexity, PAC lower bounds, and NFL as structural statements rather than ad-hoc computations.
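For readers who have not opened the diff, the core definition and the sanity lemmas can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the sample is modeled here as a `List (X × Y)`, the namespace `VersionSpaceSketch` is hypothetical, and the actual signatures in the PR may differ.

```lean
import Mathlib.Data.Set.Basic

-- Hypothetical namespace, used only to keep the sketch separate from the PR's names.
namespace VersionSpaceSketch

variable {X Y : Type}

/-- Assumed shape of the definition: the concepts in `C` that agree with every
labeled point of the sample `S`. -/
def VersionSpace (C : Set (X → Y)) (S : List (X × Y)) : Set (X → Y) :=
  {c | c ∈ C ∧ ∀ p ∈ S, c p.1 = p.2}

/-- The version space is always a subset of the concept class. -/
theorem versionSpace_subset (C : Set (X → Y)) (S : List (X × Y)) :
    VersionSpace C S ⊆ C :=
  fun _ hc => hc.1

/-- With no data, nothing is ruled out. -/
theorem versionSpace_empty_sample (C : Set (X → Y)) :
    VersionSpace C [] = C := by
  ext c
  simp [VersionSpace]

/-- More data gives a smaller version space: if every point of `S` also
occurs in `T`, then anything consistent with `T` is consistent with `S`. -/
theorem versionSpace_antitone (C : Set (X → Y)) {S T : List (X × Y)}
    (h : ∀ p ∈ S, p ∈ T) : VersionSpace C T ⊆ VersionSpace C S :=
  fun _ hc => ⟨hc.1, fun p hp => hc.2 p (h p hp)⟩

end VersionSpaceSketch
```

Stating antitonicity with an explicit "every point of `S` occurs in `T`" hypothesis keeps the sketch independent of how samples are indexed. The PR itself may phrase this differently.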
SamuelSchlesinger
left a comment
I want to see the bridge that makes this useful: version-space membership is exactly zero empirical error. I think that is what will motivate this PR and make the downstream work easier. What you have so far is sound, and I appreciate this PR overall; my requests are mostly stylistic comments and requests for generalization.
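One possible shape for that bridge, as a minimal sketch under assumptions: `miscount` below is a hypothetical stand-in for an empirical miscount (the number of sample points a concept mislabels), the sample is again a `List (X × Y)`, and none of these names are taken from the PR.

```lean
import Mathlib.Data.Set.Basic

-- Hypothetical namespace; every name below is illustrative.
namespace BridgeSketch

variable {X Y : Type}

/-- Assumed definition, repeated so the sketch is self-contained. -/
def VersionSpace (C : Set (X → Y)) (S : List (X × Y)) : Set (X → Y) :=
  {c | c ∈ C ∧ ∀ p ∈ S, c p.1 = p.2}

/-- Number of sample points that `c` labels incorrectly. -/
def miscount [DecidableEq Y] (c : X → Y) : List (X × Y) → Nat
  | [] => 0
  | p :: S => if c p.1 = p.2 then miscount c S else miscount c S + 1

/-- Zero miscount is the same as agreeing with every sample point. -/
theorem miscount_eq_zero_iff [DecidableEq Y] (c : X → Y) (S : List (X × Y)) :
    miscount c S = 0 ↔ ∀ p ∈ S, c p.1 = p.2 := by
  induction S with
  | nil => simp [miscount]
  | cons q S ih => by_cases h : c q.1 = q.2 <;> simp [miscount, h, ih]

/-- The bridge: membership in the version space is membership in `C`
together with zero miscount on the sample. -/
theorem mem_versionSpace_iff_miscount_eq_zero [DecidableEq Y]
    (C : Set (X → Y)) (S : List (X × Y)) (c : X → Y) :
    c ∈ VersionSpace C S ↔ c ∈ C ∧ miscount c S = 0 :=
  ⟨fun h => ⟨h.1, (miscount_eq_zero_iff c S).2 h.2⟩,
   fun h => ⟨h.1, (miscount_eq_zero_iff c S).1 h.2⟩⟩

end BridgeSketch
```

A normalized empirical error (miscount divided by sample size) is zero exactly when the miscount is zero on a nonempty sample, so the same statement would carry over. That step is omitted here.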
- File docstring above section; c implicit; rename output_consistent → output_agrees
- Add Mitchell1977 citation; generalize antitone via versionSpace_reindex; add mono_C
- Add empirical{Miscount,Measure,Error} defs + combinatorial and measure-theoretic bridges
- Add Realizable predicate + Realizable.versionSpace_nonempty
- Add ae_mem_versionSpace_of_realizable
- Add IsConsistent.{empiricalMiscount,empiricalError}_eq_zero corollaries
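As a rough illustration of how the Realizable predicate and the consistent-learner lemmas fit together, here is a sketch under assumptions: a learner is modeled as a function from samples to hypotheses, consistency is only required on realizable samples, and all definitions are hypothetical rather than copied from the PR.

```lean
import Mathlib.Data.Set.Basic

-- Hypothetical namespace; definitions are illustrative, not the PR's.
namespace ConsistencySketch

variable {X Y : Type}

/-- Assumed definition, repeated so the sketch is self-contained. -/
def VersionSpace (C : Set (X → Y)) (S : List (X × Y)) : Set (X → Y) :=
  {c | c ∈ C ∧ ∀ p ∈ S, c p.1 = p.2}

/-- A sample is realizable by `C` if some concept in `C` labels it correctly. -/
def Realizable (C : Set (X → Y)) (S : List (X × Y)) : Prop :=
  ∃ c ∈ C, ∀ p ∈ S, c p.1 = p.2

/-- Under realizability the version space is nonempty. -/
theorem Realizable.versionSpace_nonempty {C : Set (X → Y)} {S : List (X × Y)}
    (h : Realizable C S) : (VersionSpace C S).Nonempty := by
  obtain ⟨c, hc, hS⟩ := h
  exact ⟨c, hc, hS⟩

/-- A learner is consistent if, on realizable samples, its output lies in
the version space. (Restricting to realizable samples is an assumption.) -/
def IsConsistent (A : List (X × Y) → X → Y) (C : Set (X → Y)) : Prop :=
  ∀ S, Realizable C S → A S ∈ VersionSpace C S

/-- The output of a consistent learner belongs to the concept class. -/
theorem IsConsistent.output_mem_conceptClass
    {A : List (X × Y) → X → Y} {C : Set (X → Y)} (hA : IsConsistent A C)
    {S : List (X × Y)} (hS : Realizable C S) : A S ∈ C :=
  (hA S hS).1

/-- The output of a consistent learner agrees with every sample point. -/
theorem IsConsistent.output_agrees
    {A : List (X × Y) → X → Y} {C : Set (X → Y)} (hA : IsConsistent A C)
    {S : List (X × Y)} (hS : Realizable C S) : ∀ p ∈ S, A S p.1 = p.2 :=
  (hA S hS).2

end ConsistencySketch
```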
Hope the PR looks better now @SamuelSchlesinger
It looks great, I think the last bit of the bridge is: Would you implement that? Then it's good to go.
Hey, done as you suggested, sorry for missing this!
✔ [126/126] Built mk_all:exe (988ms) Source of error in action
Regenerated cslib.lean. Can someone rerun the ci-check?
@fmontesi whenever you'd like, we can merge this.
Excellent, thank you!
Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) as a companion to the PAC learning definitions from #492.
- VersionSpace C S: the subset of C whose concepts agree with S on every sample point
- versionSpace_subset, versionSpace_empty_sample: sanity lemmas
- versionSpace_antitone: more data gives a smaller version space
- IsConsistent A C: predicate on learners whose output always lies in the version space
- mem_versionSpace_of_realizable, versionSpace_nonempty_of_realizable: realizable-case bridge

Foundation for downstream proofs (Sauer-Shelah, PAC lower bounds, infinite NFL).