Fix INDEPENDENT mechanism crash on duplicate cliques#4
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
`independent.run_mechanism` builds `measurements` from `initial_measurements`
(the one-way marginals that `data_generation_v2.generate` always passes in)
and then appends a freshly measured one-way marginal for every attribute. When
`initial_potentials` is not None (e.g. the empty CliqueVector returned by
`constraints.get_initial_parameters` for the no-constraints case), the code
calls `potentials.expand([m.clique for m in measurements])` with a clique list
that now contains duplicate one-way cliques.
With current `mbi`, `CliqueVector.expand` requires unique cliques and raises:
ValueError: Cliques must be unique.
so `dpsynth.generate(..., discrete_config=IndependentConfig())` fails. De-dup
the clique list (order-preserving) before expanding. Measurements themselves are
left unchanged, so the estimation/accounting are unaffected.
fc87d86 to
1755e90
Compare
| potentials = initial_potentials | ||
| if potentials is not None: | ||
| potentials = potentials.expand([m.clique for m in measurements]) | ||
| # `measurements` can contain the same clique more than once (the one-way |
There was a problem hiding this comment.
thanks for the bug fix, but can you remove this inline comment or reduce it to 1 line?
There was a problem hiding this comment.
Can you also add test coverage for this case?
There was a problem hiding this comment.
Thanks for the review, @ryan112358! I've addressed both comments in 7d5ed01:
- Inline comment — reduced it to a single line.
- Test coverage — added test_duplicate_one_way_cliques_with_initial_potentials in independent_test.py. It exercises
the exact failing path (an initial_potentials CliqueVector plus initial_measurements containing the one-way marginals,
so the cliques measured in the loop duplicate them), which previously raised ValueError: Cliques must be unique.. I
confirmed the new test fails without the fix and passes with it.
Both tests pass locally (2 passed).
Let me know if you'd like any further changes!
- Reduce the inline comment in `run_mechanism` to a single line. - Add a regression test exercising the duplicate-clique path (initial potentials + initial one-way measurements) that previously raised "Cliques must be unique.".
7d5ed01 to
55c1f4b
Compare
Fixes the
ValueError: Cliques must be unique.raised when runningdpsynth.generate(..., discrete_config=IndependentConfig())(see issue:"IndependentConfig synthesis raises 'Cliques must be unique.'").
Cause
independent.run_mechanismbuildsmeasurementsfrominitial_measurements(the one-way marginals that
data_generation_v2.generatealways provides) andthen appends a freshly measured one-way marginal for every attribute, so the
list contains each one-way clique twice. When
initial_potentialsis notNone(e.g. the emptyCliqueVectorfromconstraints.get_initial_parametersin the no-constraints case),
potentials.expand([m.clique for m in measurements])is called with duplicate cliques, which currentmbirejects.Change
De-duplicate the clique list (order-preserving) before
expand:measurementsitself is unchanged, so the marginals fed tomirror_descentand the privacy accounting are unaffected — only the clique set used to expand
the (zero) potentials is de-duplicated.
Verification
The repro in the issue (a 2-column categorical frame with
IndependentConfig()) now returns a syntheticDataFrameinstead of raising.MST/AIM are unaffected.
Fixes #2