Bug report
Bug description:
Issue
The add_handler() method of urllib.request.OpenerDirector matches too broadly, and generates spurious handler entries as a result.
Specifically, any method name that starts with zero or one underscore, followed by error, open, request, or response, and that isn't in the small list of exclusions, will be treated as a handler method.
Minimal reproducible example
>>> import urllib.request
>>> class BogusHandler(urllib.request.BaseHandler):
... def _open(self):
... pass
... def open(self):
... pass
... def error(self):
... pass
...
>>> opdir = urllib.request.OpenerDirector()
>>> opdir.add_handler(BogusHandler())
>>> opdir.handle_open
{'': [<__main__.BogusHandler...>], 'ope': [<__main__.BogusHandler...>]}
>>> opdir.handle_error
{'erro': {'error': [<__main__.BogusHandler...>]}}
Cause
|
i = meth.find("_") |
|
protocol = meth[:i] |
|
condition = meth[i+1:] |
In the normal case, this code splits a method name like http_open on the first underscore character, giving protocol == 'http' and condition == 'open'.
If there is a leading underscore character, this produces a false match with an empty protocol. For example, a method called _open will give protocol == '' and condition == 'open'.
If there is no underscore character, then meth.find('_') == -1, and weird results happen. In this case, condition contains the entire method name, and protocol contains all but the last character. For example, a method called open will give protocol == 'ope' and condition == 'open'.
The subsequent lines branch on the content of condition, with only error*, open, request, and response treated as handler methods.
Impact
Low. It is probably impossible that a request would have the empty string as its protocol, and unlikely that it would be ope, reques, respons, or erro. If one of these did happen, the spurious match wouldn't be called: OpenerDirector would go looking for (say) ope_open, probably not find it, and raise an AttributeError. (It's for this reason that I'm positive this is a bug, and not an undocumented feature where open, etc. are meant to be registered as handlers.)
Spurious entries are added to the internal members handle_open, handle_error, process_response, process_request, and (especially) the catch-all handler, using bisect to perform sort-preserving insertions. This is unnecessary busywork, but I have no data on any actual performance impact.
Solution
It suffices to add a check like so, immediately after meth.find("_"):
This catches both a leading underscore (i == 0) and no underscore (i == -1). I've forked the repo and made such a change myself, but submitting a pull request is a big step up from filing an issue, and it expects an issue to be filed first in any case, so… here it is! 😄
Related issues
I can only see one relevant existing issue, #61924. This was a withdrawn proposal to clean up the add_handler() code. It was more far-reaching than the solution above, but (on my inspection at least) it would've fixed this bug. At the time, @bitdancer commented, "I'll try to keep this patch in mind if we have occasion to fix anything in that code".
The code to extract a specific type of error behaves similarly:
|
j = condition.find("_") + i + 1 |
|
kind = meth[j+1:] |
However, I'm not positive that it's an error for it to give (for example) kind == 'error' when the method name is *_error (or _error or error), so I haven't explored further.
CPython versions tested on:
3.12
Operating systems tested on:
No response
Linked PRs
Bug report
Bug description:
Issue
The
add_handler()method ofurllib.request.OpenerDirectormatches too broadly, and generates spurious handler entries as a result.Specifically, any method name that starts with zero or one underscore, followed by
error,open,request, orresponse, and that isn't in the small list of exclusions, will be treated as a handler method.Minimal reproducible example
Cause
cpython/Lib/urllib/request.py
Lines 419 to 421 in 401fff7
In the normal case, this code splits a method name like
http_openon the first underscore character, givingprotocol == 'http'andcondition == 'open'.If there is a leading underscore character, this produces a false match with an empty protocol. For example, a method called
_openwill giveprotocol == ''andcondition == 'open'.If there is no underscore character, then
meth.find('_') == -1, and weird results happen. In this case,conditioncontains the entire method name, andprotocolcontains all but the last character. For example, a method calledopenwill giveprotocol == 'ope'andcondition == 'open'.The subsequent lines branch on the content of
condition, with onlyerror*,open,request, andresponsetreated as handler methods.Impact
Low. It is probably impossible that a request would have the empty string as its protocol, and unlikely that it would be
ope,reques,respons, orerro. If one of these did happen, the spurious match wouldn't be called:OpenerDirectorwould go looking for (say)ope_open, probably not find it, and raise anAttributeError. (It's for this reason that I'm positive this is a bug, and not an undocumented feature whereopen, etc. are meant to be registered as handlers.)Spurious entries are added to the internal members
handle_open,handle_error,process_response,process_request, and (especially) the catch-allhandler, usingbisectto perform sort-preserving insertions. This is unnecessary busywork, but I have no data on any actual performance impact.Solution
It suffices to add a check like so, immediately after
meth.find("_"):This catches both a leading underscore (
i == 0) and no underscore (i == -1). I've forked the repo and made such a change myself, but submitting a pull request is a big step up from filing an issue, and it expects an issue to be filed first in any case, so… here it is! 😄Related issues
I can only see one relevant existing issue, #61924. This was a withdrawn proposal to clean up the
add_handler()code. It was more far-reaching than the solution above, but (on my inspection at least) it would've fixed this bug. At the time, @bitdancer commented, "I'll try to keep this patch in mind if we have occasion to fix anything in that code".The code to extract a specific type of error behaves similarly:
cpython/Lib/urllib/request.py
Lines 424 to 425 in 401fff7
However, I'm not positive that it's an error for it to give (for example)
kind == 'error'when the method name is*_error(or_errororerror), so I haven't explored further.CPython versions tested on:
3.12
Operating systems tested on:
No response
Linked PRs