Bug report
Bug description:
#132036 included an algorithmic change to shlex.quote that made it slower when the input has to be quoted. This is because the previous regular expression search was able to short-circuit at the first unsafe character.
However, the isascii check is worthwhile, since non-ASCII input always has to be quoted.
Cc @picnixz
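As a rough illustration of the short-circuit behaviour described above, the sketch below reuses the 3.13 pattern and reports where the search stops. The helper `first_unsafe_index` and the two sample strings are hypothetical, introduced only for this example. The timings it prints will vary by machine.

```python
import re
import timeit

# Same pattern as the 3.13 implementation quoted in this report.
_find_unsafe = re.compile(r'[^\w@%+=:,./-]', re.ASCII).search


def first_unsafe_index(s):
    """Return the index of the first unsafe character, or None if all are safe."""
    m = _find_unsafe(s)
    return None if m is None else m.start()


# Hypothetical inputs: the same long safe run with one space at either end.
early = ' ' + 'a' * 10_000
late = 'a' * 10_000 + ' '

if __name__ == '__main__':
    print(first_unsafe_index(early), first_unsafe_index(late))
    for name, s in (('early', early), ('late', late)):
        # The search can return as soon as it meets the first unsafe character.
        t = timeit.timeit(lambda: _find_unsafe(s), number=1000)
        print(name, t)
```

If the search does stop at the first match, the `early` case should be noticeably cheaper than the `late` one, which is the effect an approach that always processes the whole string gives up.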
import re
import shlex
import timeit

# From 3.13
_find_unsafe = re.compile(r'[^\w@%+=:,./-]', re.ASCII).search

def old_quote(s):
    """Return a shell-escaped version of the string *s*."""
    if not s:
        return "''"
    # BEST: if s.isascii() and _find_unsafe(s) is None:
    if _find_unsafe(s) is None:
        return s

    # use single quotes, and put single quotes into double quotes
    # the string $'b is then quoted as '$'"'"'b'
    return "'" + s.replace("'", "'\"'\"'") + "'"

g = {'old_quote': old_quote, 'new_quote': shlex.quote}

print('with spaces')
print(' old', timeit.timeit("old_quote('the quick brown fox jumps over the lazy dog')", globals=g, number=1000000))
print(' new', timeit.timeit("new_quote('the quick brown fox jumps over the lazy dog')", globals=g, number=1000000))
print('without spaces')
print(' old', timeit.timeit("old_quote('thequickbrownfoxjumpsoverthelazydog')", globals=g, number=1000000))
print(' new', timeit.timeit("new_quote('thequickbrownfoxjumpsoverthelazydog')", globals=g, number=1000000))
print('non-ASCII')
print(' old', timeit.timeit("old_quote('mötley')", globals=g, number=1000000))
print(' new', timeit.timeit("new_quote('mötley')", globals=g, number=1000000))
print('short')
print(' old', timeit.timeit("old_quote('a')", globals=g, number=1000000))
print(' new', timeit.timeit("new_quote('a')", globals=g, number=1000000))
sample output:
with spaces
old 0.4148377259989502
new 0.5036935329990229
without spaces
old 0.3872929839999415
new 0.3540855330065824
non-ASCII
old 0.4636239370011026
new 0.20726546400692314
short
old 0.1217202929983614
new 0.2977778149943333
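The variant marked BEST in the script's comment can be sketched as a standalone function. This is only an illustration of combining the isascii check with the 3.13 regular expression search, not the change that was actually merged. The name `combined_quote` and the sample inputs are hypothetical.

```python
import re
import shlex

# Same pattern as the 3.13 implementation quoted in this report.
_find_unsafe = re.compile(r'[^\w@%+=:,./-]', re.ASCII).search


def combined_quote(s):
    """Hypothetical variant: isascii() fast-reject plus the 3.13 regex search."""
    if not s:
        return "''"
    # Non-ASCII text always needs quoting, so the regex is skipped for it;
    # for ASCII text the search can stop at the first unsafe character.
    if s.isascii() and _find_unsafe(s) is None:
        return s
    # use single quotes, and put single quotes into double quotes
    return "'" + s.replace("'", "'\"'\"'") + "'"


if __name__ == '__main__':
    # Hypothetical sample inputs, compared against the installed shlex.quote.
    for s in ['', 'a', 'the quick brown fox', 'mötley', "$'b"]:
        print(repr(s), combined_quote(s), combined_quote(s) == shlex.quote(s))
```

Swapping `combined_quote` into the timing script in place of `old_quote` would show whether this form keeps both the non-ASCII speedup and the early exit on quoted input.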
CPython versions tested on:
3.14
Operating systems tested on:
Linux
Linked PRs
re to detect shlex.quote slow path #146408