Regex character class range generator with certain characters excluded
While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:
regex_supplement.py
:
"""Supplementary regex modifying methods."""
from itertools import tee
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
From the itertools recipes.
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def _offset_byte_char(char, offset):
"""Offset a single byte char"""
return (ord(char) + offset).to_bytes(1, byteorder='big')
def escape_byte_in_character_class(char):
"""Escapes characters as necessary within the character class."""
return b'\' + char if char in b'-]^' else char
def byte_range(startchar, endchar, excluded):
"""Returns a pattern matching characters in the range startchar-endchar.
Characters in excluded are excluded from the range.
"""
excluded = sorted(char for char in excluded if startchar <= char <= endchar)
char_ranges =
if len(excluded) >= 1:
first_exclude = excluded[0]
if startchar != first_exclude:
# Another possibility = + 1
char_ranges.append(
(startchar, _offset_byte_char(first_exclude, -1))
)
for start, end in pairwise(excluded):
# Adjacent or equal
if ord(end) - 1 <= ord(start):
continue
char_ranges.append(
(_offset_byte_char(start, 1), _offset_byte_char(end, -1))
)
last_exclude = excluded[-1]
if endchar != last_exclude:
char_ranges.append(
(_offset_byte_char(last_exclude, 1), endchar)
)
else:
char_ranges = [(startchar, endchar)]
char_output = b''
escape = escape_byte_in_character_class
for char_range in char_ranges:
# Doesn't minimize all '-', but that quickly gets complicated.
# (Whether '-' needs to be escaped within a regex range is context dependent.)
# '^' has even more potential easy optimization.
start, end = char_range
if start == end:
char_output += escape(start)
elif ord(start) == ord(end) - 1:
char_output += escape(start) + escape(end)
else:
char_output += escape(start) + b'-' + escape(end)
return b'[' + char_output + b']'
test_regex_supplement.py
:
"""Regex supplement tests"""
import regex_supplement as re_supp
def test_byte_regex_range_empty():
"""Test that empty exclusions do not affect the range"""
assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'
def test_byte_regex_range_exclusion_outside():
"""An exclusion outside of the regex range should have no effect."""
assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
def test_byte_regex_range_escaped_rbrac():
"""Test that ']' is escaped"""
compare_escaped_within_range(b']')
def test_byte_regex_range_escaped_hyphen():
"""Test that '-' is escaped"""
compare_escaped_within_range(b'-')
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
compare_escaped_within_range(b'^')
def test_byte_regex_range_standard_1():
"""Test that a standard range behaves as expected"""
assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'
def test_byte_regex_range_standard_2():
"""Test that a standard range with multiple exclusions behaves as expected"""
assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'
def test_byte_regex_range_optimized_1():
"""Test that ranges of 1 char are optimized to single characters."""
assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'
def test_byte_regex_range_optimized_2():
"""Test that multiple ranges of 1 chars are optimized to single characters."""
assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'
This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]
), and it wasn't needed for my purposes.
I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.
Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.
python python-3.x regex
add a comment |
While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:
regex_supplement.py
:
"""Supplementary regex modifying methods."""
from itertools import tee
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
From the itertools recipes.
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def _offset_byte_char(char, offset):
"""Offset a single byte char"""
return (ord(char) + offset).to_bytes(1, byteorder='big')
def escape_byte_in_character_class(char):
"""Escapes characters as necessary within the character class."""
return b'\' + char if char in b'-]^' else char
def byte_range(startchar, endchar, excluded):
"""Returns a pattern matching characters in the range startchar-endchar.
Characters in excluded are excluded from the range.
"""
excluded = sorted(char for char in excluded if startchar <= char <= endchar)
char_ranges =
if len(excluded) >= 1:
first_exclude = excluded[0]
if startchar != first_exclude:
# Another possibility = + 1
char_ranges.append(
(startchar, _offset_byte_char(first_exclude, -1))
)
for start, end in pairwise(excluded):
# Adjacent or equal
if ord(end) - 1 <= ord(start):
continue
char_ranges.append(
(_offset_byte_char(start, 1), _offset_byte_char(end, -1))
)
last_exclude = excluded[-1]
if endchar != last_exclude:
char_ranges.append(
(_offset_byte_char(last_exclude, 1), endchar)
)
else:
char_ranges = [(startchar, endchar)]
char_output = b''
escape = escape_byte_in_character_class
for char_range in char_ranges:
# Doesn't minimize all '-', but that quickly gets complicated.
# (Whether '-' needs to be escaped within a regex range is context dependent.)
# '^' has even more potential easy optimization.
start, end = char_range
if start == end:
char_output += escape(start)
elif ord(start) == ord(end) - 1:
char_output += escape(start) + escape(end)
else:
char_output += escape(start) + b'-' + escape(end)
return b'[' + char_output + b']'
test_regex_supplement.py
:
"""Regex supplement tests"""
import regex_supplement as re_supp
def test_byte_regex_range_empty():
"""Test that empty exclusions do not affect the range"""
assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'
def test_byte_regex_range_exclusion_outside():
"""An exclusion outside of the regex range should have no effect."""
assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
def test_byte_regex_range_escaped_rbrac():
"""Test that ']' is escaped"""
compare_escaped_within_range(b']')
def test_byte_regex_range_escaped_hyphen():
"""Test that '-' is escaped"""
compare_escaped_within_range(b'-')
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
compare_escaped_within_range(b'^')
def test_byte_regex_range_standard_1():
"""Test that a standard range behaves as expected"""
assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'
def test_byte_regex_range_standard_2():
"""Test that a standard range with multiple exclusions behaves as expected"""
assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'
def test_byte_regex_range_optimized_1():
"""Test that ranges of 1 char are optimized to single characters."""
assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'
def test_byte_regex_range_optimized_2():
"""Test that multiple ranges of 1 chars are optimized to single characters."""
assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'
This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]
), and it wasn't needed for my purposes.
I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.
Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.
python python-3.x regex
add a comment |
While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:
regex_supplement.py
:
"""Supplementary regex modifying methods."""
from itertools import tee
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
From the itertools recipes.
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def _offset_byte_char(char, offset):
"""Offset a single byte char"""
return (ord(char) + offset).to_bytes(1, byteorder='big')
def escape_byte_in_character_class(char):
"""Escapes characters as necessary within the character class."""
return b'\' + char if char in b'-]^' else char
def byte_range(startchar, endchar, excluded):
"""Returns a pattern matching characters in the range startchar-endchar.
Characters in excluded are excluded from the range.
"""
excluded = sorted(char for char in excluded if startchar <= char <= endchar)
char_ranges =
if len(excluded) >= 1:
first_exclude = excluded[0]
if startchar != first_exclude:
# Another possibility = + 1
char_ranges.append(
(startchar, _offset_byte_char(first_exclude, -1))
)
for start, end in pairwise(excluded):
# Adjacent or equal
if ord(end) - 1 <= ord(start):
continue
char_ranges.append(
(_offset_byte_char(start, 1), _offset_byte_char(end, -1))
)
last_exclude = excluded[-1]
if endchar != last_exclude:
char_ranges.append(
(_offset_byte_char(last_exclude, 1), endchar)
)
else:
char_ranges = [(startchar, endchar)]
char_output = b''
escape = escape_byte_in_character_class
for char_range in char_ranges:
# Doesn't minimize all '-', but that quickly gets complicated.
# (Whether '-' needs to be escaped within a regex range is context dependent.)
# '^' has even more potential easy optimization.
start, end = char_range
if start == end:
char_output += escape(start)
elif ord(start) == ord(end) - 1:
char_output += escape(start) + escape(end)
else:
char_output += escape(start) + b'-' + escape(end)
return b'[' + char_output + b']'
test_regex_supplement.py
:
"""Regex supplement tests"""
import regex_supplement as re_supp
def test_byte_regex_range_empty():
"""Test that empty exclusions do not affect the range"""
assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'
def test_byte_regex_range_exclusion_outside():
"""An exclusion outside of the regex range should have no effect."""
assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
def test_byte_regex_range_escaped_rbrac():
"""Test that ']' is escaped"""
compare_escaped_within_range(b']')
def test_byte_regex_range_escaped_hyphen():
"""Test that '-' is escaped"""
compare_escaped_within_range(b'-')
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
compare_escaped_within_range(b'^')
def test_byte_regex_range_standard_1():
"""Test that a standard range behaves as expected"""
assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'
def test_byte_regex_range_standard_2():
"""Test that a standard range with multiple exclusions behaves as expected"""
assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'
def test_byte_regex_range_optimized_1():
"""Test that ranges of 1 char are optimized to single characters."""
assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'
def test_byte_regex_range_optimized_2():
"""Test that multiple ranges of 1 chars are optimized to single characters."""
assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'
This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]
), and it wasn't needed for my purposes.
I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.
Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.
python python-3.x regex
While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:
regex_supplement.py
:
"""Supplementary regex modifying methods."""
from itertools import tee
def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...
From the itertools recipes.
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def _offset_byte_char(char, offset):
"""Offset a single byte char"""
return (ord(char) + offset).to_bytes(1, byteorder='big')
def escape_byte_in_character_class(char):
"""Escapes characters as necessary within the character class."""
return b'\' + char if char in b'-]^' else char
def byte_range(startchar, endchar, excluded):
"""Returns a pattern matching characters in the range startchar-endchar.
Characters in excluded are excluded from the range.
"""
excluded = sorted(char for char in excluded if startchar <= char <= endchar)
char_ranges =
if len(excluded) >= 1:
first_exclude = excluded[0]
if startchar != first_exclude:
# Another possibility = + 1
char_ranges.append(
(startchar, _offset_byte_char(first_exclude, -1))
)
for start, end in pairwise(excluded):
# Adjacent or equal
if ord(end) - 1 <= ord(start):
continue
char_ranges.append(
(_offset_byte_char(start, 1), _offset_byte_char(end, -1))
)
last_exclude = excluded[-1]
if endchar != last_exclude:
char_ranges.append(
(_offset_byte_char(last_exclude, 1), endchar)
)
else:
char_ranges = [(startchar, endchar)]
char_output = b''
escape = escape_byte_in_character_class
for char_range in char_ranges:
# Doesn't minimize all '-', but that quickly gets complicated.
# (Whether '-' needs to be escaped within a regex range is context dependent.)
# '^' has even more potential easy optimization.
start, end = char_range
if start == end:
char_output += escape(start)
elif ord(start) == ord(end) - 1:
char_output += escape(start) + escape(end)
else:
char_output += escape(start) + b'-' + escape(end)
return b'[' + char_output + b']'
test_regex_supplement.py
:
"""Regex supplement tests"""
import regex_supplement as re_supp
def test_byte_regex_range_empty():
"""Test that empty exclusions do not affect the range"""
assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'
def test_byte_regex_range_exclusion_outside():
"""An exclusion outside of the regex range should have no effect."""
assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
def test_byte_regex_range_escaped_rbrac():
"""Test that ']' is escaped"""
compare_escaped_within_range(b']')
def test_byte_regex_range_escaped_hyphen():
"""Test that '-' is escaped"""
compare_escaped_within_range(b'-')
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
compare_escaped_within_range(b'^')
def test_byte_regex_range_standard_1():
"""Test that a standard range behaves as expected"""
assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'
def test_byte_regex_range_standard_2():
"""Test that a standard range with multiple exclusions behaves as expected"""
assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'
def test_byte_regex_range_optimized_1():
"""Test that ranges of 1 char are optimized to single characters."""
assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'
def test_byte_regex_range_optimized_2():
"""Test that multiple ranges of 1 chars are optimized to single characters."""
assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'
This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]
), and it wasn't needed for my purposes.
I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.
Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.
python python-3.x regex
python python-3.x regex
edited Jan 2 at 21:12
asked Jan 2 at 20:46
Graham
970113
970113
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp
(namely the _offset_byte_char
method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it. It would at least be more maintainable to say
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
offset = re_supp._offset_byte_char
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
But it would be even better not to depend on that detail at all! Either make _offset_byte_char
a public member of the API (lose the underscore), or just reimplement it yourself where you need it.
When you reimplement it, consider that Python offers the chr
builtin to do exactly what you're currently doing with all that byteorder='big'
nonsense on ints. Although I guess Python3 does force you to know about latin_1
encoding:
offset = lambda x, y: chr(ord(x) + y).encode('latin_1')
IIUC, your public API really just consists of the function byte_range(b, e, xs)
, where b
and e
are single bytes and xs
is an iterable of bytes — which generates a regex character class matching any of the bytes in b
to e
inclusive, but excluding all of the bytes in xs
.
In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs]).
, or more succinctly (?![xs])[b-e]
.
I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)
Personally I would write ch
everywhere you wrote char
, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch
would still be shorter and arguably harder to confuse with chr
, though...
Again on the testing side: your compare_escaped_within_range
seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:
def test_inputs(b, e, xs):
needle = re_supp.byte_range(b, e, xs)
for i in range(256):
haystack = chr(i).encode('latin_1')
expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
actual = bool(re.match(needle, haystack))
assert actual == expected
And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs)
passed for their inputs. Even if they went on to verify other properties.
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
test_inputs(b'^', b'a', [b'a'])
assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?
Incidentally, you got very lucky that out of ]
, ^
, and -
, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3
is a magic number in this respect — or should you maybe document it for posterity?
I ran my test_inputs
on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So
test_inputs(b'\', b'A', )
fails.
Fix the trivial bug by adding two characters in escape_byte_in_character_class
. Now add a regression test: compare_escaped_within_range(b'\')
. The test still fails! Why? :)
And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A'])
fails.
And you don't handle test_inputs(b'B', b'A', )
either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).'
in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."
The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)
Very useful answer all around! Regarding switching tochr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings thatlatin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiringbyteorder
into_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
– Graham
2 days ago
1
I could have also donebytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done withoffset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just thecompare_escaped_within_range
test into a different test file, for example, unless you brought the globaloffset
along with it." If you extractedcompare_escaped_within_range
, it would keep the reference to the globaloffset
within the original module, unless I misunderstand what you mean by "extract".
– Graham
2 days ago
+1 forbytes([i])
. :)
– Quuxplusone
2 days ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210765%2fregex-character-class-range-generator-with-certain-characters-excluded%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp
(namely the _offset_byte_char
method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it. It would at least be more maintainable to say
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
offset = re_supp._offset_byte_char
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
But it would be even better not to depend on that detail at all! Either make _offset_byte_char
a public member of the API (lose the underscore), or just reimplement it yourself where you need it.
When you reimplement it, consider that Python offers the chr
builtin to do exactly what you're currently doing with all that byteorder='big'
nonsense on ints. Although I guess Python3 does force you to know about latin_1
encoding:
offset = lambda x, y: chr(ord(x) + y).encode('latin_1')
IIUC, your public API really just consists of the function byte_range(b, e, xs)
, where b
and e
are single bytes and xs
is an iterable of bytes — which generates a regex character class matching any of the bytes in b
to e
inclusive, but excluding all of the bytes in xs
.
In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs]).
, or more succinctly (?![xs])[b-e]
.
I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)
Personally I would write ch
everywhere you wrote char
, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch
would still be shorter and arguably harder to confuse with chr
, though...
Again on the testing side: your compare_escaped_within_range
seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:
def test_inputs(b, e, xs):
needle = re_supp.byte_range(b, e, xs)
for i in range(256):
haystack = chr(i).encode('latin_1')
expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
actual = bool(re.match(needle, haystack))
assert actual == expected
And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs)
passed for their inputs. Even if they went on to verify other properties.
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
test_inputs(b'^', b'a', [b'a'])
assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?
Incidentally, you got very lucky that out of ]
, ^
, and -
, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3
is a magic number in this respect — or should you maybe document it for posterity?
I ran my test_inputs
on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So
test_inputs(b'\', b'A', )
fails.
Fix the trivial bug by adding two characters in escape_byte_in_character_class
. Now add a regression test: compare_escaped_within_range(b'\')
. The test still fails! Why? :)
And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A'])
fails.
And you don't handle test_inputs(b'B', b'A', )
either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).'
in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."
The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)
Very useful answer all around! Regarding switching tochr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings thatlatin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiringbyteorder
into_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
– Graham
2 days ago
1
I could have also donebytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done withoffset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just thecompare_escaped_within_range
test into a different test file, for example, unless you brought the globaloffset
along with it." If you extractedcompare_escaped_within_range
, it would keep the reference to the globaloffset
within the original module, unless I misunderstand what you mean by "extract".
– Graham
2 days ago
+1 forbytes([i])
. :)
– Quuxplusone
2 days ago
add a comment |
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp
(namely the _offset_byte_char
method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it. It would at least be more maintainable to say
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
offset = re_supp._offset_byte_char
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
But it would be even better not to depend on that detail at all! Either make _offset_byte_char
a public member of the API (lose the underscore), or just reimplement it yourself where you need it.
When you reimplement it, consider that Python offers the chr
builtin to do exactly what you're currently doing with all that byteorder='big'
nonsense on ints. Although I guess Python3 does force you to know about latin_1
encoding:
offset = lambda x, y: chr(ord(x) + y).encode('latin_1')
IIUC, your public API really just consists of the function byte_range(b, e, xs)
, where b
and e
are single bytes and xs
is an iterable of bytes — which generates a regex character class matching any of the bytes in b
to e
inclusive, but excluding all of the bytes in xs
.
In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs]).
, or more succinctly (?![xs])[b-e]
.
I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)
Personally I would write ch
everywhere you wrote char
, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch
would still be shorter and arguably harder to confuse with chr
, though...
Again on the testing side: your compare_escaped_within_range
seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:
def test_inputs(b, e, xs):
needle = re_supp.byte_range(b, e, xs)
for i in range(256):
haystack = chr(i).encode('latin_1')
expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
actual = bool(re.match(needle, haystack))
assert actual == expected
And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs)
passed for their inputs. Even if they went on to verify other properties.
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
test_inputs(b'^', b'a', [b'a'])
assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?
Incidentally, you got very lucky that out of ]
, ^
, and -
, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3
is a magic number in this respect — or should you maybe document it for posterity?
I ran my test_inputs
on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So
test_inputs(b'\', b'A', )
fails.
Fix the trivial bug by adding two characters in escape_byte_in_character_class
. Now add a regression test: compare_escaped_within_range(b'\')
. The test still fails! Why? :)
And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A'])
fails.
And you don't handle test_inputs(b'B', b'A', )
either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).'
in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."
The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)
Very useful answer all around! Regarding switching tochr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings thatlatin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiringbyteorder
into_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
– Graham
2 days ago
1
I could have also donebytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done withoffset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just thecompare_escaped_within_range
test into a different test file, for example, unless you brought the globaloffset
along with it." If you extractedcompare_escaped_within_range
, it would keep the reference to the globaloffset
within the original module, unless I misunderstand what you mean by "extract".
– Graham
2 days ago
+1 forbytes([i])
. :)
– Quuxplusone
2 days ago
add a comment |
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp
(namely the _offset_byte_char
method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it. It would at least be more maintainable to say
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
offset = re_supp._offset_byte_char
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
But it would be even better not to depend on that detail at all! Either make _offset_byte_char
a public member of the API (lose the underscore), or just reimplement it yourself where you need it.
When you reimplement it, consider that Python offers the chr
builtin to do exactly what you're currently doing with all that byteorder='big'
nonsense on ints. Although I guess Python3 does force you to know about latin_1
encoding:
offset = lambda x, y: chr(ord(x) + y).encode('latin_1')
IIUC, your public API really just consists of the function byte_range(b, e, xs)
, where b
and e
are single bytes and xs
is an iterable of bytes — which generates a regex character class matching any of the bytes in b
to e
inclusive, but excluding all of the bytes in xs
.
In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs]).
, or more succinctly (?![xs])[b-e]
.
I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)
Personally I would write ch
everywhere you wrote char
, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch
would still be shorter and arguably harder to confuse with chr
, though...
Again on the testing side: your compare_escaped_within_range
seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:
def test_inputs(b, e, xs):
needle = re_supp.byte_range(b, e, xs)
for i in range(256):
haystack = chr(i).encode('latin_1')
expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
actual = bool(re.match(needle, haystack))
assert actual == expected
And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs)
passed for their inputs. Even if they went on to verify other properties.
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
test_inputs(b'^', b'a', [b'a'])
assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?
Incidentally, you got very lucky that out of ]
, ^
, and -
, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3
is a magic number in this respect — or should you maybe document it for posterity?
I ran my test_inputs
on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So
test_inputs(b'\', b'A', )
fails.
Fix the trivial bug by adding two characters in escape_byte_in_character_class
. Now add a regression test: compare_escaped_within_range(b'\')
. The test still fails! Why? :)
And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A'])
fails.
And you don't handle test_inputs(b'B', b'A', )
either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).'
in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."
The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)
offset = re_supp._offset_byte_char
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp
(namely the _offset_byte_char
method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it. It would at least be more maintainable to say
def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
offset = re_supp._offset_byte_char
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'
But it would be even better not to depend on that detail at all! Either make _offset_byte_char
a public member of the API (lose the underscore), or just reimplement it yourself where you need it.
When you reimplement it, consider that Python offers the chr
builtin to do exactly what you're currently doing with all that byteorder='big'
nonsense on ints. Although I guess Python3 does force you to know about latin_1
encoding:
offset = lambda x, y: chr(ord(x) + y).encode('latin_1')
IIUC, your public API really just consists of the function byte_range(b, e, xs)
, where b
and e
are single bytes and xs
is an iterable of bytes — which generates a regex character class matching any of the bytes in b
to e
inclusive, but excluding all of the bytes in xs
.
In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs]).
, or more succinctly (?![xs])[b-e]
.
I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)
Personally I would write ch
everywhere you wrote char
, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch
would still be shorter and arguably harder to confuse with chr
, though...
Again on the testing side: your compare_escaped_within_range
seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:
def test_inputs(b, e, xs):
needle = re_supp.byte_range(b, e, xs)
for i in range(256):
haystack = chr(i).encode('latin_1')
expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
actual = bool(re.match(needle, haystack))
assert actual == expected
And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs)
passed for their inputs. Even if they went on to verify other properties.
def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
test_inputs(b'^', b'a', [b'a'])
assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?
Incidentally, you got very lucky that out of ]
, ^
, and -
, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3
is a magic number in this respect — or should you maybe document it for posterity?
I ran my test_inputs
on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So
test_inputs(b'\', b'A', )
fails.
Fix the trivial bug by adding two characters in escape_byte_in_character_class
. Now add a regression test: compare_escaped_within_range(b'\')
. The test still fails! Why? :)
And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A'])
fails.
And you don't handle test_inputs(b'B', b'A', )
either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).'
in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."
The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)
answered Jan 3 at 5:12
Quuxplusone
11.5k11959
11.5k11959
Very useful answer all around! Regarding switching tochr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings thatlatin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiringbyteorder
into_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
– Graham
2 days ago
1
I could have also donebytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done withoffset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just thecompare_escaped_within_range
test into a different test file, for example, unless you brought the globaloffset
along with it." If you extractedcompare_escaped_within_range
, it would keep the reference to the globaloffset
within the original module, unless I misunderstand what you mean by "extract".
– Graham
2 days ago
+1 forbytes([i])
. :)
– Quuxplusone
2 days ago
add a comment |
Very useful answer all around! Regarding switching tochr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings thatlatin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiringbyteorder
into_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
– Graham
2 days ago
1
I could have also donebytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done withoffset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just thecompare_escaped_within_range
test into a different test file, for example, unless you brought the globaloffset
along with it." If you extractedcompare_escaped_within_range
, it would keep the reference to the globaloffset
within the original module, unless I misunderstand what you mean by "extract".
– Graham
2 days ago
+1 forbytes([i])
. :)
– Quuxplusone
2 days ago
Very useful answer all around! Regarding switching to
chr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings that latin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder
in to_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.– Graham
2 days ago
Very useful answer all around! Regarding switching to
chr().encode('latin1')
, I feel like that's relying on an implementation detail of bytestrings that latin1
happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder
in to_bytes
. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.– Graham
2 days ago
1
1
I could have also done
bytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it." If you extracted compare_escaped_within_range
, it would keep the reference to the global offset
within the original module, unless I misunderstand what you mean by "extract".– Graham
2 days ago
I could have also done
bytes([i])
for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset
is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range
test into a different test file, for example, unless you brought the global offset
along with it." If you extracted compare_escaped_within_range
, it would keep the reference to the global offset
within the original module, unless I misunderstand what you mean by "extract".– Graham
2 days ago
+1 for
bytes([i])
. :)– Quuxplusone
2 days ago
+1 for
bytes([i])
. :)– Quuxplusone
2 days ago
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210765%2fregex-character-class-range-generator-with-certain-characters-excluded%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown