Regex character class range generator with certain characters excluded












5














While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:



regex_supplement.py:



"""Supplementary regex modifying methods."""

from itertools import tee

def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...

From the itertools recipes.
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)

def _offset_byte_char(char, offset):
"""Offset a single byte char"""
return (ord(char) + offset).to_bytes(1, byteorder='big')

def escape_byte_in_character_class(char):
"""Escapes characters as necessary within the character class."""
return b'\' + char if char in b'-]^' else char

def byte_range(startchar, endchar, excluded):
"""Returns a pattern matching characters in the range startchar-endchar.
Characters in excluded are excluded from the range.
"""
excluded = sorted(char for char in excluded if startchar <= char <= endchar)

char_ranges =
if len(excluded) >= 1:
first_exclude = excluded[0]
if startchar != first_exclude:
# Another possibility = + 1
char_ranges.append(
(startchar, _offset_byte_char(first_exclude, -1))
)
for start, end in pairwise(excluded):
# Adjacent or equal
if ord(end) - 1 <= ord(start):
continue
char_ranges.append(
(_offset_byte_char(start, 1), _offset_byte_char(end, -1))
)
last_exclude = excluded[-1]
if endchar != last_exclude:
char_ranges.append(
(_offset_byte_char(last_exclude, 1), endchar)
)
else:
char_ranges = [(startchar, endchar)]

char_output = b''
escape = escape_byte_in_character_class
for char_range in char_ranges:
# Doesn't minimize all '-', but that quickly gets complicated.
# (Whether '-' needs to be escaped within a regex range is context dependent.)
# '^' has even more potential easy optimization.
start, end = char_range
if start == end:
char_output += escape(start)
elif ord(start) == ord(end) - 1:
char_output += escape(start) + escape(end)
else:
char_output += escape(start) + b'-' + escape(end)
return b'[' + char_output + b']'


test_regex_supplement.py:



"""Regex supplement tests"""

import regex_supplement as re_supp

def test_byte_regex_range_empty():
"""Test that empty exclusions do not affect the range"""
assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'

def test_byte_regex_range_exclusion_outside():
"""An exclusion outside of the regex range should have no effect."""
assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'

offset = re_supp._offset_byte_char

def compare_escaped_within_range(char):
"""Test that the character is escaped at the beginning of a range."""
end = offset(char, 3)
assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'

def test_byte_regex_range_escaped_rbrac():
"""Test that ']' is escaped"""
compare_escaped_within_range(b']')

def test_byte_regex_range_escaped_hyphen():
"""Test that '-' is escaped"""
compare_escaped_within_range(b'-')

def test_byte_regex_range_escaped_caret():
"""Test that '^' is escaped"""
compare_escaped_within_range(b'^')

def test_byte_regex_range_standard_1():
"""Test that a standard range behaves as expected"""
assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'

def test_byte_regex_range_standard_2():
"""Test that a standard range with multiple exclusions behaves as expected"""
assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'

def test_byte_regex_range_optimized_1():
"""Test that ranges of 1 char are optimized to single characters."""
assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'

def test_byte_regex_range_optimized_2():
"""Test that multiple ranges of 1 chars are optimized to single characters."""
assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'


This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]), and it wasn't needed for my purposes.



I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.



Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.










share|improve this question





























    5














    While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:



    regex_supplement.py:



    """Supplementary regex modifying methods."""

    from itertools import tee

    def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ...

    From the itertools recipes.
    """
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

    def _offset_byte_char(char, offset):
    """Offset a single byte char"""
    return (ord(char) + offset).to_bytes(1, byteorder='big')

    def escape_byte_in_character_class(char):
    """Escapes characters as necessary within the character class."""
    return b'\' + char if char in b'-]^' else char

    def byte_range(startchar, endchar, excluded):
    """Returns a pattern matching characters in the range startchar-endchar.
    Characters in excluded are excluded from the range.
    """
    excluded = sorted(char for char in excluded if startchar <= char <= endchar)

    char_ranges =
    if len(excluded) >= 1:
    first_exclude = excluded[0]
    if startchar != first_exclude:
    # Another possibility = + 1
    char_ranges.append(
    (startchar, _offset_byte_char(first_exclude, -1))
    )
    for start, end in pairwise(excluded):
    # Adjacent or equal
    if ord(end) - 1 <= ord(start):
    continue
    char_ranges.append(
    (_offset_byte_char(start, 1), _offset_byte_char(end, -1))
    )
    last_exclude = excluded[-1]
    if endchar != last_exclude:
    char_ranges.append(
    (_offset_byte_char(last_exclude, 1), endchar)
    )
    else:
    char_ranges = [(startchar, endchar)]

    char_output = b''
    escape = escape_byte_in_character_class
    for char_range in char_ranges:
    # Doesn't minimize all '-', but that quickly gets complicated.
    # (Whether '-' needs to be escaped within a regex range is context dependent.)
    # '^' has even more potential easy optimization.
    start, end = char_range
    if start == end:
    char_output += escape(start)
    elif ord(start) == ord(end) - 1:
    char_output += escape(start) + escape(end)
    else:
    char_output += escape(start) + b'-' + escape(end)
    return b'[' + char_output + b']'


    test_regex_supplement.py:



    """Regex supplement tests"""

    import regex_supplement as re_supp

    def test_byte_regex_range_empty():
    """Test that empty exclusions do not affect the range"""
    assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'

    def test_byte_regex_range_exclusion_outside():
    """An exclusion outside of the regex range should have no effect."""
    assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'

    offset = re_supp._offset_byte_char

    def compare_escaped_within_range(char):
    """Test that the character is escaped at the beginning of a range."""
    end = offset(char, 3)
    assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'

    def test_byte_regex_range_escaped_rbrac():
    """Test that ']' is escaped"""
    compare_escaped_within_range(b']')

    def test_byte_regex_range_escaped_hyphen():
    """Test that '-' is escaped"""
    compare_escaped_within_range(b'-')

    def test_byte_regex_range_escaped_caret():
    """Test that '^' is escaped"""
    compare_escaped_within_range(b'^')

    def test_byte_regex_range_standard_1():
    """Test that a standard range behaves as expected"""
    assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'

    def test_byte_regex_range_standard_2():
    """Test that a standard range with multiple exclusions behaves as expected"""
    assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'

    def test_byte_regex_range_optimized_1():
    """Test that ranges of 1 char are optimized to single characters."""
    assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'

    def test_byte_regex_range_optimized_2():
    """Test that multiple ranges of 1 chars are optimized to single characters."""
    assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'


    This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]), and it wasn't needed for my purposes.



    I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.



    Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.










    share|improve this question



























      5












      5








      5







      While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:



      regex_supplement.py:



      """Supplementary regex modifying methods."""

      from itertools import tee

      def pairwise(iterable):
      """s -> (s0,s1), (s1,s2), (s2, s3), ...

      From the itertools recipes.
      """
      a, b = tee(iterable)
      next(b, None)
      return zip(a, b)

      def _offset_byte_char(char, offset):
      """Offset a single byte char"""
      return (ord(char) + offset).to_bytes(1, byteorder='big')

      def escape_byte_in_character_class(char):
      """Escapes characters as necessary within the character class."""
      return b'\' + char if char in b'-]^' else char

      def byte_range(startchar, endchar, excluded):
      """Returns a pattern matching characters in the range startchar-endchar.
      Characters in excluded are excluded from the range.
      """
      excluded = sorted(char for char in excluded if startchar <= char <= endchar)

      char_ranges =
      if len(excluded) >= 1:
      first_exclude = excluded[0]
      if startchar != first_exclude:
      # Another possibility = + 1
      char_ranges.append(
      (startchar, _offset_byte_char(first_exclude, -1))
      )
      for start, end in pairwise(excluded):
      # Adjacent or equal
      if ord(end) - 1 <= ord(start):
      continue
      char_ranges.append(
      (_offset_byte_char(start, 1), _offset_byte_char(end, -1))
      )
      last_exclude = excluded[-1]
      if endchar != last_exclude:
      char_ranges.append(
      (_offset_byte_char(last_exclude, 1), endchar)
      )
      else:
      char_ranges = [(startchar, endchar)]

      char_output = b''
      escape = escape_byte_in_character_class
      for char_range in char_ranges:
      # Doesn't minimize all '-', but that quickly gets complicated.
      # (Whether '-' needs to be escaped within a regex range is context dependent.)
      # '^' has even more potential easy optimization.
      start, end = char_range
      if start == end:
      char_output += escape(start)
      elif ord(start) == ord(end) - 1:
      char_output += escape(start) + escape(end)
      else:
      char_output += escape(start) + b'-' + escape(end)
      return b'[' + char_output + b']'


      test_regex_supplement.py:



      """Regex supplement tests"""

      import regex_supplement as re_supp

      def test_byte_regex_range_empty():
      """Test that empty exclusions do not affect the range"""
      assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'

      def test_byte_regex_range_exclusion_outside():
      """An exclusion outside of the regex range should have no effect."""
      assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'

      offset = re_supp._offset_byte_char

      def compare_escaped_within_range(char):
      """Test that the character is escaped at the beginning of a range."""
      end = offset(char, 3)
      assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'

      def test_byte_regex_range_escaped_rbrac():
      """Test that ']' is escaped"""
      compare_escaped_within_range(b']')

      def test_byte_regex_range_escaped_hyphen():
      """Test that '-' is escaped"""
      compare_escaped_within_range(b'-')

      def test_byte_regex_range_escaped_caret():
      """Test that '^' is escaped"""
      compare_escaped_within_range(b'^')

      def test_byte_regex_range_standard_1():
      """Test that a standard range behaves as expected"""
      assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'

      def test_byte_regex_range_standard_2():
      """Test that a standard range with multiple exclusions behaves as expected"""
      assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'

      def test_byte_regex_range_optimized_1():
      """Test that ranges of 1 char are optimized to single characters."""
      assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'

      def test_byte_regex_range_optimized_2():
      """Test that multiple ranges of 1 chars are optimized to single characters."""
      assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'


      This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]), and it wasn't needed for my purposes.



      I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.



      Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.










      share|improve this question















      While I was working on a personal project, I found I needed to exclude certain characters in a regex range. So I thought, why not implement a custom range exclusion check for greater efficiency, it'll only take 5 lines or so. 60+ lines later, I produced this:



      regex_supplement.py:



      """Supplementary regex modifying methods."""

      from itertools import tee

      def pairwise(iterable):
      """s -> (s0,s1), (s1,s2), (s2, s3), ...

      From the itertools recipes.
      """
      a, b = tee(iterable)
      next(b, None)
      return zip(a, b)

      def _offset_byte_char(char, offset):
      """Offset a single byte char"""
      return (ord(char) + offset).to_bytes(1, byteorder='big')

      def escape_byte_in_character_class(char):
      """Escapes characters as necessary within the character class."""
      return b'\' + char if char in b'-]^' else char

      def byte_range(startchar, endchar, excluded):
      """Returns a pattern matching characters in the range startchar-endchar.
      Characters in excluded are excluded from the range.
      """
      excluded = sorted(char for char in excluded if startchar <= char <= endchar)

      char_ranges =
      if len(excluded) >= 1:
      first_exclude = excluded[0]
      if startchar != first_exclude:
      # Another possibility = + 1
      char_ranges.append(
      (startchar, _offset_byte_char(first_exclude, -1))
      )
      for start, end in pairwise(excluded):
      # Adjacent or equal
      if ord(end) - 1 <= ord(start):
      continue
      char_ranges.append(
      (_offset_byte_char(start, 1), _offset_byte_char(end, -1))
      )
      last_exclude = excluded[-1]
      if endchar != last_exclude:
      char_ranges.append(
      (_offset_byte_char(last_exclude, 1), endchar)
      )
      else:
      char_ranges = [(startchar, endchar)]

      char_output = b''
      escape = escape_byte_in_character_class
      for char_range in char_ranges:
      # Doesn't minimize all '-', but that quickly gets complicated.
      # (Whether '-' needs to be escaped within a regex range is context dependent.)
      # '^' has even more potential easy optimization.
      start, end = char_range
      if start == end:
      char_output += escape(start)
      elif ord(start) == ord(end) - 1:
      char_output += escape(start) + escape(end)
      else:
      char_output += escape(start) + b'-' + escape(end)
      return b'[' + char_output + b']'


      test_regex_supplement.py:



      """Regex supplement tests"""

      import regex_supplement as re_supp

      def test_byte_regex_range_empty():
      """Test that empty exclusions do not affect the range"""
      assert re_supp.byte_range(b'a', b'c', ) == b'[a-c]'

      def test_byte_regex_range_exclusion_outside():
      """An exclusion outside of the regex range should have no effect."""
      assert re_supp.byte_range(b'a', b'c', [b'e']) == b'[a-c]'

      offset = re_supp._offset_byte_char

      def compare_escaped_within_range(char):
      """Test that the character is escaped at the beginning of a range."""
      end = offset(char, 3)
      assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'

      def test_byte_regex_range_escaped_rbrac():
      """Test that ']' is escaped"""
      compare_escaped_within_range(b']')

      def test_byte_regex_range_escaped_hyphen():
      """Test that '-' is escaped"""
      compare_escaped_within_range(b'-')

      def test_byte_regex_range_escaped_caret():
      """Test that '^' is escaped"""
      compare_escaped_within_range(b'^')

      def test_byte_regex_range_standard_1():
      """Test that a standard range behaves as expected"""
      assert re_supp.byte_range(b'a', b'g', [b'd']) == b'[a-ce-g]'

      def test_byte_regex_range_standard_2():
      """Test that a standard range with multiple exclusions behaves as expected"""
      assert re_supp.byte_range(b'a', b'k', [b'd', b'h']) == b'[a-ce-gi-k]'

      def test_byte_regex_range_optimized_1():
      """Test that ranges of 1 char are optimized to single characters."""
      assert re_supp.byte_range(b'a', b'c', [b'b']) == b'[ac]'

      def test_byte_regex_range_optimized_2():
      """Test that multiple ranges of 1 chars are optimized to single characters."""
      assert re_supp.byte_range(b'a', b'e', [b'b', b'd']) == b'[ace]'


      This is only implemented for bytestrings, because that's what I needed for my project. I actually learned that Python regular expressions can handle bytestrings during this project. The tests are intended to be run by pytest. I was originally considering adding in an optimization for single character ranges, but I decided not to because it would lead to more complicated escape-handling code (and the possibility for subtle bugs like double-escaping ]), and it wasn't needed for my purposes.



      I'm mostly concerned with efficiency (mostly of the resultant regex, but also of the program) and accuracy-checking, but stylistic and readability improvements are also appreciated.



      Also, in hindsight, I might have considered implementing a lookahead with an exclusion character check preceding the range, but my current approach does have the advantage of discarding excluded characters that are outside of the range, and requiring less escaping.







      python python-3.x regex






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jan 2 at 21:12

























      asked Jan 2 at 20:46









      Graham

      970113




      970113






















          1 Answer
          1






          active

          oldest

          votes


















          4














          offset = re_supp._offset_byte_char

          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp (namely the _offset_byte_char method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it. It would at least be more maintainable to say



          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          offset = re_supp._offset_byte_char
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          But it would be even better not to depend on that detail at all! Either make _offset_byte_char a public member of the API (lose the underscore), or just reimplement it yourself where you need it.



          When you reimplement it, consider that Python offers the chr builtin to do exactly what you're currently doing with all that byteorder='big' nonsense on ints. Although I guess Python3 does force you to know about latin_1 encoding:



          offset = lambda x, y: chr(ord(x) + y).encode('latin_1')




          IIUC, your public API really just consists of the function byte_range(b, e, xs), where b and e are single bytes and xs is an iterable of bytes — which generates a regex character class matching any of the bytes in b to e inclusive, but excluding all of the bytes in xs.



          In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs])., or more succinctly (?![xs])[b-e].



          I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)





          Personally I would write ch everywhere you wrote char, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch would still be shorter and arguably harder to confuse with chr, though...





          Again on the testing side: your compare_escaped_within_range seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:



          def test_inputs(b, e, xs):
          needle = re_supp.byte_range(b, e, xs)
          for i in range(256):
          haystack = chr(i).encode('latin_1')
          expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
          actual = bool(re.match(needle, haystack))
          assert actual == expected


          And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs) passed for their inputs. Even if they went on to verify other properties.



          def test_byte_regex_range_escaped_caret():
          """Test that '^' is escaped"""
          test_inputs(b'^', b'a', [b'a'])
          assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?


          Incidentally, you got very lucky that out of ], ^, and -, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3 is a magic number in this respect — or should you maybe document it for posterity?





          I ran my test_inputs on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So test_inputs(b'\', b'A', ) fails.



          Fix the trivial bug by adding two characters in escape_byte_in_character_class. Now add a regression test: compare_escaped_within_range(b'\'). The test still fails! Why? :)



          And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A']) fails.



          And you don't handle test_inputs(b'B', b'A', ) either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).' in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."



          The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)






          share|improve this answer





















          • Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
            – Graham
            2 days ago








          • 1




            I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
            – Graham
            2 days ago










          • +1 for bytes([i]). :)
            – Quuxplusone
            2 days ago











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "196"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210765%2fregex-character-class-range-generator-with-certain-characters-excluded%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          4














          offset = re_supp._offset_byte_char

          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp (namely the _offset_byte_char method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it. It would at least be more maintainable to say



          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          offset = re_supp._offset_byte_char
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          But it would be even better not to depend on that detail at all! Either make _offset_byte_char a public member of the API (lose the underscore), or just reimplement it yourself where you need it.



          When you reimplement it, consider that Python offers the chr builtin to do exactly what you're currently doing with all that byteorder='big' nonsense on ints. Although I guess Python3 does force you to know about latin_1 encoding:



          offset = lambda x, y: chr(ord(x) + y).encode('latin_1')




          IIUC, your public API really just consists of the function byte_range(b, e, xs), where b and e are single bytes and xs is an iterable of bytes — which generates a regex character class matching any of the bytes in b to e inclusive, but excluding all of the bytes in xs.



          In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs])., or more succinctly (?![xs])[b-e].



          I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)





          Personally I would write ch everywhere you wrote char, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch would still be shorter and arguably harder to confuse with chr, though...





          Again on the testing side: your compare_escaped_within_range seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:



          def test_inputs(b, e, xs):
          needle = re_supp.byte_range(b, e, xs)
          for i in range(256):
          haystack = chr(i).encode('latin_1')
          expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
          actual = bool(re.match(needle, haystack))
          assert actual == expected


          And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs) passed for their inputs. Even if they went on to verify other properties.



          def test_byte_regex_range_escaped_caret():
          """Test that '^' is escaped"""
          test_inputs(b'^', b'a', [b'a'])
          assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?


          Incidentally, you got very lucky that out of ], ^, and -, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3 is a magic number in this respect — or should you maybe document it for posterity?





          I ran my test_inputs on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So test_inputs(b'\', b'A', ) fails.



          Fix the trivial bug by adding two characters in escape_byte_in_character_class. Now add a regression test: compare_escaped_within_range(b'\'). The test still fails! Why? :)



          And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A']) fails.



          And you don't handle test_inputs(b'B', b'A', ) either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).' in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."



          The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)






          share|improve this answer





















          • Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
            – Graham
            2 days ago








          • 1




            I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
            – Graham
            2 days ago










          • +1 for bytes([i]). :)
            – Quuxplusone
            2 days ago
















          4














          offset = re_supp._offset_byte_char

          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp (namely the _offset_byte_char method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it. It would at least be more maintainable to say



          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          offset = re_supp._offset_byte_char
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          But it would be even better not to depend on that detail at all! Either make _offset_byte_char a public member of the API (lose the underscore), or just reimplement it yourself where you need it.



          When you reimplement it, consider that Python offers the chr builtin to do exactly what you're currently doing with all that byteorder='big' nonsense on ints. Although I guess Python3 does force you to know about latin_1 encoding:



          offset = lambda x, y: chr(ord(x) + y).encode('latin_1')




          IIUC, your public API really just consists of the function byte_range(b, e, xs), where b and e are single bytes and xs is an iterable of bytes — which generates a regex character class matching any of the bytes in b to e inclusive, but excluding all of the bytes in xs.



          In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs])., or more succinctly (?![xs])[b-e].



          I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)





          Personally I would write ch everywhere you wrote char, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch would still be shorter and arguably harder to confuse with chr, though...





          Again on the testing side: your compare_escaped_within_range seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:



          def test_inputs(b, e, xs):
          needle = re_supp.byte_range(b, e, xs)
          for i in range(256):
          haystack = chr(i).encode('latin_1')
          expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
          actual = bool(re.match(needle, haystack))
          assert actual == expected


          And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs) passed for their inputs. Even if they went on to verify other properties.



          def test_byte_regex_range_escaped_caret():
          """Test that '^' is escaped"""
          test_inputs(b'^', b'a', [b'a'])
          assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?


          Incidentally, you got very lucky that out of ], ^, and -, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3 is a magic number in this respect — or should you maybe document it for posterity?





          I ran my test_inputs on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So test_inputs(b'\', b'A', ) fails.



          Fix the trivial bug by adding two characters in escape_byte_in_character_class. Now add a regression test: compare_escaped_within_range(b'\'). The test still fails! Why? :)



          And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A']) fails.



          And you don't handle test_inputs(b'B', b'A', ) either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).' in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."



          The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)






          share|improve this answer





















          • Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
            – Graham
            2 days ago








          • 1




            I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
            – Graham
            2 days ago










          • +1 for bytes([i]). :)
            – Quuxplusone
            2 days ago














          4












          4








          4






          offset = re_supp._offset_byte_char

          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp (namely the _offset_byte_char method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it. It would at least be more maintainable to say



          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          offset = re_supp._offset_byte_char
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          But it would be even better not to depend on that detail at all! Either make _offset_byte_char a public member of the API (lose the underscore), or just reimplement it yourself where you need it.



          When you reimplement it, consider that Python offers the chr builtin to do exactly what you're currently doing with all that byteorder='big' nonsense on ints. Although I guess Python3 does force you to know about latin_1 encoding:



          offset = lambda x, y: chr(ord(x) + y).encode('latin_1')




          IIUC, your public API really just consists of the function byte_range(b, e, xs), where b and e are single bytes and xs is an iterable of bytes — which generates a regex character class matching any of the bytes in b to e inclusive, but excluding all of the bytes in xs.



          In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs])., or more succinctly (?![xs])[b-e].



          I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)





          Personally I would write ch everywhere you wrote char, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch would still be shorter and arguably harder to confuse with chr, though...





          Again on the testing side: your compare_escaped_within_range seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:



          def test_inputs(b, e, xs):
          needle = re_supp.byte_range(b, e, xs)
          for i in range(256):
          haystack = chr(i).encode('latin_1')
          expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
          actual = bool(re.match(needle, haystack))
          assert actual == expected


          And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs) passed for their inputs. Even if they went on to verify other properties.



          def test_byte_regex_range_escaped_caret():
          """Test that '^' is escaped"""
          test_inputs(b'^', b'a', [b'a'])
          assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?


          Incidentally, you got very lucky that out of ], ^, and -, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3 is a magic number in this respect — or should you maybe document it for posterity?





          I ran my test_inputs on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So test_inputs(b'\', b'A', ) fails.



          Fix the trivial bug by adding two characters in escape_byte_in_character_class. Now add a regression test: compare_escaped_within_range(b'\'). The test still fails! Why? :)



          And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A']) fails.



          And you don't handle test_inputs(b'B', b'A', ) either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).' in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."



          The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)






          share|improve this answer












          offset = re_supp._offset_byte_char

          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          Two weird things going on here. First, your unit test depends on an internal implementation detail of re_supp (namely the _offset_byte_char method). Second, you're actually storing that method in a global variable, which makes your test not standalone. You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it. It would at least be more maintainable to say



          def compare_escaped_within_range(char):
          """Test that the character is escaped at the beginning of a range."""
          offset = re_supp._offset_byte_char
          end = offset(char, 3)
          assert re_supp.byte_range(char, end, [end]) == b'[\' + char + b'-' + offset(end, -1) + b']'


          But it would be even better not to depend on that detail at all! Either make _offset_byte_char a public member of the API (lose the underscore), or just reimplement it yourself where you need it.



          When you reimplement it, consider that Python offers the chr builtin to do exactly what you're currently doing with all that byteorder='big' nonsense on ints. Although I guess Python3 does force you to know about latin_1 encoding:



          offset = lambda x, y: chr(ord(x) + y).encode('latin_1')




          IIUC, your public API really just consists of the function byte_range(b, e, xs), where b and e are single bytes and xs is an iterable of bytes — which generates a regex character class matching any of the bytes in b to e inclusive, but excluding all of the bytes in xs.



          In other words, it produces a regex which is the moral equivalent of (?=[b-e])(?![xs])., or more succinctly (?![xs])[b-e].



          I guess you knew this, but it does feel like your 60-line thing is a bit overengineered. ;)





          Personally I would write ch everywhere you wrote char, but that's probably partly due to my being primarily a C and C++ programmer. ;) ch would still be shorter and arguably harder to confuse with chr, though...





          Again on the testing side: your compare_escaped_within_range seems overdetermined and overcomplicated. If I were writing tests for your API, I would write one like this:



          def test_inputs(b, e, xs):
          needle = re_supp.byte_range(b, e, xs)
          for i in range(256):
          haystack = chr(i).encode('latin_1')
          expected = (ord(b) <= i <= ord(e)) and (haystack not in xs)
          actual = bool(re.match(needle, haystack))
          assert actual == expected


          And then I'd make sure all of my tests at least verified that test_inputs(b, e, xs) passed for their inputs. Even if they went on to verify other properties.



          def test_byte_regex_range_escaped_caret():
          """Test that '^' is escaped"""
          test_inputs(b'^', b'a', [b'a'])
          assert re_supp.byte_range(b'^', b'a', [b'a']) == b'[\^-`]') # should we even verify this, or are we micromanaging?


          Incidentally, you got very lucky that out of ], ^, and -, no two of them are exactly 2 positions apart in the ASCII table! Did you know that? Do you expect your maintainer to know that 3 is a magic number in this respect — or should you maybe document it for posterity?





          I ran my test_inputs on some random data and discovered some bugs in your code. Most blatantly, you forgot that needs to be escaped. So test_inputs(b'\', b'A', ) fails.



          Fix the trivial bug by adding two characters in escape_byte_in_character_class. Now add a regression test: compare_escaped_within_range(b'\'). The test still fails! Why? :)



          And you don't handle the completely empty case: test_inputs(b'A', b'A', [b'A']) fails.



          And you don't handle test_inputs(b'B', b'A', ) either, but that's more of an input validation issue, which is an issue of taste. Maybe you want to return b'(?!.).' in that case; maybe you want to assert-fail; or maybe you're happy just saying "don't do that."



          The moral of the story is that you should always write a test harness that permits fuzzing! It's never wasted effort. ...Well, it's never effort that fails to find bugs, anyway. :)







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Jan 3 at 5:12









          Quuxplusone

          11.5k11959




          11.5k11959












          • Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
            – Graham
            2 days ago








          • 1




            I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
            – Graham
            2 days ago










          • +1 for bytes([i]). :)
            – Quuxplusone
            2 days ago


















          • Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
            – Graham
            2 days ago








          • 1




            I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
            – Graham
            2 days ago










          • +1 for bytes([i]). :)
            – Quuxplusone
            2 days ago
















          Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
          – Graham
          2 days ago






          Very useful answer all around! Regarding switching to chr().encode('latin1'), I feel like that's relying on an implementation detail of bytestrings that latin1 happens to map to bytestrings perfectly. I guess that's unlikely to change, but I still feel it makes the code less clear. I think Python just has a rough edge by always requiring byteorder in to_bytes. For 1 byte conversions, there are no bytes to order, so it doesn't have relevance! And yes, my solution is admittedly over-engineered, but it does have some benefits, which I mentioned in the last paragraph of my question.
          – Graham
          2 days ago






          1




          1




          I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
          – Graham
          2 days ago




          I could have also done bytes([i]) for the int to bytestring conversion, which might be the most succinct and best solution. I'm not arguing that what I've done with offset is good style (it's clearly bad style), but I'm not certain what you mean when say "You couldn't extract just the compare_escaped_within_range test into a different test file, for example, unless you brought the global offset along with it." If you extracted compare_escaped_within_range, it would keep the reference to the global offset within the original module, unless I misunderstand what you mean by "extract".
          – Graham
          2 days ago












          +1 for bytes([i]). :)
          – Quuxplusone
          2 days ago




          +1 for bytes([i]). :)
          – Quuxplusone
          2 days ago


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Code Review Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210765%2fregex-character-class-range-generator-with-certain-characters-excluded%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Сан-Квентин

          8-я гвардейская общевойсковая армия

          Алькесар