No rexen for the wildcard

(idea) by ariels Mon Mar 27 2000 at 8:33:19

The oft-heard lament of those using wildcard expansion in a shell, when they suddenly discover that although the shell's wildcards are easy to type, they make impossible some things that would be easy with real regular expressions.

For example, all of these are easy:

  • "*.txt" matches everything ending in ".txt"; this is the same as the regular expression (or regexp) "\.txt$".
  • "readme*.txt" matches everything beginning "readme" and ending ".txt", including "readme.txt" (if such a file exists); this is the same as the regexp "^readme.*\.txt$".
  • "*.[0-9][0-9][0-9]" matches everything with a 3-digit extension at the end; this is the same as the regexp "\.[0-9]{3}$" (match anything ending in a "." followed by 3 matches of a digit 0-9).
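
The three equivalences above can be checked mechanically. What follows is only a sketch: it uses Python's fnmatch module as a stand-in for the shell's wildcard matching (an assumption; a real shell also treats a leading "." specially, which fnmatch does not), and the file names are invented for the illustration.

```python
import fnmatch
import re

# Each pair: (shell wildcard, the regexp the text gives as its equivalent).
PAIRS = [
    ("*.txt", r"\.txt$"),
    ("readme*.txt", r"^readme.*\.txt$"),
    ("*.[0-9][0-9][0-9]", r"\.[0-9]{3}$"),
]

# Hypothetical file names, made up for this example.
NAMES = ["readme.txt", "readme-old.txt", "notes.txt",
         "core.001", "core.01", "txt", "a.txtx"]


def agree(glob, regexp, names):
    """Return True if the wildcard and the regexp select the same names."""
    by_glob = [n for n in names if fnmatch.fnmatchcase(n, glob)]
    by_regexp = [n for n in names if re.search(regexp, n)]
    return by_glob == by_regexp


for glob, regexp in PAIRS:
    print(glob, regexp, agree(glob, regexp, NAMES))
```

On this sample every pair agrees. That is evidence on a handful of names, not a proof of equivalence.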

But all of these regexps are hard to express as wildcards (and, if we try to match strings of unbounded length, impossible):

  • "^ab*c$" matches all strings starting with an a, ending with c, and with only b's (possibly none) in between.
  • "^readme[1-9][0-9]*\.txt$" matches everything of the form "readme17.txt", where "17" may be substituted by any positive number written without a leading zero; "^readme[0-9]+\.txt$" would also allow these numbers to start with the digit "0".
  • "^[A-Z]+[a-z][0-9]$" matches everything consisting of uppercase letters followed by a lowercase letter followed by a digit.
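
To see why wildcards fall short here, it may help to run the three regexps next to the nearest wildcard one might reach for. This is a sketch under assumptions: Python's re and fnmatch modules stand in for a regexp engine and for the shell, and the sample strings are invented.

```python
import fnmatch
import re

AB_STAR_C = re.compile(r"^ab*c$")
README_N = re.compile(r"^readme[1-9][0-9]*\.txt$")
UPPER_LOWER_DIGIT = re.compile(r"^[A-Z]+[a-z][0-9]$")


def matches(pattern, names):
    """Return the names that the compiled regexp accepts."""
    return [n for n in names if pattern.search(n)]


print(matches(AB_STAR_C, ["ac", "abc", "abbbc", "abxc", "abcd"]))
# → ['ac', 'abc', 'abbbc']

print(matches(README_N, ["readme1.txt", "readme17.txt", "readme017.txt",
                         "readme.txt", "readme17.txt.bak"]))
# → ['readme1.txt', 'readme17.txt']

print(matches(UPPER_LOWER_DIGIT, ["ABc1", "Ab1", "abc1", "ABcd1", "A1"]))
# → ['ABc1', 'Ab1']

# The obvious wildcard "ab*c" is too generous: its * stands for any run of
# characters, not for a run of b's, so it also accepts "abxc".
print(fnmatch.fnmatchcase("abxc", "ab*c"))
# → True
```

A wildcard can list a fixed number of b's or digits ("abc", "abbc", "readme[1-9][0-9].txt" and so on), but each pattern covers one length only, which is where the "unbounded length" remark comes from.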

(idea) by rp Fri Jul 14 2000 at 11:55:06
An earlier stage: utter confusion from users who, familiar with wildcards (aka filename globbing), encounter regular expressions for the first time and don't realize that * and ? are operators applied to the preceding item rather than wildcards standing in for characters.

This is one of the things that makes regular expressions 'hard': it's unexpected. But it's not the notation that's complex, it's what you can express with it: the complexity of regular expressions is appropriate.
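
The confusion is easy to reproduce by feeding the same pattern text to both notations. A minimal sketch, again assuming Python's fnmatch as the wildcard matcher and re as the regexp engine; the helper names as_wildcard and as_regexp are made up for this example.

```python
import fnmatch
import re


def as_wildcard(pattern, names):
    """Read the pattern as a shell wildcard (* and ? stand for characters)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def as_regexp(pattern, names):
    """Read the same text as a regexp (* and ? modify the preceding item)."""
    return [n for n in names if re.fullmatch(pattern, n)]


names = ["a", "abbb", "abxyz"]
print(as_wildcard("ab*", names))  # → ['abbb', 'abxyz']
print(as_regexp("ab*", names))    # → ['a', 'abbb']

names = ["color", "colour", "colouxr"]
print(as_wildcard("colou?r", names))  # → ['colouxr']
print(as_regexp("colou?r", names))    # → ['color', 'colour']
```

As a wildcard, "?" demands exactly one extra character; as a regexp operator, it makes the preceding "u" optional. Same keystrokes, disjoint results.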
