method

new

v1_8_7_330 - Class: Regexp

new(...)

public

Constructs a new regular expression from pattern, which can be either a String or a Regexp. If pattern is a Regexp, that regexp's options are propagated and new options may not be specified (a change as of Ruby 1.8); any extra arguments are ignored with a warning. If options is a Fixnum, it should be one or more of the constants Regexp::EXTENDED, Regexp::IGNORECASE, and Regexp::MULTILINE, or-ed together. Otherwise, if options is neither nil nor false, the regexp will be case insensitive. The lang parameter enables multibyte support for the regexp: 'n', 'N' = none; 'e', 'E' = EUC; 's', 'S' = SJIS; 'u', 'U' = UTF-8. Only the first character of lang is examined.

r1 = Regexp.new('^a-z+:\\s+\w+')           #=> /^a-z+:\s+\w+/
r2 = Regexp.new('cat', true)               #=> /cat/i
r3 = Regexp.new('dog', Regexp::EXTENDED)   #=> /dog/x
r4 = Regexp.new(r2)                        #=> /cat/i
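
The option handling described above can be sketched as follows. This is an illustrative example rather than part of the official documentation; the variable names are made up, and it sticks to the one- and two-argument forms.

```ruby
# Combine option constants with bitwise OR.
opts = Regexp::IGNORECASE | Regexp::EXTENDED

# With EXTENDED set, whitespace and "#" comments in the pattern are ignored.
r = Regexp.new('c a t  # spaces and comments are ignored', opts)
r.options == opts        #=> true
r =~ 'CAT'               #=> 0

# Passing a Regexp copies its pattern and options.
copy = Regexp.new(r)
copy.options == r.options   #=> true

# nil or false leaves the options unset; any other non-integer value
# (here, true) turns on case insensitivity.
Regexp.new('cat', nil).options     #=> 0
Regexp.new('cat', false).options   #=> 0
Regexp.new('cat', true).options == Regexp::IGNORECASE   #=> true
```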

/*
 *  call-seq:
 *     Regexp.new(string [, options [, lang]])       => regexp
 *     Regexp.new(regexp)                            => regexp
 *     Regexp.compile(string [, options [, lang]])   => regexp
 *     Regexp.compile(regexp)                        => regexp
 *  
 *  Constructs a new regular expression from <i>pattern</i>, which can be either
 *  a <code>String</code> or a <code>Regexp</code> (in which case that regexp's
 *  options are propagated, and new options may not be specified (a change as of
 *  Ruby 1.8). If <i>options</i> is a <code>Fixnum</code>, it should be one or
 *  more of the constants <code>Regexp::EXTENDED</code>,
 *  <code>Regexp::IGNORECASE</code>, and <code>Regexp::MULTILINE</code>,
 *  <em>or</em>-ed together. Otherwise, if <i>options</i> is not
 *  <code>nil</code>, the regexp will be case insensitive. The <i>lang</i>
 *  parameter enables multibyte support for the regexp: `n', `N' = none, `e',
 *  `E' = EUC, `s', `S' = SJIS, `u', `U' = UTF-8.
 * 
 *     r1 = Regexp.new('^a-z+:\\s+\w+')           #=> /^a-z+:\s+\w+/
 *     r2 = Regexp.new('cat', true)               #=> /cat/i
 *     r3 = Regexp.new('dog', Regexp::EXTENDED)   #=> /dog/x
 *     r4 = Regexp.new(r2)                        #=> /cat/i
 */

static VALUE
rb_reg_initialize_m(argc, argv, self)
    int argc;
    VALUE *argv;
    VALUE self;
{
    const char *s;
    long len;
    int flags = 0;

    if (argc == 0 || argc > 3) {
        rb_raise(rb_eArgError, "wrong number of arguments");
    }
    if (TYPE(argv[0]) == T_REGEXP) {
        if (argc > 1) {
            rb_warn("flags%s ignored", (argc == 3) ? " and encoding": "");
        }
        rb_reg_check(argv[0]);
        flags = RREGEXP(argv[0])->ptr->options & 0xf;
        if (FL_TEST(argv[0], KCODE_FIXED)) {
            switch (RBASIC(argv[0])->flags & KCODE_MASK) {
              case KCODE_NONE:
                flags |= 16;
                break;
              case KCODE_EUC:
                flags |= 32;
                break;
              case KCODE_SJIS:
                flags |= 48;
                break;
              case KCODE_UTF8:
                flags |= 64;
                break;
              default:
                break;
            }
        }
        s = RREGEXP(argv[0])->str;
        len = RREGEXP(argv[0])->len;
    }
    else {
        if (argc >= 2) {
            if (FIXNUM_P(argv[1])) flags = FIX2INT(argv[1]);
            else if (RTEST(argv[1])) flags = RE_OPTION_IGNORECASE;
        }
        if (argc == 3 && !NIL_P(argv[2])) {
            char *kcode = StringValuePtr(argv[2]);

            flags &= ~0x70;
            switch (kcode[0]) {
              case 'n': case 'N':
                flags |= 16;
                break;
              case 'e': case 'E':
                flags |= 32;
                break;
              case 's': case 'S':
                flags |= 48;
                break;
              case 'u': case 'U':
                flags |= 64;
                break;
              default:
                break;
            }
        }
        s = StringValuePtr(argv[0]);
        len = RSTRING(argv[0])->len;
    }
    rb_reg_initialize(self, s, len, flags);
    return self;
}

2 Notes

Multiline regexps

mutru · Feb 4, 2009 · 3 thanks

A shortcut for multiline regular expressions is

/First line.*Other line/m

(notice the trailing /m)

For example:

text = <<-END
Hello world!
This is a test.
END

text.match(/world.*test/m).nil?  #=> false
text.match(/world.*test/).nil?   #=> true
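
The same behaviour should be available through the constructor by passing Regexp::MULTILINE. A small sketch, with illustrative variable names:

```ruby
text = "Hello world!\nThis is a test.\n"

# Equivalent to the /m literal: "." also matches newlines.
multi  = Regexp.new('world.*test', Regexp::MULTILINE)
# Without the option, "." stops at the end of the line.
single = Regexp.new('world.*test')

multi == /world.*test/m     #=> true
multi.match(text).nil?      #=> false
single.match(text).nil?     #=> true
```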

Other regular-expression modifiers

Soleone · Feb 11, 2009 · 3 thanks

Likewise you can set Regexp::IGNORECASE directly on the regexp with the literal syntax:

/first/i
# This will match "first", "First" and even "fiRSt"

Even more modifiers

o -- Perform #{} interpolations only once, the first time the regexp literal is evaluated.
x -- Ignores whitespace and allows comments in regular expressions
u, e, s, n -- Interpret the regexp as Unicode (UTF-8), EUC, SJIS, or ASCII. If none of these modifiers is specified, the regular expression is assumed to use the source encoding.
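
A brief sketch of the x and o modifiers from the list above. The method name once_matcher and the sample strings are made up for illustration.

```ruby
# x: whitespace and "#" comments inside the literal are ignored,
# so a pattern can be laid out over several lines.
date = /
  (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
/x

m = date.match('2009-02-11')
m[1]   #=> "2009"
m[3]   #=> "11"

# o: the interpolation happens only the first time the literal is evaluated,
# so later calls reuse the regexp built from the first argument.
def once_matcher(word)
  /#{word}/o
end

once_matcher('cat').source   #=> "cat"
once_matcher('dog').source   #=> "cat"
```

The second call returning "cat" is the usual surprise with o: it suits patterns whose interpolated value never changes.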

Literal to the rescue

Like string literals delimited with %Q, Ruby allows you to begin your regular expressions with %r followed by a delimiter of your choice.

This is useful when the pattern you are describing contains a lot of forward slash characters that you don't want to escape:

%r(http://)
# This will match "http://"
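
A slightly longer sketch of a %r literal with a different delimiter and a trailing modifier. The URL and variable names are placeholders.

```ruby
# Braces as the delimiter: the slashes in the pattern need no escaping.
url = %r{^https?://[^/]+/}

url.source                                  #=> "^https?://[^/]+/"
url.match('http://example.com/path')[0]     #=> "http://example.com/"
url.match('ftp://example.com/').nil?        #=> true

# Modifiers go after the closing delimiter, just as with /.../ literals.
insensitive = %r{http://}i
insensitive.match('HTTP://EXAMPLE.COM').nil?   #=> false
```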
