Episode 2: Real-world regular expressions

Let’s get this out there right off the bat: I love regular expressions. Really, I do. They’re the Swiss Army knife of text processing, and no self-respecting developer can go long without needing ’em.

Of course, we all also know how dangerous they can be. As always, with great power comes great responsibility.

Still, if you know how — and when — and why — to use regular expressions, they’re indispensable. So this week, regular expressions will be our theme.

Below are five regular expressions. Each one of them matches a real-world string; that is, a semi-structured piece of text you might want to pull out of a larger document. Here’s an example question to give you an idea of what I mean:

  1. [0-9]{5}

This, of course, is a US ZIP code.

So, what “things” do these regular expressions match? We’ll assume for this quiz that the regex engine is running in case-insensitive mode:

  1. [A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}
  2. &(?!(\w+|#\d+);)
  3. (-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?
  4. ([\da-f]{2}:){5}([\da-f]{2})
  5. <[^>]*?>
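
If you want to poke at these yourself, here is a minimal sketch of what “case-insensitive mode” means in practice. It assumes Python’s `re` module; the `matches` helper and the sample string are made up for illustration and aren’t part of the quiz.

```python
import re

# Pattern 1 from the list above, copied as-is.
PATTERN_1 = r"[A-PR-Y0-9]{3}-[A-PR-Y0-9]{3}-[A-PR-Y0-9]{4}"

def matches(pattern, text, ignore_case=True):
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, text, flags) is not None

sample = "abc-def-ghij"  # made-up test string, lower-case on purpose

print(matches(PATTERN_1, sample))                     # True: case is ignored
print(matches(PATTERN_1, sample, ignore_case=False))  # False: the class only lists upper-case letters
```

Swap in any of the other patterns and your own guesses at what they should (and shouldn’t) match.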

Of course, since we’re dealing with regular expressions here, I’d be remiss if I didn’t give you two problems for the price of one.

In each case, the regular expression has something wrong with it. For example, the ZIP code regex above doesn’t correctly match the ZIP+4 format (e.g. 66044-0034) that’s used for many addresses these days.
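
To make the ZIP example concrete, here is a rough sketch in Python (my choice of language, not something the quiz depends on). The `ZIP5_OR_ZIP4` pattern is just one possible fix, offered as an assumption rather than the official answer.

```python
import re

ZIP5 = re.compile(r"[0-9]{5}")                       # the example regex from above
ZIP5_OR_ZIP4 = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")  # one possible fix (assumption)

plain = "66044"
plus4 = "66044-0034"

# The original pattern handles a bare five-digit code...
print(ZIP5.fullmatch(plain) is not None)   # True
# ...but cannot match a whole ZIP+4 string,
print(ZIP5.fullmatch(plus4) is not None)   # False
# and when searching it silently keeps only the first five digits.
print(ZIP5.search(plus4).group())          # 66044

# The amended pattern accepts both forms in full.
print(ZIP5_OR_ZIP4.fullmatch(plain).group())  # 66044
print(ZIP5_OR_ZIP4.fullmatch(plus4).group())  # 66044-0034
```

That silent truncation is the kind of subtle bug to look for in the other five.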

So, for part two, what’s wrong with the rest of ’em?

Enjoy your Thanksgiving belly-stuffing, and tune in over the weekend for the answers.

Category: community Time: 2006-11-22 Views: 0