This document was written to propose that ECMAScript RegExp support fixed-length lookbehind assertions, as in Perl 5. However, because TC39 prefers variable-length lookbehind assertions, as in .NET, this document is likely to be superseded by a separate specification for variable-length lookbehind assertions. (Sample specifications for variable-length lookbehind support: Claude Pache's version, my Compact Version, and my Lengthy Version.)
In essence, a lookbehind assertion is evaluated by rewinding the target sequence and then doing the same thing as a lookahead assertion:
The production Assertion :: ( ? < = Disjunction ) evaluates as follows:
The production Assertion :: ( ? < ! Disjunction ) evaluates as follows:
End of Proposal
This section is not part of the proposal; it is included only for clarification.
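As an informal illustration of the "rewind, then evaluate like a lookahead" idea, the following sketch emulates a fixed-length positive lookbehind for a BMP pattern. The helper name `matchesBehind` and its parameters are hypothetical and do not appear in the proposal; the sketch assumes that the body always consumes exactly `width` code units and relies on the sticky (`y`) flag to anchor the match at the rewound position. It is not the normative algorithm.

```javascript
// Hypothetical helper, not part of the proposal.
// Emulates a fixed-length positive lookbehind on code units (BMP pattern):
// rewind by `width`, then test the body at that position as a lookahead would.
function matchesBehind(input, endIndex, bodySource, width) {
  const start = endIndex - width;           // rewind the target sequence
  if (start < 0) return false;              // not enough input to rewind
  const body = new RegExp(bodySource, "y"); // sticky: must match exactly at lastIndex
  body.lastIndex = start;
  const m = body.exec(input);
  // Guard: the body must end exactly where the lookbehind was asserted.
  return m !== null && m[0].length === width;
}
```

For instance, `matchesBehind("axbc", 2, "a.", 2)` is true, while `matchesBehind("a\uD834\uDD1Ebc", 3, "a.", 2)` is false, because rewinding two code units lands on the first half of the surrogate pair rather than on "a". A negative lookbehind would simply negate the result.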
NOTE "Input is a List consisting of all of the characters, in order, of the String being matched by the regular expression pattern. Each character is either a code unit or a code point, depending upon the kind of pattern involved" (21.2.2.1). Thus, for example,
/(?<=a.)bc/.exec("a𝄞bc"); // 𝄞 is U+1D11E, MUSICAL SYMBOL G CLEF.
returns null. Given a BMP pattern, one character is one code unit. In this case, therefore, the positive lookbehind rewinds the target sequence by two code units and Atom :: . in the assertion does not match the whole surrogate pair that represents the character U+1D11E; the lookbehind (?<=a.) matches only the sequence that consists of "a" and the first half of the surrogate pair. However,
/(?<=a.)bc/u.exec("a𝄞bc")
returns a match for "bc". Given a Unicode pattern, one character is one code point. In this case, the positive lookbehind rewinds the target sequence by two code points and Atom :: . in the assertion can match the whole surrogate pair that represents the character U+1D11E; thus the lookbehind (?<=a.) matches the sequence "a𝄞".
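The difference between the two cases can be checked with a short script. This is only a sketch, and it assumes an engine that already implements lookbehind assertions (for example, one following ES2018); the variable names are illustrative. The third pattern, with two dots, is an addition not found in the note above: it shows that a BMP pattern needs one Atom per code unit to span the surrogate pair.

```javascript
// "a𝄞bc": the G clef (U+1D11E) occupies two code units, \uD834 \uDD1E.
const input = "a\u{1D11E}bc";

// BMP pattern: "." consumes one code unit, so "a." cannot cover "a" + the pair.
const bmpResult = /(?<=a.)bc/.exec(input);

// Unicode pattern: "." consumes one code point, so "a." covers "a𝄞".
const unicodeResult = /(?<=a.)bc/u.exec(input);

// BMP pattern with two dots: one "." per half of the surrogate pair.
const widened = /(?<=a..)bc/.exec(input);

console.log(input.length);             // length in code units
console.log(Array.from(input).length); // length in code points
console.log(bmpResult, unicodeResult, widened);
```

Here `bmpResult` is null, while `unicodeResult` and `widened` both report a match for "bc".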