module Parser : sig .. end
'a t is a regex that parses values of type 'a.
The matching is implemented using Re2.
UTF-8 is supported by Re2 but not by this module. This is because we want to use char as the character type, and a char cannot represent a character in a multibyte encoding.
type 'a t
val compile : ?case_sensitive:bool ->
'a t -> (string -> 'a option) Core_kernel.Std.Staged.t
case_sensitive defaults to true.
compile, run, and matches suffer from Re2's limitations with regard to null bytes in the input: a null byte is considered to end the string.
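As a rough illustration of how compile, run, and matches fit together, the sketch below builds a small parser and runs it three ways. It assumes a recent re2 release in which this module is reachable as Re2.Parser and Staged as Core.Staged; for the release documented here the paths would be Re2.Std.Parser and Core_kernel.Std.Staged. The switch parser itself is hypothetical.

```ocaml
(* Sketch only. Assumes the re2 and core libraries are linked and that
   this module is exposed as Re2.Parser (Re2.Std.Parser in older releases). *)
module P = Re2.Parser

(* Hypothetical parser: "on" parses to true and "off" to false. *)
let switch : bool P.t =
  P.or_
    [ P.map (P.string "on") ~f:(fun () -> true)
    ; P.map (P.string "off") ~f:(fun () -> false)
    ]

(* compile is staged, so the matcher can be built once and reused. *)
let parse_switch : string -> bool option =
  Core.Staged.unstage (P.compile switch)

let () =
  assert (parse_switch "on" = Some true);
  assert (parse_switch "off" = Some false);
  assert (parse_switch "maybe" = None);
  (* run and matches take the parser directly. *)
  assert (P.run ~case_sensitive:false switch "OFF" = Some false);
  assert (P.matches switch "on")
```

When the same parser is applied to many inputs, holding on to the function returned by compile is probably the better choice.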
val run : ?case_sensitive:bool -> 'a t -> string -> 'a option
val matches : ?case_sensitive:bool -> 'a t -> string -> bool
val to_regex_string : 'a t -> string
to_regex_string and to_re2 both forget what a 'a t knows about turning the matching strings into values of type 'a.
val to_re2 : ?case_sensitive:bool -> 'a t -> Re2_internal.t
include Applicative.S
The applicative interface provides sequencing, e.g. both a b is a regex that parses an a followed by a b and returns both results in a pair.
val of_re2 : Re2_internal.t -> string option array t
of_re2 r forgets the options that r was compiled with, instead using `Encoding_latin1 true, `Dot_nl true, and the case-sensitivity setting of the overall pattern. You can still try to use the '(?flags:re)' Re2 syntax to set options for the scope of this regex.
The returned values are precisely the captures of the underlying regex, in order: note that unlike (say) Re2.Match.get_all, the whole match is *not* included (if you want that, just use capture). Named captures are not accessible by name.
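The shape of the value returned by of_re2 can be sketched as follows. The pattern and the name key_value are made up for illustration, and the module paths (Re2.Parser, Re2.create_exn) assume a recent re2 release.

```ocaml
(* Sketch only. Assumes the re2 library, with this module exposed as
   Re2.Parser (Re2.Std.Parser in older releases). *)
module P = Re2.Parser

(* Hypothetical pattern with two capture groups; the second sits inside
   an optional non-capturing group, so it may not participate. *)
let key_value : string option array P.t =
  P.of_re2 (Re2.create_exn "([a-z]+)(?:=([0-9]+))?")

let () =
  (* One array slot per capture group, in order; no slot for the whole match. *)
  assert (P.run key_value "port=80" = Some [| Some "port"; Some "80" |]);
  (* A group that did not participate in the match shows up as None. *)
  assert (P.run key_value "port" = Some [| Some "port"; None |])
```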
val ignore : 'a t -> unit t
ignore t is a regex which matches the same strings that t matches, but doesn't call functions on the captured submatches. In particular, something like ignore (map (capture (string "x")) ~f:Int.of_string) won't raise an exception, because the int conversion is never attempted.
val capture : unit t -> string t
capture t returns the string matched by t.
val and_capture : 'a t -> ('a * string) t
and_capture t returns the string matched by t in addition to whatever it was already going to return.
val fail : 'a t
val or_ : 'a t list -> 'a t
val optional : ?greedy:bool -> 'a t -> 'a option t
greedy defaults to true. If false, the regexp will prefer not matching.
val repeat : ?greedy:bool -> ?min:int -> ?max:int option -> unit t -> unit t
repeat ~min ~max t constructs the regex t{min,max}. min defaults to 0 and max defaults to None (unbounded), so that just plain repeat t is equivalent to t*.
It would be better for repeat to be 'a t -> 'a list t, but the re2 library doesn't give you access to repeated submatches like that. Hence, repeat ignores all submatches of its argument and does not call any callbacks that may have been attached to them, as if it had ignore called on its result.
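Because repeat takes and returns a unit t, a common pattern is probably to wrap the argument in ignore and the result in capture. The sketch below shows this under the assumption that the module is exposed as Re2.Parser and that Char.one_of has type char list -> char t; the helper anchored and the parsers ab_run and two_or_three are hypothetical names.

```ocaml
(* Sketch only. Assumes the re2 library, with this module exposed as
   Re2.Parser (Re2.Std.Parser in older releases). *)
module P = Re2.Parser

(* Hypothetical helper: require p to span the whole input. *)
let anchored (p : 'a P.t) : 'a P.t =
  P.map
    (P.both P.start_of_input (P.both p P.end_of_input))
    ~f:(fun ((), (x, ())) -> x)

(* Assumed signature: Char.one_of : char list -> char t. *)
let a_or_b : unit P.t = P.ignore (P.Char.one_of [ 'a'; 'b' ])

(* [ab]{1,} with the matched text recovered through capture. *)
let ab_run : string P.t = P.capture (P.repeat ~min:1 a_or_b)

(* [ab]{2,3}; note that max is itself an option. *)
let two_or_three : string P.t =
  P.capture (P.repeat ~min:2 ~max:(Some 3) a_or_b)

let () =
  assert (P.run (anchored ab_run) "abba" = Some "abba");
  assert (P.run (anchored ab_run) "abc" = None);
  assert (P.run (anchored ab_run) "" = None);
  assert (P.run (anchored two_or_three) "aba" = Some "aba");
  assert (P.run (anchored two_or_three) "abab" = None)
```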
val times : unit t -> int -> unit t
times r n essentially constructs the regex r{n}. It is equivalent to repeat ~min:n ~max:(Some n) r.
Compare with, say, all (List.init n ~f:(fun _ -> r)), which constructs the regex rrr...r (with n copies of r) and, viewed as a function of r, has the type 'a t -> 'a list t.
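The difference between times and all can be sketched like this. The example assumes the module is exposed as Re2.Parser and that Char.one_of has type char list -> char t; bit, nibble_text, and nibble_bits are hypothetical names, and List.init here is the standard library version (no ~f label) because Core is not opened.

```ocaml
(* Sketch only. Assumes the re2 library, with this module exposed as
   Re2.Parser (Re2.Std.Parser in older releases). *)
module P = Re2.Parser

(* Assumed signature: Char.one_of : char list -> char t. *)
let bit : char P.t = P.Char.one_of [ '0'; '1' ]

(* times discards submatches, so capture is used to recover the text. *)
let nibble_text : string P.t = P.capture (P.times (P.ignore bit) 4)

(* all keeps the value produced by each of the four copies. *)
let nibble_bits : char list P.t = P.all (List.init 4 (fun _ -> bit))

let () =
  assert (P.run nibble_text "1010" = Some "1010");
  assert (P.run nibble_bits "1010" = Some [ '1'; '0'; '1'; '0' ])
```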
val string : string -> unit t
string, Char.one_of, and Char.not_one_of raise exceptions in the presence of null bytes.
val start_of_input : unit t
val end_of_input : unit t
module Char : sig .. end
module Decimal : sig .. end
val sexp_of_t : ('a -> Sexplib.Sexp.t) -> 'a t -> Sexplib.Sexp.t
fail is a regex that matches nothing.
start_of_input matches the empty string at the beginning of the text.
end_of_input matches the empty string at the end of the text.
Char.any, unlike "." by default, matches newline.
(However, note that of_re2 (Re2.create_exn ".") will match newline. See the comment on of_re2 for more information.)
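The newline behaviour described above can be checked with a few assertions. This is a sketch that assumes the module is exposed as Re2.Parser and that Re2.matches and Re2.create_exn are available with Re2's default options (under which "." does not match newline).

```ocaml
(* Sketch only. Assumes the re2 library, with this module exposed as
   Re2.Parser (Re2.Std.Parser in older releases). *)
module P = Re2.Parser

let () =
  (* Char.any accepts a newline. *)
  assert (P.matches P.Char.any "\n");
  (* So does "." once it goes through of_re2, which uses `Dot_nl true. *)
  assert (P.matches (P.of_re2 (Re2.create_exn ".")) "\n");
  (* Used directly with default options, "." does not match a newline. *)
  assert (not (Re2.matches (Re2.create_exn ".") "\n"))
```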
Duplicates in the lists given to Char.one_of and Char.not_one_of are ignored.
The following 6 values match the Re2 character classes with the same name.
A character matching Char.is_uppercase
A character matching Char.is_lowercase
A character matching Char.is_alpha
A character matching Char.is_digit
A character matching Char.is_alphanum
A character matching Char.is_whitespace
Optional sign symbol: "+" or "" means 1, and "-" means -1.