1. Regular expression functions
2. Regular expression rules

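PHP provides a rich set of regular expression functions (the preg_* family) for pattern matching, replacement, and splitting. The core functions and what they do are described below: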

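1. Regular expression functions

1.1. Matching functions

preg_match() performs a single match and returns the number of matches: 1 if the pattern matched, 0 if it did not, or false on error. The full match and any capture groups are stored in the $matches array. Example: match an order number in a string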
```php
$text = "Order 12345";
preg_match('/Order \d+/', $text, $matches); // $matches[0] = "Order 12345"
```
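
A small sketch of how the return value and a capture group might be checked together. The helper name `extractOrderId` is hypothetical and not part of the original notes; the strict comparison with `1` is used because `preg_match()` returns `false` on error, which a loose check would treat the same as "no match".

```php
<?php
// Hypothetical helper: returns the order number as a string, or null if absent.
function extractOrderId(string $text): ?string
{
    // The capture group (\d+) isolates the digits from the "Order " prefix.
    if (preg_match('/Order (\d+)/', $text, $matches) === 1) {
        return $matches[1];
    }
    return null; // no match (0) or an error (false)
}

var_dump(extractOrderId('Order 12345'));   // string(5) "12345"
var_dump(extractOrderId('No order here')); // NULL
```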

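preg_match_all() performs a global match and stores every match in the $matches array. The flags parameter controls how the results are grouped: PREG_PATTERN_ORDER (the default) groups them by capture group, while PREG_SET_ORDER groups them by match. Example: extract links from HTML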

$html = '<a href="link1.html">Link1</a><a href="link2.html">Link2</a>';
preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $html, $matches);
// $matches[1] = ["link1.html", "link2.html"], $matches[2] = ["Link1", "Link2"]
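
For comparison, here is a sketch of the same extraction with PREG_SET_ORDER, which is often easier to loop over because each entry holds one complete match. The variable names `$sets` and `$count` are illustrative.

```php
<?php
$html = '<a href="link1.html">Link1</a><a href="link2.html">Link2</a>';

// PREG_SET_ORDER groups results per match instead of per capture group.
$count = preg_match_all('/<a href="(.*?)">(.*?)<\/a>/', $html, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] the href, $set[2] the link text.
    echo $set[2], ' => ', $set[1], PHP_EOL;
}
```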

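1.2. Replacement functions

preg_replace() performs a regex search and replace. It supports backreferences in the replacement (such as $1) and accepts arrays of patterns and replacements. Example: mask a phone number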

$text = "Phone: 123-4567-8901";
// Capture the first and last segments; the middle segment is the one being hidden.
$masked = preg_replace('/(\d{3})-\d{4}-(\d{4})/', '$1-****-$2', $text);
// $masked = "Phone: 123-****-8901"

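preg_replace_callback() builds each replacement with a callback function, which makes it more flexible than a fixed replacement string. Example: increment a year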

$text = "April fools day is 04/01/2002";
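echo preg_replace_callback(
    '|(\d{2}/\d{2}/)(\d{4})|',   // | is used as the delimiter so / needs no escaping
    function ($m) { return $m[1] . ($m[2] + 1); }, // $m[2] is a numeric string, so + 1 yields an integer
    $text
); // prints "April fools day is 04/01/2003"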

preg_filter() works like preg_replace(), but returns only the subjects where the pattern matched; entries that did not match are dropped.
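
The difference is easiest to see side by side. A minimal sketch with made-up input data:

```php
<?php
$subjects = ['item 1', 'no digits', 'item 22'];

// preg_replace() returns every subject, whether or not it changed.
$replaced = preg_replace('/\d+/', '#', $subjects);
// ['item #', 'no digits', 'item #']

// preg_filter() returns only the subjects that matched, keeping their original keys.
$filtered = preg_filter('/\d+/', '#', $subjects);
// [0 => 'item #', 2 => 'item #']
```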

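1.3. Splitting and filtering

preg_split() splits a string by a regular expression and returns an array. The limit parameter caps the number of pieces returned, and flags adjust the result (for example, PREG_SPLIT_NO_EMPTY drops empty pieces). Example: split a list with mixed delimiters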
```
$csv = "apple, orange; banana|pear";
// \s* on both sides swallows the spaces around each delimiter.
$fruits = preg_split('/\s*[,;|]\s*/', $csv); // ["apple", "orange", "banana", "pear"]
```
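
The limit and flags parameters mentioned above can be sketched as follows; the input string is invented for illustration.

```php
<?php
$line = 'a,,b;c';

// Without flags, two consecutive delimiters yield an empty string between them.
$raw = preg_split('/[,;]/', $line);
// ['a', '', 'b', 'c']

// PREG_SPLIT_NO_EMPTY drops empty pieces (-1 means "no limit").
$clean = preg_split('/[,;]/', $line, -1, PREG_SPLIT_NO_EMPTY);
// ['a', 'b', 'c']

// A limit of 2 stops after the first split; the remainder stays in the last element.
$limited = preg_split('/[,;]/', $line, 2);
// ['a', ',b;c']
```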

preg_grep() returns the elements of an array that match a pattern. Example: keep plausibly valid email addresses

$emails = ["test@qq.com", "invalid-email", "user@example.com"];
$valid = preg_grep('/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/', $emails);
// $valid = [0 => "test@qq.com", 2 => "user@example.com"] (original keys are preserved)

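2. Regular expression rules

PHP regular expressions are built on the PCRE (Perl Compatible Regular Expressions) library, so their syntax is highly compatible with Perl's. The core rules below cover character matching, quantifiers, boundaries, grouping, and related features:

2.1. Basic character matching

Literal characters: match themselves (for example, a matches the letter a).
Metacharacters: must be escaped to be matched literally (for example, \. matches a dot . and \* matches an asterisk *).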
```
preg_match('/\.com/', 'example.com'); // matches ".com"
```
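Special character classes:
- \d: matches a digit (equivalent to [0-9] by default).
- \w: matches a word character (letter, digit, or underscore; equivalent to [a-zA-Z0-9_] by default).
- \s: matches a whitespace character (space, tab, newline, and so on).
- Negated classes: \D (non-digit), \W (non-word character), \S (non-whitespace).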
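
A short sketch of these classes in action, using an invented sample string:

```php
<?php
$s = "id_42 \t ok";   // contains a space, a tab, and another space

preg_match_all('/\d/', $s, $digits);          // $digits[0] = ['4', '2']
preg_match_all('/\w+/', $s, $words);          // $words[0] = ['id_42', 'ok']

// The negated class \D removes everything that is not a digit.
$onlyDigits = preg_replace('/\D/', '', $s);   // '42'

// \s+ collapses any run of whitespace into a single space.
$collapsed = preg_replace('/\s+/', ' ', $s);  // 'id_42 ok'
```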

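2.2. Quantifiers (controlling repetition)

| Symbol | Meaning | Example |
| --- | --- | --- |
| `.` | Matches any single character except a newline (a wildcard rather than a quantifier, listed here because it is usually combined with one) | `a.b` matches `aXb`, `a b`, etc. |
| `*` | Matches 0 or more times | `a*` matches `""`, `a`, `aa` |
| `+` | Matches 1 or more times | `a+` matches `a`, `aa` |
| `?` | Matches 0 or 1 time | `a?` matches `""` or `a` |
| `{n}` | Matches exactly n times | `a{3}` matches `aaa` |
| `{n,}` | Matches at least n times | `a{2,}` matches `aa`, `aaa` |
| `{n,m}` | Matches between n and m times | `a{2,4}` matches `aa`, `aaa`, `aaaa` |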

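Greedy vs. lazy (non-greedy):

Quantifiers are greedy by default and match as many characters as possible; for example, .* runs to the end of the line before backtracking.

Adding ? after a quantifier makes it lazy, so .*? matches as few characters as possible.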

$text = "<div>content</div>";
preg_match('/<div>(.*?)<\/div>/', $text, $matches); // $matches[1] = "content"
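
The difference only shows when the closing text occurs more than once. A sketch with two tags in the subject (the sample string is invented):

```php
<?php
$text = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b> it can still match.
preg_match('/<b>(.*)<\/b>/', $text, $greedy);
// $greedy[1] = 'one</b> and <b>two'

// Lazy: .*? stops at the first </b>.
preg_match('/<b>(.*?)<\/b>/', $text, $lazy);
// $lazy[1] = 'one'
```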

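2.3. Boundaries

Anchors:
- ^: matches the start of the string (in multiline mode, the start of each line).
- $: matches the end of the string (in multiline mode, the end of each line).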
```
preg_match('/^Hello/', 'Hello world'); // matches
preg_match('/world$/', 'Hello world'); // matches
```
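Word boundaries:
- \b: matches a word boundary (for example, \bword\b matches word as a standalone word).
- \B: matches any position that is not a word boundary; the inverse of \b.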
```
preg_match('/\bcat\b/', 'The cat sat'); // matches "cat"
preg_match('/\bcat\b/', 'category');    // no match
```
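
Two points from this section that are easy to get wrong are the effect of multiline mode on ^ and the meaning of \B. A sketch with invented sample data:

```php
<?php
$log = "error: disk\nwarn: cpu\nerror: net";

// Without the m modifier, ^ only matches at the very start of the subject.
$single = preg_match_all('/^error/', $log);   // 1

// With m, ^ matches at the start of every line.
$multi = preg_match_all('/^error/m', $log);   // 2

// \B requires a position that is NOT a word boundary,
// so "cat" only matches when it sits inside a longer word.
$inside = preg_match('/\Bcat\B/', 'concatenate'); // 1
$alone  = preg_match('/\Bcat\B/', 'the cat sat'); // 0
```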

2.4. Grouping and references

Capturing groups: wrap in ( ); the matched text is captured into the $matches array.

preg_match('/(\d{4})-(\d{2})/', '2023-05', $matches);
// $matches[0] = "2023-05", $matches[1] = "2023", $matches[2] = "05"

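Non-capturing groups: wrap in (?: ); the group is used for grouping only and its content is not captured, which avoids the small cost of storing it.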

preg_match('/(?:\d{3})-(\d{4})/', '123-4567', $matches);
// $matches[1] = "4567" (only the second parenthesized group captures)

Backreferences: \n (where n is the group number) matches the same text that group n captured.

preg_match('/(\w+)\s\1/', 'test test', $matches); // matches a repeated word
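
A backreference inside the pattern can be combined with a $1 reference in the replacement, for example to collapse accidentally doubled words. This is an illustrative sketch; the sample sentence is invented.

```php
<?php
$text = 'this is is a test test';

// \b(\w+)\s+\1\b finds a whole word followed by whitespace and the same whole word;
// the replacement keeps a single copy.
$fixed = preg_replace('/\b(\w+)\s+\1\b/', '$1', $text);
// 'this is a test'
```

The \b anchors matter here: without them, the "is" at the end of "this" would pair with the following "is".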

2.5. Alternation and conditionals

Alternation: | means "or".

preg_match('/(cat|dog)/', 'I have a cat'); // matches "cat"

Conditionals: PCRE supports the (?(condition)yes|no) syntax (rarely used).
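
A minimal sketch of a conditional, assuming the common case where the condition is a group number: the pattern accepts a number with or without parentheses, but rejects unbalanced ones. The no branch is omitted here, which is allowed.

```php
<?php
// (\()?     optionally captures an opening parenthesis as group 1.
// (?(1)\))  requires a closing parenthesis only if group 1 took part in the match.
$pattern = '/^(\()?\d+(?(1)\))$/';

$wrapped   = preg_match($pattern, '(123)'); // 1
$bare      = preg_match($pattern, '123');   // 1
$openOnly  = preg_match($pattern, '(123');  // 0
$closeOnly = preg_match($pattern, '123)');  // 0
```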

2.6. Pattern modifiers

Modifiers follow the closing delimiter (for example, /pattern/i). Commonly used ones:

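| Modifier | Meaning | Example |
| --- | --- | --- |
| `i` | Case-insensitive matching | `/hello/i` matches `Hello` |
| `m` | Multiline mode (`^` and `$` match at the start and end of each line) | `/^line/m` matches at the start of each line |
| `s` | `.` also matches newlines | `/.*./s` matches text spanning lines |
| `u` | Treat pattern and subject as UTF-8 | `/\p{L}/u` matches a Unicode letter |
| `x` | Ignore whitespace and `#` comments in the pattern | `/a b /x` matches `ab` |
| `e` | Evaluate the replacement string as PHP code (`preg_replace()` only) | Deprecated in PHP 5.5 and removed in PHP 7.0; use `preg_replace_callback()` instead |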
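
The modifiers can be tried out one at a time and in combination. A sketch with invented subjects:

```php
<?php
// i: case-insensitive.
$i = preg_match('/hello/i', 'HELLO world');   // 1

// s: without it the dot refuses to match a newline.
$dotDefault = preg_match('/a.b/', "a\nb");    // 0
$dotAll     = preg_match('/a.b/s', "a\nb");   // 1

// x: whitespace and # comments inside the pattern are ignored,
// so a pattern can be laid out over several lines.
$x = preg_match('/
    (\d{4})  # year
    -
    (\d{2})  # month
/x', '2023-05', $m);
// $m[1] = '2023', $m[2] = '05'

// Modifiers combine: case-insensitive and multiline together.
$im = preg_match_all('/^line/im', "Line 1\nLINE 2\nother"); // 2
```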

2.7. Special constructs

Lookaround:
- Lookahead: (?=pattern) (must be followed by pattern), (?!pattern) (must not be followed by pattern).
- Lookbehind: (?<=pattern) (must be preceded by pattern), (?<!pattern) (must not be preceded by pattern).
```
preg_match('/(?<=\d)abc/', '123abc'); // matches "abc" (preceded by a digit)
```
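Atomic groups: (?>pattern) discards the backtracking positions inside the group once it has matched, which can improve performance.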
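
The constructs in this section can be sketched as follows. The examples are illustrative and the sample strings are invented; note in particular that an atomic group can change what matches, not just how fast.

```php
<?php
// Lookahead: insert a comma at every position followed by groups of exactly three digits.
$grouped = preg_replace('/\B(?=(\d{3})+(?!\d))/', ',', '1234567');
// '1,234,567'

// Negative lookbehind: find numbers that are not preceded by a dollar sign.
preg_match_all('/(?<!\$)\b\d+\b/', 'paid $40 for 3 items', $plain);
// $plain[0] = ['3']

// Atomic group: once (?>bc|b) has matched "bc" it will not go back and try "b",
// so the trailing c has nothing left to match in 'abc'.
$atomic  = preg_match('/a(?>bc|b)c/', 'abc'); // 0
$regular = preg_match('/a(?:bc|b)c/', 'abc'); // 1
```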

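2.8. Unicode support

Use \p{L} to match any letter and \p{N} to match any numeric character; both should be used with the u modifier. Modifiers such as i, m, s, u, and x go after the closing delimiter / and can be combined.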
```
preg_match('/\p{L}+/u', '中文'); // matches "中文"
```
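
The effect of the u modifier is visible when counting what the dot matches. This sketch assumes PHP 7 or later (for the "\u{...}" escape) and a UTF-8 encoded source file (for the Chinese literal).

```php
<?php
// "\u{00E9}" is é, which UTF-8 encodes as two bytes.
$subject = "\u{00E9}";

// Without u, the dot matches one byte at a time.
$bytes = preg_match_all('/./', $subject);   // 2

// With u, the dot matches one code point.
$chars = preg_match_all('/./u', $subject);  // 1

// \p{Han} is a Unicode script property; here it picks out the two Chinese characters.
preg_match_all('/\p{Han}/u', '中文abc', $han);
// $han[0] = ['中', '文']
```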

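2.9. Examples

Patterns are usually delimited by / at the start and end (PHP accepts other delimiters as well, such as # or ~; see the manual for details).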

// Match an email address
$email = "user@example.com";
if (preg_match('/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/', $email)) {
    echo "Valid email";
}

// Extract a date (YYYY-MM-DD)
$text = "Today is 2023-05-20";
preg_match('/\b(\d{4})-(\d{2})-(\d{2})\b/', $text, $matches);
// $matches[1] = "2023", $matches[2] = "05", $matches[3] = "20"
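
Choosing a delimiter other than / pays off when the pattern itself contains slashes. A sketch using an invented URL:

```php
<?php
$url = 'https://example.com/docs/index.html';

// With / as the delimiter, every slash in the pattern must be escaped.
preg_match('/^https?:\/\/([^\/]+)\//', $url, $a);

// With # as the delimiter, the same pattern needs no escaping.
preg_match('#^https?://([^/]+)/#', $url, $b);

// Both patterns capture the host name 'example.com' in group 1.
echo $a[1], PHP_EOL, $b[1], PHP_EOL;
```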

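2.10. Notes

Escaping: in double-quoted strings, PHP may interpret backslash sequences before PCRE sees them; prefer single quotes, or double the backslashes.
Performance: avoid deeply nested groups and patterns that trigger heavy backtracking.
Error handling: check preg_last_error() when debugging complex patterns.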
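
The escaping and error-handling notes can be illustrated with a short sketch. The invalid UTF-8 subject is used here only because it produces an error reliably; a backtracking-limit error would be reported through preg_last_error() in the same way.

```php
<?php
// Double quotes: PHP turns \$ into $ before PCRE sees the pattern,
// so the pattern becomes /$\d+/ (end of subject, then digits) and cannot match.
$double = preg_match("/\$\d+/", 'Cost: $100');  // 0

// Single quotes: the backslash reaches PCRE and \$ matches a literal dollar sign.
$single = preg_match('/\$\d+/', 'Cost: $100');  // 1

// With the u modifier, an invalid UTF-8 subject makes the call fail with false, not 0.
$result = preg_match('/./u', "\xFF");
$isUtf8Error = false;
if ($result === false) {
    // preg_last_error() reports why the last preg_* call failed.
    $isUtf8Error = (preg_last_error() === PREG_BAD_UTF8_ERROR);
}
```

Comparing against false with === matters, because 0 ("no match") and false ("error") are otherwise easy to confuse.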

PHP regular expressions are powerful; used sensibly, they handle string matching, validation, and extraction efficiently.
