Using PHP's glob() function to find files in a directory

Example directory

The examples below use a directory with the following contents (the same example directory as in the earlier post on reading through a directory):

bar.txt       A regular file
baz           A directory
foo.txt       A regular file
link2foo.txt  A symbolic link to foo.txt

Simple example

To find all the files in the directory /path/to/directory with a .txt file extension, you can do this:

$files = glob("/path/to/directory/*.txt");

The $files array contains the following from the example directory:

Array
(
    [0] => /path/to/directory/bar.txt
    [1] => /path/to/directory/foo.txt
    [2] => /path/to/directory/link2foo.txt
)

If no files matched the pattern then the array will be empty.
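As a minimal sketch of handling the return value: glob() can also return false on error, so it is safest to check for that before looping. The temporary sandbox directory below is an assumption made only so the snippet runs anywhere; its file names mirror the example directory.

```php
<?php
// Hypothetical sandbox directory so the sketch is self-contained.
$dir = sys_get_temp_dir() . '/glob_simple_' . uniqid();
mkdir($dir);
touch("$dir/bar.txt");
touch("$dir/foo.txt");
mkdir("$dir/baz");

$files = glob("$dir/*.txt");
if ($files === false) {
    // glob() signals an error with false rather than an array.
    $files = [];
}

// Results are sorted alphabetically unless GLOB_NOSORT is passed.
$names = array_map('basename', $files);
print_r($names);

// A pattern with no matches: normalise false/empty to an empty array.
$none = glob("$dir/*.csv") ?: [];
echo count($none), " .csv files found\n";

// Clean up the sandbox.
unlink("$dir/bar.txt");
unlink("$dir/foo.txt");
rmdir("$dir/baz");
rmdir($dir);
```

The `?: []` shorthand treats both false and an empty array as "nothing found", which is usually what a foreach loop wants.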

Example using braces

glob() accepts flags as an optional second parameter. One of these is GLOB_BRACE, which expands a group such as {jpg,gif,png} to match jpg, gif and png. This is useful when you need to find a particular set of files by extension, in this example image files.

If the example directory also had the files 1.jpg, 2.gif and 3.png then you can do this to get glob to return just the image files:

$files = glob("/path/to/directory/*.{jpg,gif,png}", GLOB_BRACE);

print_r($files) would output:

Array
(
    [0] => /path/to/directory/1.jpg
    [1] => /path/to/directory/2.gif
    [2] => /path/to/directory/3.png
)
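GLOB_BRACE is not available on some non-GNU systems, so a fallback can be handy. The sketch below runs one glob() call per extension and merges the results; globByExtensions() is a hypothetical helper name, and the sandbox directory is an assumption for the demo.

```php
<?php
/**
 * Hypothetical helper: find files by extension without GLOB_BRACE
 * by running one glob() call per extension and merging the results.
 */
function globByExtensions(string $dir, array $extensions): array
{
    $matches = [];
    foreach ($extensions as $ext) {
        $found = glob($dir . '/*.' . $ext);
        if ($found !== false) {
            $matches = array_merge($matches, $found);
        }
    }
    // Each glob() call is sorted on its own; sort again for one overall order.
    sort($matches);
    return $matches;
}

// Hypothetical sandbox directory with the image files from the example.
$dir = sys_get_temp_dir() . '/glob_brace_' . uniqid();
mkdir($dir);
foreach (['1.jpg', '2.gif', '3.png', 'foo.txt'] as $name) {
    touch("$dir/$name");
}

$images = array_map('basename', globByExtensions($dir, ['jpg', 'gif', 'png']));
print_r($images);

// Clean up the sandbox.
foreach (['1.jpg', '2.gif', '3.png', 'foo.txt'] as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```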

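Definition and usage

The glob() function returns the file names or directories that match a specified pattern.

It returns an array containing the matching files and directories, or false on error.

Syntax

glob(pattern, flags)

Parameter  Description
pattern    Required. The pattern to search for.
flags      Optional. Special settings, built from the flags below.

  • GLOB_MARK – Adds a slash to each directory returned
  • GLOB_NOSORT – Returns files in the order they appear in the directory (no sorting)
  • GLOB_NOCHECK – Returns the search pattern itself if no files match
  • GLOB_NOESCAPE – Backslashes do not quote metacharacters
  • GLOB_BRACE – Expands {a,b,c} to match 'a', 'b' or 'c'
  • GLOB_ONLYDIR – Returns only directory entries that match the pattern
  • GLOB_ERR – Stops on read errors (such as unreadable directories); by default errors are ignored

Note: GLOB_ERR was added in PHP 5.1.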
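To make a few of these flags concrete, here is a small sketch that compares GLOB_MARK, GLOB_ONLYDIR and GLOB_NOCHECK on a throwaway directory. The sandbox path and its contents are assumptions for the demo; flags are combined with the bitwise OR operator.

```php
<?php
// Hypothetical sandbox: one directory (baz) and one regular file (foo.txt).
$dir = sys_get_temp_dir() . '/glob_flags_' . uniqid();
mkdir($dir);
mkdir("$dir/baz");
touch("$dir/foo.txt");

// GLOB_MARK: directories in the result get a trailing separator.
$marked = glob("$dir/*", GLOB_MARK);

// GLOB_ONLYDIR: only directories are returned.
$dirsOnly = glob("$dir/*", GLOB_ONLYDIR);

// Flags can be combined with |.
$markedDirs = glob("$dir/*", GLOB_ONLYDIR | GLOB_MARK);

// GLOB_NOCHECK: when nothing matches, the pattern itself comes back.
$noMatch = glob("$dir/*.pdf", GLOB_NOCHECK);

print_r(array_map('basename', $dirsOnly));
print_r($noMatch);

// Clean up the sandbox.
unlink("$dir/foo.txt");
rmdir("$dir/baz");
rmdir($dir);
```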

For example, to list every .txt file in the current directory along with its size:

<?php
foreach (glob("*.txt") as $filename) {
    echo "$filename size " . filesize($filename) . "\n";
}
?>

Getting all subdirectories of a directory

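<?php
// Recursively collect every subdirectory below $dir.
// No static variable is used, so repeated calls do not accumulate old results.
function listdirs($dir) {
    // glob() can return false on error, so fall back to an empty array.
    $dirs = glob($dir . '/*', GLOB_ONLYDIR) ?: array();
    $alldirs = $dirs;
    foreach ($dirs as $d) {
        // Merge in the subdirectories found below each child directory.
        $alldirs = array_merge($alldirs, listdirs($d));
    }
    return $alldirs;
}
?>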

Matching all files, including hidden ones

<?php
$files = glob('{,.}*', GLOB_BRACE);
?>
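A pattern that matches dot-files typically also returns the special entries . and .., which most callers do not want. The sketch below is one way to filter them out; it avoids GLOB_BRACE by making two glob() calls, and the sandbox directory is an assumption for the demo.

```php
<?php
// Hypothetical sandbox with one hidden and one visible file.
$dir = sys_get_temp_dir() . '/glob_all_' . uniqid();
mkdir($dir);
touch("$dir/.hidden");
touch("$dir/visible.txt");

// "*" skips names that start with a dot, so dot-files need their own pattern.
$visible = glob("$dir/*") ?: [];
$dotted  = glob("$dir/.*") ?: [];

// Drop the "." and ".." entries that the ".*" pattern may return.
$all = array_filter(
    array_merge($visible, $dotted),
    function ($path) {
        return !in_array(basename($path), ['.', '..'], true);
    }
);

$names = array_map('basename', array_values($all));
sort($names);
print_r($names);

// Clean up the sandbox.
unlink("$dir/.hidden");
unlink("$dir/visible.txt");
rmdir($dir);
```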

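Case-insensitive matching

<?php
// glob() has no case-insensitive flag, so spell out both cases of each
// letter with character groups. (Older code used sql_regcase() to build
// such a pattern, but that function was removed in PHP 7.)
$pattern = "*.[pP][dD][fF]";
var_dump(glob($pattern));
?>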

This is similar to the following, which only covers the all-lowercase and all-uppercase spellings:

<?php
foreach (array_merge(glob("*.pdf"),glob("*.PDF")) as $filename) {
     echo "$filename\n";
}
?>
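Writing the character groups by hand gets tedious, so a small helper can generate them. This is only a sketch: globCaseInsensitivePattern() is a hypothetical name, and it assumes an ASCII pattern that does not already contain character groups or backslash escapes.

```php
<?php
/**
 * Hypothetical helper: turn each ASCII letter of a glob pattern into a
 * two-character group, e.g. "p" becomes "[pP]".
 * Assumes the pattern has no existing [...] groups or backslash escapes.
 */
function globCaseInsensitivePattern(string $pattern): string
{
    $out = '';
    foreach (str_split($pattern) as $ch) {
        if (ctype_alpha($ch)) {
            $out .= '[' . strtolower($ch) . strtoupper($ch) . ']';
        } else {
            $out .= $ch;
        }
    }
    return $out;
}

$pattern = globCaseInsensitivePattern('*.pdf');
echo $pattern, "\n";

// Hypothetical sandbox with mixed-case extensions.
$dir = sys_get_temp_dir() . '/glob_case_' . uniqid();
mkdir($dir);
foreach (['a.pdf', 'b.PDF', 'c.Pdf', 'd.txt'] as $name) {
    touch("$dir/$name");
}

$pdfs = array_map('basename', glob("$dir/$pattern") ?: []);
sort($pdfs);
print_r($pdfs);

// Clean up the sandbox.
foreach (['a.pdf', 'b.PDF', 'c.Pdf', 'd.txt'] as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```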

Matching files with a .txt extension in a directory

<?php
foreach (glob("*.txt") as $filename) {
    echo $filename . "\n";
}
?>

Notes
1. glob() does not work on remote files; the files being examined must be accessible through the server's file system.
2. glob("[myfolder]/*.txt") will not match a directory literally named [myfolder], because the square brackets are read as a character group. Escape them instead: glob("\[myfolder\]/*.txt").
3. The valid flags for the second parameter are the ones listed earlier in this article. Note that before PHP 4.3.3, GLOB_ONLYDIR was not available on Windows or on other systems that do not use the GNU C library.
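The bracket problem from the notes can be handled with a small escaping helper. The sketch below is an assumption-laden illustration: escapeGlobLiteral() is a hypothetical name, it only covers the metacharacters *, ?, [, ] and the backslash itself, and it relies on backslash escaping, which does not apply on Windows (where the backslash is a path separator) or when GLOB_NOESCAPE is set.

```php
<?php
/**
 * Hypothetical helper: backslash-escape glob metacharacters so that a
 * directory or file name is matched literally.
 */
function escapeGlobLiteral(string $name): string
{
    return strtr($name, [
        '\\' => '\\\\',
        '*'  => '\\*',
        '?'  => '\\?',
        '['  => '\\[',
        ']'  => '\\]',
    ]);
}

// Hypothetical sandbox containing a directory literally named "[myfolder]".
$dir = sys_get_temp_dir() . '/glob_escape_' . uniqid();
mkdir($dir);
mkdir("$dir/[myfolder]");
touch("$dir/[myfolder]/note.txt");

// Unescaped: the brackets are read as a character group, so nothing matches.
$unescaped = glob("$dir/[myfolder]/*.txt") ?: [];

// Escaped: the brackets are matched literally.
$escaped = glob($dir . '/' . escapeGlobLiteral('[myfolder]') . '/*.txt') ?: [];

echo count($unescaped), " match(es) unescaped, ", count($escaped), " escaped\n";

// Clean up the sandbox.
unlink("$dir/[myfolder]/note.txt");
rmdir("$dir/[myfolder]");
rmdir($dir);
```

If GLOB_BRACE is in use, the braces and commas would presumably need the same treatment; that case is left out of this sketch.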

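A typical use of glob() is reading database table files, for example collecting the .sql files in a directory. This is very handy in unit tests, where the SQL files can be read back to rebuild a database. See the PHP manual for details.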
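A rough sketch of that use case follows. loadSqlFixtures() is a hypothetical helper, and the file names and SQL are made up for the demo; the snippet only collects the file contents, and a real test would pass each one to its own database layer.

```php
<?php
/**
 * Hypothetical helper: read every .sql file in a directory, keyed by file
 * name. glob() sorts its results, so numeric prefixes control the order.
 */
function loadSqlFixtures(string $dir): array
{
    $fixtures = [];
    foreach (glob(rtrim($dir, '/\\') . '/*.sql') ?: [] as $file) {
        $fixtures[basename($file)] = file_get_contents($file);
    }
    return $fixtures;
}

// Hypothetical sandbox with two SQL files and one unrelated file.
$dir = sys_get_temp_dir() . '/glob_sql_' . uniqid();
mkdir($dir);
file_put_contents("$dir/001_schema.sql", "CREATE TABLE t (id INTEGER);");
file_put_contents("$dir/002_data.sql", "INSERT INTO t VALUES (1);");
file_put_contents("$dir/notes.txt", "not SQL");

$fixtures = loadSqlFixtures($dir);
foreach ($fixtures as $name => $sql) {
    // A test setup would execute $sql against its database here.
    echo "$name: $sql\n";
}

// Clean up the sandbox.
unlink("$dir/001_schema.sql");
unlink("$dir/002_data.sql");
unlink("$dir/notes.txt");
rmdir($dir);
```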

What’s in a pattern?

Most people who have already encountered glob know to make use of the * metacharacter to match some characters, and those digging a little deeper often discover that discrete alternatives can be globbed with braces (e.g. image.{gif,jpg,png}). However, there are more special characters and sequences that can be used to be more (or less, if we want) specific about what to find.

Aside: please do not make the mistake of thinking that glob patterns are regular expressions; they are not. If you do want to use regular expressions to find paths/files, you can use SPL's RegexIterator, which filters an Iterator with a PCRE regex, in conjunction with a DirectoryIterator or FilesystemIterator (there are recursive flavours of the Regex- and DirectoryIterator if you need to delve into folders). For those SPL-ly inclined, also note the GlobIterator, which combines the goodness of globbing with iteration. If that made entirely no sense, please read on! Globs are much less verbose.
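For the curious, here is a minimal sketch of the two SPL approaches mentioned in the aside, run against a throwaway directory. The sandbox path, file names and regex are assumptions for the demo.

```php
<?php
// Hypothetical sandbox with two images and one text file.
$dir = sys_get_temp_dir() . '/glob_spl_' . uniqid();
mkdir($dir);
foreach (['img1.png', 'img22.png', 'notes.txt'] as $name) {
    touch("$dir/$name");
}

// RegexIterator filters the entries of a FilesystemIterator with a PCRE
// regex; each entry is matched by its path name.
$regexMatches = [];
$filtered = new RegexIterator(new FilesystemIterator($dir), '/img\d+\.png$/');
foreach ($filtered as $info) {
    $regexMatches[] = $info->getFilename();
}
sort($regexMatches);

// GlobIterator takes an ordinary glob pattern and yields SplFileInfo objects.
$globMatches = [];
foreach (new GlobIterator("$dir/*.png") as $info) {
    $globMatches[] = $info->getFilename();
}
sort($globMatches);

print_r($regexMatches);
print_r($globMatches);

// Clean up the sandbox.
foreach (['img1.png', 'img22.png', 'notes.txt'] as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```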

So, here are the special doohickeys (technical term!) that we can use with glob:

* (an asterisk)
Matches zero or more characters.
?
Matches exactly one character (any character).
[...]
Matches one character from a group. A group can be a list of characters, e.g. [afkp], or a range of characters, e.g. [a-g] which is the same as [abcdefg].
[!...]
Matches any single character not in the group. [!a-zA-Z0-9] matches any character that is not alphanumeric.
\
Escapes the next character. For special characters, this causes them to not be treated as special. For example, \[ matches a literal [. If flags includes GLOB_NOESCAPE, this quoting is disabled and \ is handled as a simple character.
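These special characters can be tried out without touching the file system by using PHP's fnmatch(), which tests a single string against a shell wildcard pattern. Note that this is a swapped-in function, not glob() itself, and the two may differ in details such as how path separators and leading dots are treated; the patterns and names below are made up for the demo.

```php
<?php
// Each entry is [pattern, candidate name].
$checks = [
    ['*.txt',         'foo.txt'],  // * matches "foo"
    ['?.txt',         'foo.txt'],  // ? is exactly one character, "foo" is three
    ['[afkp]oo.txt',  'foo.txt'],  // f is in the list
    ['[a-g]oo.txt',   'foo.txt'],  // f is in the range a-g
    ['[!a-g]oo.txt',  'foo.txt'],  // f is excluded by the negated group
    ['\[1\].txt',     '[1].txt'],  // escaped brackets match literally
];

$results = [];
foreach ($checks as [$pattern, $name]) {
    $results[$pattern] = fnmatch($pattern, $name);
    echo str_pad($pattern, 16), $name, ' => ', var_export($results[$pattern], true), "\n";
}

// With FNM_NOESCAPE the backslash is an ordinary character, so the same
// pattern no longer matches a name that starts with "[".
$noEscape = fnmatch('\[1\].txt', '[1].txt', FNM_NOESCAPE);
var_dump($noEscape);
```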

Globbingly good glob examples

Here are a few examples of what globs might look like alongside a brief description of the intended behaviour: if you have any suggestions please do make them in the comments as I’m running short on inspiration!

*.txt
Get directory contents which have the extension .txt. (Note: a name beginning with a dot, such as a file named simply .txt, is normally not matched, because a leading dot has to be matched explicitly.)
??
Get directory contents with names exactly two characters in length.
??*
Get directory contents with names at least two characters in length.
g?*
Get directory contents with names at least two characters in length and starting with the letter g.
*.{jpg,gif,png}
Get directory contents with an extension of .jpg, .gif or .png. Remember to use the GLOB_BRACE flag.
DN?????.dat
Get directory contents which start with the letters DN, followed by five characters, with an extension of .dat.
DN[0-9][0-9][0-9][0-9][0-9].dat
Get directory contents which start with the letters DN, followed by five digits, with an extension of .dat.
[!aeiou]*
Get directory contents which do not start with a vowel.
[!a-d]*
Get directory contents which do not start with a, b, c or d.
*\[[0-9]\].*
Get directory contents whose name has a single digit enclosed in square brackets just before the extension, e.g. photo[1].jpg. If GLOB_NOESCAPE is used, a single digit enclosed in \[ and \], which would be a pretty weird name.
subdir/img*/th_?*
Get directory contents whose name starts with th_ (with at least one character after that) within directories whose names start with img in the subdir directory.
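A few of the patterns above can be checked against a throwaway directory, as in the sketch below. The sandbox path, the file names and the matchNames() helper are all assumptions made for the demo.

```php
<?php
// Hypothetical sandbox with a handful of made-up file names.
$dir = sys_get_temp_dir() . '/glob_patterns_' . uniqid();
mkdir($dir);
$created = ['DN12345.dat', 'DNabcde.dat', 'DN123.dat', 'ab', 'abc', 'g', 'go'];
foreach ($created as $name) {
    touch("$dir/$name");
}

/** Hypothetical helper: run a pattern inside $dir and return sorted base names. */
function matchNames(string $dir, string $pattern): array
{
    $names = array_map('basename', glob("$dir/$pattern") ?: []);
    sort($names);
    return $names;
}

$fiveChars  = matchNames($dir, 'DN?????.dat');                      // any five characters
$fiveDigits = matchNames($dir, 'DN[0-9][0-9][0-9][0-9][0-9].dat');  // five digits only
$twoLong    = matchNames($dir, '??');                               // exactly two characters
$gPlus      = matchNames($dir, 'g?*');                              // g plus at least one more

print_r($fiveChars);
print_r($fiveDigits);
print_r($twoLong);
print_r($gPlus);

// Clean up the sandbox.
foreach ($created as $name) {
    unlink("$dir/$name");
}
rmdir($dir);
```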

Well there we go, I’ve said what I came here to say so all that remains to be done is give some link love to those two recent articles that prompted me to dust off this draft and click the “publish” button.

 
