Python: find files using Unix shell-style wildcards

Introduction

In the article that follows I will show how the scriptutil.py module (syntax highlighted code here) can be used

  • to find files using Unix shell-style wildcards
  • to search inside the found files and to perform in-place search & substitute operations on them

The examples below all operate on the django project source code tree.

Finding files

In the following example I use the scriptutil.ffind() function to find files whose names match either 'README*' or 'AUTH*' (line 6 below). On the subsequent line a helper function pretty-prints the search results, which are displayed on lines 8-17.

  1 Python 2.4.4 (#1, May  9 2007, 11:05:23)
  2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> import scriptutil as SU
  5  >>> import re
  6  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'))
  7  >>> SU.printr(flist)
  8 ./.svn/text-base/AUTHORS.svn-base
  9 ./.svn/text-base/README.svn-base
 10 ./AUTHORS
 11 ./README
 12 ./django/contrib/flatpages/.svn/text-base/README.TXT.svn-base
 13 ./django/contrib/flatpages/README.TXT
 14 ./django/contrib/redirects/.svn/text-base/README.TXT.svn-base
 15 ./django/contrib/redirects/README.TXT
 16 ./extras/.svn/text-base/README.TXT.svn-base
 17 ./extras/README.TXT

In most cases I will not be interested in files that are internal to the Subversion revision control system (lines 8, 9, 12, 14 and 16). Hence the filter function on line 19 (below), which rids me of these.

 18  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'),
 19  ...                   namefs=(lambda s: '.svn' not in s,))
 20  >>> flist
 21 ['./README', './AUTHORS', './django/contrib/flatpages/README.TXT',
 22  './django/contrib/redirects/README.TXT', './extras/README.TXT']
 23  >>> SU.printr(flist)
 24 ./AUTHORS
 25 ./README
 26 ./django/contrib/flatpages/README.TXT
 27 ./django/contrib/redirects/README.TXT
 28 ./extras/README.TXT

As we can see on lines 24-28, the Subversion-internal files are no longer part of the result set.

The example above also points out how shell-style wildcards operate on file names whereas the filter functions passed through the 'namefs' parameter match on the file path.
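
To illustrate the name-versus-path distinction, here is a minimal sketch built only on the standard library. It is not the scriptutil implementation; the function name find_files() is hypothetical, and it merely assumes that wildcards are tested against the bare file name while filter functions receive the whole path.

```python
import fnmatch
import os


def find_files(root, shellglobs=(), namefs=()):
    """Hypothetical stand-in for the kind of search ffind() performs."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Shell-style wildcards are tested against the bare file name.
            if shellglobs and not any(fnmatch.fnmatch(name, g)
                                      for g in shellglobs):
                continue
            path = os.path.join(dirpath, name)
            # Filter functions are tested against the whole path, which is
            # why a check like '.svn' not in s can reject a file by directory.
            if all(f(path) for f in namefs):
                found.append(path)
    return sorted(found)
```

Called as find_files('.', ('README*', 'AUTH*'), (lambda s: '.svn' not in s,)), the sketch skips anything below a .svn directory because the filter sees the directory part of the path, which a file name wildcard never does.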

Please note: this article provides slightly more detail and additional scriptutil.ffind() examples you may want to explore.

Finding files and searching inside them

The brief scriptutil.ffindgrep() example below shows how one can search inside the files found.

 29  >>> flist = SU.ffindgrep('.', shellglobs=('README*', 'AUTH*'),
 30  ...                      namefs=(lambda s: '.svn' not in s,),
 31  ...                      regexl=(('Django', re.I), 'dist'))
 32  >>> flist
 33 {'./django/contrib/redirects/README.TXT':
 34  '    * The file django/docs/redirects.txt in the Django distribution',
 35  './django/contrib/flatpages/README.TXT':
 36  '    * The file docs/flatpages.txt in the Django distribution'}
 37  >>> SU.printr(flist)
 38     * The file docs/flatpages.txt in the Django distribution
 39 ./django/contrib/flatpages/README.TXT
 40     * The file django/docs/redirects.txt in the Django distribution
 41 ./django/contrib/redirects/README.TXT

The 'regexl' parameter (see line 31 above) contains two search items:

  1. the string ‘Django’, to be searched in case insensitive fashion
  2. the string ‘dist’, to be searched as is (i.e. in lower case)

The results returned by the function are displayed on lines 33-36 and pretty-printed on lines 38-41 respectively. For more detail on the scriptutil.ffindgrep() function please see one of my previous articles.
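
The shape of the 'regexl' items can be illustrated with a small standard-library sketch. The helper grep_lines() is hypothetical and not part of scriptutil; in particular, the rule that a line must match every pattern is my assumption here, chosen because it is consistent with the output shown above.

```python
import re


def grep_lines(lines, regexl):
    """Return the lines matching every pattern in regexl (hypothetical)."""
    compiled = []
    for item in regexl:
        # An item is either a plain pattern or a (pattern, flags) pair.
        if isinstance(item, tuple):
            pattern, flags = item
        else:
            pattern, flags = item, 0
        compiled.append(re.compile(pattern, flags))
    # Assumption: a line counts as a hit only if all patterns match it.
    return [ln for ln in lines if all(rx.search(ln) for rx in compiled)]


sample = [
    "    * The file docs/flatpages.txt in the Django distribution",
    "the DJANGO project",
    "a distinct line",
]
matches = grep_lines(sample, (("Django", re.I), "dist"))
```

Only the first sample line ends up in matches: the second lacks 'dist', and the third lacks 'Django' in any case.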

In-place search/substitute on files

Last but not least here’s an example of how the scriptutil.freplace() function can be utilised to search for strings in files and substitute them.

The 'regexl' parameter (passed on line 44 below) is a sequence of 3-tuples, each having the following elements:

  • search string (Python regex syntax)
  • replace string (Python regex syntax)
  • regex compilation flags or ‘None’ (re.compile syntax)

The 'bext' parameter specified on the subsequent line is the file name suffix to be used for backup copies of the modified files.

 42  >>> flist = SU.freplace('.', shellglobs=('README*',),
 43  ...                     namefs=(lambda s: '.svn' not in s,),
 44  ...                     regexl=(('distribution', '**package**', None),),
 45  ...                     bext='.bakk')

The function call above will search for all occurrences of the string 'distribution' and replace them with the string '**package**'. Please note that only files that passed the name filters (specified on lines 42-43) will be considered.

By searching for the backup files (line 46) we can see that the function call above resulted in two modified files.

 46  >>> flist = SU.ffind('.', shellglobs=('*.bakk',))
 47  >>> SU.printr(flist)
 48 ./django/contrib/flatpages/README.TXT.bakk
 49 ./django/contrib/redirects/README.TXT.bakk

Finally, I am searching for the replacement string '**package**' (line 50) to check that the substitution worked.

 50  >>> flist = SU.ffindgrep('.', regexl=('\\*\\*package\\*\\*',))
 51  >>> SU.printr(flist)
 52     * The file docs/flatpages.txt in the Django **package**
 53 ./django/contrib/flatpages/README.TXT
 54     * The file django/docs/redirects.txt in the Django **package**
 55 ./django/contrib/redirects/README.TXT
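
For readers who want to see the mechanics, the substitute-with-backup step can be sketched for a single file as follows. This is not the scriptutil code; replace_in_file() is a hypothetical helper, and writing a backup only when the content actually changes is an assumption that happens to agree with the two backup files found above.

```python
import re
import shutil


def replace_in_file(path, regexl, bext=".bak"):
    """Apply (search, replace, flags) triples to one file (hypothetical)."""
    with open(path) as fh:
        original = fh.read()
    text = original
    for search, replace, flags in regexl:
        # A flags value of None means "no special compilation flags".
        text = re.compile(search, flags or 0).sub(replace, text)
    if text == original:
        return False  # nothing matched: no backup, file left untouched
    shutil.copy2(path, path + bext)  # keep a copy of the unmodified file
    with open(path, "w") as fh:
        fh.write(text)
    return True
```

Copying the file before rewriting it means that the original content survives under the 'bext' suffix even if the substitution turns out to be wrong.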

In conclusion

Again, I hope you liked this (brief) overview of the scriptutil.py module. In case more detailed documentation is required, I would like to mention that the functions presented above are documented quite extensively through documentation strings. Please check these out in the syntax highlighted source.

14 thoughts on “Python: find files using Unix shell-style wildcards”

  1. Pingback: Determine order of execution by (re-)sequencing your source code files « Muharem Hrnjadovic

  2. If you set just one param in line 6 all files from path are returned. Try:

    flist = SU.ffind('.', shellglobs=('README*'))

    Workaround for one param:

    flist = SU.ffind('.', shellglobs=('README*', ''))

  3. @murygin: I believe the problem with your code above is that you are passing

    shellglobs=('README*')

    to SU.ffind(). That’s not a tuple however :-)

    >>> shellglobs=('README*')
    >>> type(shellglobs)
    <type 'str'>

    But for single element tuples in Python you need to have a trailing comma:

    >>> shellglobs=('README*',)
    >>> type(shellglobs)
    <type 'tuple'>

    See also chapter 5 of the Python tutorial: http://docs.python.org/tut/node7.html#SECTION007300000000000000000

  4. Pingback: Pythonic grep sort of « Ramblings

  5. Hello Muharem,
    I use Python 3.0 on an XP pro machine and copied your “scriptutil” script into my ‘Python3.0\Lib’ area.
    When I try to use the script I get the following error:

    >>> import scriptutil as SU

    Traceback (most recent call last):
    File "", line 1, in
    import scriptutil as SU
    File "D:\Python30\lib\scriptutil.py", line 104
    except Exception, e: raise ScriptError(str(e))
    ^
    SyntaxError: invalid syntax

    Any helpful suggestions from your end on what else needs to be done?

    Thanks,
    Suresh
