Python: file find, grep and in-line replace tools (part 1)


This is the first article in a series that describes file find, grep and in-place search/substitute tools for Python.

What’s this all about?

As described in my previous article, I often find myself in a situation where I need to

  • find files (whose paths/names are to be filtered)
  • find files and grep through their contents
  • find files and modify their content in some way

All of this is reasonably straightforward with UNIX shell commands like find, or with Perl.

However, I want this kind of functionality available while programming in Python, my favourite programming language.

In this article I am covering the first use case (searching for files in a directory tree).

Test data

In order to play with the module (syntax highlighted code here) and to demonstrate some of its capabilities I have set up the following example directory tree:

bbox33:scriptutil $ find .

The text files were populated with (random) content using the fortune program. The complete test data set may be viewed here.

Finding files

As a “warm-up exercise” we’ll start by looking at a few usage examples of the scriptutil.ffind() function.

  1 bbox33:scriptutil $ python2.5
  2 Python 2.5.1 (r251:54863, May 14 2007, 09:23:46)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import scriptutil as SU
  6  >>> import re
  7  >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,))
  8  >>> files
  9 ['./a/a.txt', './a/b/b.txt', './a/b/c/c.txt', './d/d.txt']
 10  >>> SU.printr(files)
 11 ./a/a.txt
 12 ./a/b/b.txt
 13 ./a/b/c/c.txt
 14 ./d/d.txt

On line 7 (above) I am invoking the scriptutil.ffind() function and passing the following parameters to it:

  1. the path to the directory tree to be searched ('.')
  2. a tuple with functions (namefs) to use for filtering the files we want; in this instance I am passing just one function which merely encapsulates a regular expression.

The function’s return value is stored in the files list whose content is shown on line 9. On the next line the scriptutil.printr() helper function is invoked to pretty-print the find results (lines 11-14).
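To make the mechanics a bit more tangible, here is a minimal sketch of how a function of this kind could be put together on top of os.walk(). Please note that find_files() is a hypothetical stand-in written for this illustration; it is not the actual scriptutil.ffind() code, and I am assuming that every filter function receives the full file path and that a file is kept only if all filters return a true value.

```python
import os
import re


def find_files(root, namefs=()):
    """Hypothetical stand-in for scriptutil.ffind(); NOT the module's real code.

    Walks the tree below 'root' and returns the paths for which every
    function in 'namefs' returns a true value.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes the traversal order deterministic.
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Assumption: filters are applied to the full path, not just the name.
            if all(f(path) for f in namefs):
                matches.append(path)
    return matches


if __name__ == '__main__':
    for path in find_files('.', namefs=(re.compile(r'[a-d]\.txt$').search,)):
        print(path)
```

With an empty namefs tuple the sketch simply returns every file below the starting directory.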

In the example below I am adding one more filter function to the namefs tuple (line 16). That second function effectively weeds out any file path that contains the letter ‘b’.

 15  >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,
 16  ...                                  lambda s: s.find('b') == -1))
 17  >>> files
 18 ['./a/a.txt', './d/d.txt']
 19  >>> SU.printr(files)
 20 ./a/a.txt
 21 ./d/d.txt
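
The way the two filters combine can be reproduced without touching the file system. The sketch below applies the same two functions to the list of paths returned by the first example, assuming (as the output above suggests) that a path has to pass every filter in order to be kept. The names paths and kept are used for illustration only.

```python
import re

# The paths returned by the first example.
paths = ['./a/a.txt', './a/b/b.txt', './a/b/c/c.txt', './d/d.txt']

namefs = (re.compile(r'[a-d]\.txt$').search,   # name must end in a.txt .. d.txt
          lambda s: s.find('b') == -1)         # path must not contain a 'b'

# Keep a path only if every filter returns a true value.
kept = [p for p in paths if all(f(p) for f in namefs)]
print(kept)
```

Note that './a/b/c/c.txt' is dropped although its file name contains no ‘b’: the filter looks at the whole path, directories included.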

Hint: when working with a source code tree I often use the following file name filter function to ignore any files internal to the Subversion version control system:

    lambda s: s.find('.svn') == -1
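
One caveat: the substring test above also rejects paths that merely contain the characters ‘.svn’ somewhere, e.g. a file called notes.svn.txt. If that matters, a slightly stricter variant can compare whole path components instead. The helper name not_in_svn below is made up for this sketch and is not part of scriptutil.

```python
import os


def not_in_svn(path):
    """Return True unless one of the path's components is exactly '.svn'."""
    components = os.path.normpath(path).split(os.sep)
    return '.svn' not in components
```

Such a function can be passed in the namefs tuple just like the lambda shown above.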


In the next article I will present the scriptutil.ffindgrep() function that not only helps you find files but also allows you to search inside them.


3 thoughts on “Python: file find, grep and in-line replace tools (part 1)”

  1. Hm. I like to loop through ( line.rstrip('\n') for line in os.popen('find .') ). It’s less Pythonic, but it does have the benefit of being multi-process (the find process, and your python process).

    also, (s.find('.svn') == -1) can be more straightforwardly expressed as ('.svn' not in s)

  2. Hello Nick,

    thanks very much for your comments! Your approach of running the actual ‘find’ utility via os.popen() is particularly interesting when one needs to filter using criteria other than the file name/path (i.e. by harnessing the full power of ‘find’).
