Introduction
This is the first article in a series that describes file find, grep and in-place search/substitute tools for Python.
What’s this all about?
As described in my previous article, I often find myself in a situation where I need to
- find files (whose paths/names are to be filtered)
- find files and grep through their contents
- find files and modify their content in some way
All of this is reasonably straightforward with UNIX tools such as find, or with Perl.
However, I want this kind of functionality available while programming in Python, my favourite programming language.
In this article I am covering the first use case (searching for files in a directory tree).
Test data
In order to play with the scriptutil.py module (syntax highlighted code here) and to demonstrate some of its capabilities I have set up the following example directory tree:
bbox33:scriptutil $ find .
.
./a
./a/a.txt
./a/b
./a/b/b.txt
./a/b/c
./a/b/c/c.txt
./all.doc
./d
./d/d.txt
./d/e
./d/e/e.txt
./o
./o/o.txt
./o/p
./o/p/p.txt
./o/p/q
./o/p/q/q.txt
./o/p/q/r
./o/p/q/r/r.txt
./o/p/q/r/s
./o/p/q/r/s/s.txt
The text files were populated with (random) content using the fortune program. The complete test data set may be viewed here.
Finding files
As a “warm-up exercise” we’ll start looking at a few usage examples of the scriptutil.ffind() function.
 1 bbox33:scriptutil $ python2.5
 2 Python 2.5.1 (r251:54863, May 14 2007, 09:23:46)
 3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
 4 Type "help", "copyright", "credits" or "license" for more information.
 5 >>> import scriptutil as SU
 6 >>> import re
 7 >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,))
 8 >>> files
 9 ['./a/a.txt', './a/b/b.txt', './a/b/c/c.txt', './d/d.txt']
10 >>> SU.printr(files)
11 ./a/a.txt
12 ./a/b/b.txt
13 ./a/b/c/c.txt
14 ./d/d.txt
On line 7 (above) I invoke the scriptutil.ffind() function and pass the following parameters to it:
- the path to the directory tree to be searched ('.')
- a tuple of functions (namefs) used to filter the files we want; in this instance I pass just one function, which merely encapsulates a regular expression.
The function’s return value is stored in the files list whose content is shown on line 9. On the next line the scriptutil.printr() helper function is invoked to pretty-print the find results (lines 11-14).
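To make the filtering idea concrete, here is a minimal sketch of how such a finder could work. It is not the actual scriptutil.ffind() code: the name find_files is hypothetical, and I am assuming (based on the results shown above) that each filter receives the full file path and that a file is kept only if every filter returns a true value.

```python
import os


def find_files(root, namefs=()):
    """Return the sorted paths of all files below `root` that pass every filter.

    Hypothetical stand-in for illustration only; each function in `namefs`
    is called with the file's path and must return a true value for the
    file to be kept.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # An empty `namefs` tuple lets every file through.
            if all(f(path) for f in namefs):
                matches.append(path)
    return sorted(matches)
```

Called as find_files('.', namefs=(re.compile('[a-d]\\.txt$').search,)) from the top of the test tree, this sketch should produce the same four paths as line 9.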
In the example below I am adding one more filter function to the namefs tuple (line 16). That second function effectively weeds out any file path that contains the letter ‘b’.
15 >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,
16 ...                               lambda s: s.find('b') == -1))
17 >>> files
18 ['./a/a.txt', './d/d.txt']
19 >>> SU.printr(files)
20 ./a/a.txt
21 ./d/d.txt
Hint: when working with a source code tree I often use the following file name filter function to ignore any files internal to the Subversion version control system:
lambda s: s.find('.svn') == -1
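Note that the substring test above also rejects a file such as notes.svn.txt. If that matters, a slightly stricter filter can compare whole path components instead. The following is only a sketch; the helper name not_in_dir is my own invention for this illustration and is not part of scriptutil.

```python
import os


def not_in_dir(dirname):
    """Build a filter that rejects any path having `dirname` as a component."""
    def name_filter(path):
        # Split the normalised path into its directory and file name parts.
        parts = os.path.normpath(path).split(os.sep)
        return dirname not in parts
    return name_filter
```

Since the result is just a function that takes a path string, not_in_dir('.svn') can be placed in the namefs tuple like any other filter.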
Outlook
In the next article I will present the scriptutil.ffindgrep() function that not only helps you find files but also allows you to search inside them.
May 16, 2007 at 10:25 pm |
Hm. I like to loop through (line.rstrip('\n') for line in os.popen('find .')). It's less Pythonic, but it does have the benefit of being multi-process (the find process and your Python process).
Also, (s.find('.svn') == -1) can be more straightforwardly expressed as ('.svn' not in s).
May 16, 2007 at 10:56 pm |
Hello Nick,
thanks very much for your comments! Your approach of running the actual 'find' utility via os.popen() is particularly interesting when one needs to filter using criteria other than the file name/path (i.e. by harnessing the full power of 'find').