Introduction
In this article I will show how the scriptutil.py module (syntax-highlighted source code is available) can be used
- to find files using Unix shell-style wildcards
- to search inside the found files and to perform in-place search & substitute operations on them
The examples below all operate on the Django project source code tree.
Finding files
In the following example I use the scriptutil.ffind() function to find files whose names match either 'README*' or 'AUTH*' (line 6 below). On the subsequent line a helper function is invoked to pretty-print the search results, which are then displayed on lines 8-17.
1 Python 2.4.4 (#1, May 9 2007, 11:05:23)
2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
3 Type "help", "copyright", "credits" or "license" for more information.
4 >>> import scriptutil as SU
5 >>> import re
6 >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'))
7 >>> SU.printr(flist)
8 ./.svn/text-base/AUTHORS.svn-base
9 ./.svn/text-base/README.svn-base
10 ./AUTHORS
11 ./README
12 ./django/contrib/flatpages/.svn/text-base/README.TXT.svn-base
13 ./django/contrib/flatpages/README.TXT
14 ./django/contrib/redirects/.svn/text-base/README.TXT.svn-base
15 ./django/contrib/redirects/README.TXT
16 ./extras/.svn/text-base/README.TXT.svn-base
17 ./extras/README.TXT
In most cases I am not interested in files that are internal to the Subversion revision control system (lines 8, 9, 12, 14 and 16). The filter function on line 19 (below) removes them from the result set.
18 >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'),
19 ... namefs=(lambda s: '.svn' not in s,))
20 >>> flist
21 ['./README', './AUTHORS', './django/contrib/flatpages/README.TXT',
22 './django/contrib/redirects/README.TXT', './extras/README.TXT']
23 >>> SU.printr(flist)
24 ./AUTHORS
25 ./README
26 ./django/contrib/flatpages/README.TXT
27 ./django/contrib/redirects/README.TXT
28 ./extras/README.TXT
As we can see on lines 24-28, the Subversion-internal files are no longer part of the result set.
The example above also shows that shell-style wildcards operate on file names, whereas the filter functions passed through the 'namefs' parameter match on the file path.
Please note: a separate article provides slightly more detail and additional scriptutil.ffind() examples you may want to explore.
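To make the distinction between wildcards (matched against the file name) and filter functions (matched against the path) concrete, here is a minimal sketch built only on the standard library. It is not the scriptutil.ffind() implementation: the name find_files() is hypothetical, the parameter names merely mirror the ones used above, and the code targets Python 3.

```python
import fnmatch
import os


def find_files(root, shellglobs=(), namefs=()):
    """Hypothetical stand-in for a find helper (not scriptutil's code).

    shellglobs are tested against the bare file name,
    namefs are predicates that receive the full path.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Shell-style wildcards only ever see the file name.
            if shellglobs and not any(
                fnmatch.fnmatchcase(name, glob) for glob in shellglobs
            ):
                continue
            path = os.path.join(dirpath, name)
            # Filter functions see the whole path, so they can reject
            # anything that lives below an unwanted directory.
            if all(keep(path) for keep in namefs):
                matches.append(path)
    return sorted(matches)
```

Calling find_files('.', ('README*', 'AUTH*'), (lambda s: '.svn' not in s,)) corresponds to the call on lines 18-19, under the assumptions stated above.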
Finding files and searching inside them
The brief scriptutil.ffindgrep() example below shows how one can search inside the files found.
29 >>> flist = SU.ffindgrep('.', shellglobs=('README*', 'AUTH*'),
30 ... namefs=(lambda s: '.svn' not in s,),
31 ... regexl=(('Django', re.I), 'dist'))
32 >>> flist
33 {'./django/contrib/redirects/README.TXT':
34 ' * The file django/docs/redirects.txt in the Django distribution',
35 './django/contrib/flatpages/README.TXT':
36 ' * The file docs/flatpages.txt in the Django distribution'}
37 >>> SU.printr(flist)
38 * The file docs/flatpages.txt in the Django distribution
39 ./django/contrib/flatpages/README.TXT
40 * The file django/docs/redirects.txt in the Django distribution
41 ./django/contrib/redirects/README.TXT
The 'regexl' parameter (see line 31 above) contains two search items:
- the string 'Django', to be searched for case-insensitively
- the string 'dist', to be searched for as is (i.e. in lower case)
The results returned by the function are displayed on lines 33-36 and pretty-printed on lines 38-41. For more detail on the scriptutil.ffindgrep() function please see one of my previous articles.
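The two kinds of search items described above (a plain pattern, or a pattern paired with compilation flags) can be handled roughly as follows. This is only a sketch: the helper names compile_specs() and grep_lines() are hypothetical, and the rule that a line must match every pattern is my assumption based on the output shown on lines 33-36, not something taken from the scriptutil source.

```python
import re


def compile_specs(regexl):
    """Compile a mix of plain patterns and (pattern, flags) pairs."""
    compiled = []
    for spec in regexl:
        if isinstance(spec, str):
            # A bare string is compiled without any flags.
            compiled.append(re.compile(spec))
        else:
            pattern, flags = spec
            compiled.append(re.compile(pattern, flags))
    return compiled


def grep_lines(lines, regexl):
    """Return the lines matched by every pattern (assumed semantics)."""
    regexes = compile_specs(regexl)
    return [line for line in lines
            if all(regex.search(line) for regex in regexes)]
```

With regexl=(('Django', re.I), 'dist') a line has to contain 'Django' in any capitalisation and the lower-case string 'dist' to be reported.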
In-place search/substitute on files
Last but not least, here's an example of how the scriptutil.freplace() function can be used to search for strings in files and substitute them.
The 'regexl' parameter (passed on line 44 below) is a sequence of 3-tuples, each having the following elements:
- search string (Python regex syntax)
- replace string (Python regex syntax)
- regex compilation flags or ‘None’ (re.compile syntax)
The 'bext' parameter specified on the subsequent line is the file name suffix to be used for backup copies of the modified files.
42 >>> flist = SU.freplace('.', shellglobs=('README*',),
43 ... namefs=(lambda s: '.svn' not in s,),
44 ... regexl=(('distribution', '**package**', None),),
45 ... bext='.bakk')
The function call above will search for all occurrences of the string 'distribution' and replace them with the string '**package**'. Please note that only files that passed the name filters (specified on lines 42-43) will be considered.
By searching for the backup files (line 46) we can see that the function call above resulted in two modified files.
46 >>> flist = SU.ffind('.', shellglobs=('*.bakk',))
47 >>> SU.printr(flist)
48 ./django/contrib/flatpages/README.TXT.bakk
49 ./django/contrib/redirects/README.TXT.bakk
Finally, I search for the replacement string '**package**' (line 50) to check that the substitution worked.
50 >>> flist = SU.ffindgrep('.', regexl=('\\*\\*package\\*\\*',))
51 >>> SU.printr(flist)
52 * The file docs/flatpages.txt in the Django **package**
53 ./django/contrib/flatpages/README.TXT
54 * The file django/docs/redirects.txt in the Django **package**
55 ./django/contrib/redirects/README.TXT
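The search/substitute step with backup copies shown in this section can be approximated for a single file with the standard library. Again this is a sketch under assumptions rather than the scriptutil.freplace() implementation: replace_in_file() is a hypothetical name, the file is assumed to be UTF-8 text, a backup is written only when the content actually changes, and the code targets Python 3.

```python
import re
import shutil


def replace_in_file(path, regexl, bext='.bak'):
    """Apply (search, replace, flags) triples to one file.

    Returns True if the file was modified; in that case the original
    content is kept in a copy named path + bext.
    """
    with open(path, encoding='utf-8') as fh:
        original = fh.read()
    text = original
    for search, replace, flags in regexl:
        # A flags value of None means "no compilation flags".
        text = re.sub(search, replace, text, flags=flags or 0)
    if text == original:
        return False
    shutil.copy2(path, path + bext)  # back up before overwriting
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write(text)
    return True
```

When checking the result afterwards, re.escape() can build the escaped search pattern from a literal such as '**package**' instead of writing the backslashes by hand.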
In conclusion
Again, I hope you liked this (brief) overview of the scriptutil.py module. Should you need more detailed documentation: the functions presented above are documented quite extensively through documentation strings. Please check these out in the syntax-highlighted source.