Introduction
In the article that follows I will show how the scriptutil.py module (syntax highlighted code here) can be used
- to find files using Unix shell-style wildcards
- to search inside the found files and to perform in-place search & substitute operations on them
The examples below all operate on the django project source code tree.
Finding files
In the following example I am using the scriptutil.ffind() function to find files whose names match either of the wildcards 'README*' or 'AUTH*' (line 6 below). On the subsequent line a helper function is invoked to pretty-print the search results, which are then displayed on lines 8-17.
 1 Python 2.4.4 (#1, May 9 2007, 11:05:23)
 2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
 3 Type "help", "copyright", "credits" or "license" for more information.
 4 >>> import scriptutil as SU
 5 >>> import re
 6 >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'))
 7 >>> SU.printr(flist)
 8 ./.svn/text-base/AUTHORS.svn-base
 9 ./.svn/text-base/README.svn-base
10 ./AUTHORS
11 ./README
12 ./django/contrib/flatpages/.svn/text-base/README.TXT.svn-base
13 ./django/contrib/flatpages/README.TXT
14 ./django/contrib/redirects/.svn/text-base/README.TXT.svn-base
15 ./django/contrib/redirects/README.TXT
16 ./extras/.svn/text-base/README.TXT.svn-base
17 ./extras/README.TXT
In most cases I will not be interested in any files that are internal to the subversion revision control system (lines 8, 9, 12, 14 and 16). Hence the filter function on line 19 (below) that rids me of these.
18 >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'),
19 ...                  namefs=(lambda s: '.svn' not in s,))
20 >>> flist
21 ['./README', './AUTHORS', './django/contrib/flatpages/README.TXT',
22  './django/contrib/redirects/README.TXT', './extras/README.TXT']
23 >>> SU.printr(flist)
24 ./AUTHORS
25 ./README
26 ./django/contrib/flatpages/README.TXT
27 ./django/contrib/redirects/README.TXT
28 ./extras/README.TXT
As we can see on lines 24-28 the subversion-internal files are not part of the result set any more.
The example above also points out how shell-style wildcards operate on file names whereas the filter functions passed through the 'namefs' parameter match on the file path.
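To make the name-versus-path distinction concrete, here is a minimal standalone sketch. It is not the scriptutil implementation; the function find_files() is a hypothetical stand-in that only assumes the behaviour described above: wildcards are checked against the bare file name, filter functions against the full path.

```python
import fnmatch
import os


def find_files(root, shellglobs=(), namefs=()):
    """Return sorted paths under root whose base name matches any glob
    and whose full path passes every filter function.

    Hypothetical helper for illustration, not part of scriptutil.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Wildcards are tested against the bare file name only.
            if shellglobs and not any(fnmatch.fnmatch(name, g) for g in shellglobs):
                continue
            path = os.path.join(dirpath, name)
            # Filter functions see the whole path, directories included.
            if all(f(path) for f in namefs):
                found.append(path)
    return sorted(found)
```

A glob such as 'README*' can never exclude './.svn/text-base/README.svn-base' because it only sees 'README.svn-base'; the lambda filter can, because it sees the '.svn' directory in the path.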
Please note: this article provides slightly more detail and additional scriptutil.ffind() examples you may want to explore.
Finding files and searching inside them
The brief scriptutil.ffindgrep() example below shows how one can search inside the files found.
29 >>> flist = SU.ffindgrep('.', shellglobs=('README*', 'AUTH*'),
30 ...                      namefs=(lambda s: '.svn' not in s,),
31 ...                      regexl=(('Django', re.I), 'dist'))
32 >>> flist
33 {'./django/contrib/redirects/README.TXT':
34  ' * The file django/docs/redirects.txt in the Django distribution',
35  './django/contrib/flatpages/README.TXT':
36  ' * The file docs/flatpages.txt in the Django distribution'}
37 >>> SU.printr(flist)
38  * The file docs/flatpages.txt in the Django distribution
39 ./django/contrib/flatpages/README.TXT
40  * The file django/docs/redirects.txt in the Django distribution
41 ./django/contrib/redirects/README.TXT
The 'regexl' parameter (see line 31 above) contains two search items:
- the string ‘Django’, to be searched in case insensitive fashion
- the string ‘dist’, to be searched as is (i.e. in lower case)
The results returned by the function are displayed on lines 33-36 and pretty-printed on lines 38-41 respectively. For more detail on the scriptutil.ffindgrep() function please see one of my previous articles.
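The mixed form of the 'regexl' items (a plain string or a (pattern, flags) pair) can be handled along the following lines. This is only a sketch: compile_patterns() and grep_lines() are hypothetical names, and the rule that a line counts as a hit only when every pattern is found in it is my assumption here, chosen because it is consistent with the output shown above.

```python
import re


def compile_patterns(regexl):
    """Accept plain strings or (pattern, flags) pairs, return compiled regexes."""
    compiled = []
    for item in regexl:
        if isinstance(item, str):
            # A bare string is compiled without flags, i.e. matched as is.
            compiled.append(re.compile(item))
        else:
            pattern, flags = item
            compiled.append(re.compile(pattern, flags))
    return compiled


def grep_lines(lines, regexl):
    """Return the lines on which every pattern in regexl is found (assumed rule)."""
    patterns = compile_patterns(regexl)
    return [line for line in lines if all(p.search(line) for p in patterns)]


sample = [
    " * The file docs/flatpages.txt in the Django distribution",
    "DJANGO is a web framework",
    "dist only",
]
hits = grep_lines(sample, (("Django", re.I), "dist"))
```

With the sample above, only the first line is kept: the second lacks 'dist' and the third lacks any spelling of 'Django'.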
In-place search/substitute on files
Last but not least here’s an example of how the scriptutil.freplace() function can be utilised to search for strings in files and substitute them.
The 'regexl' parameter (passed on line 44 below) is a sequence of 3-tuples, each having the following elements:
- search string (Python regex syntax)
- replace string (Python regex syntax)
- regex compilation flags or ‘None’ (re.compile syntax)
The 'bext' parameter specified on the subsequent line is the file name suffix to be used for backup copies of the modified files.
42 >>> flist = SU.freplace('.', shellglobs=('README*',),
43 ...                     namefs=(lambda s: '.svn' not in s,),
44 ...                     regexl=(('distribution', '**package**', None),),
45 ...                     bext='.bakk')
The function call above will search for all occurrences of the string 'distribution' and replace them with the string '**package**'. Please note that only files that passed the wildcard and name filters (specified on lines 42-43) will be considered.
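For readers who want to see the mechanics, the following is a rough single-file sketch of such a search/substitute step. It is not how scriptutil.freplace() is implemented; replace_in_file() is a hypothetical helper, UTF-8 text files are assumed, and writing the backup only when the content actually changed is my assumption.

```python
import re
import shutil


def replace_in_file(path, regexl, bext=".bak"):
    """Apply each (search, replace, flags) triple to the file at path.

    A backup copy is written to path + bext only if the text changed.
    Returns True when the file was modified. Hypothetical helper.
    """
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    text = original
    for search, replace, flags in regexl:
        # A flags value of None means "no flags", which re expects as 0.
        text = re.sub(search, replace, text, flags=flags or 0)
    if text == original:
        return False
    shutil.copy2(path, path + bext)  # keep the untouched original
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return True
```

Calling replace_in_file('README.TXT', (('distribution', '**package**', None),), bext='.bakk') on a file that contains the word 'distribution' would leave the old content in 'README.TXT.bakk' and the substituted text in 'README.TXT'.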
By searching for the backup files (line 46) we can see that the function call above resulted in two modified files.
46 >>> flist = SU.ffind('.', shellglobs=('*.bakk',))
47 >>> SU.printr(flist)
48 ./django/contrib/flatpages/README.TXT.bakk
49 ./django/contrib/redirects/README.TXT.bakk
Finally, I am searching for the replacement string '**package**' (line 50) to check that the substitution worked.
50 >>> flist = SU.ffindgrep('.', regexl=('\\*\\*package\\*\\*',))
51 >>> SU.printr(flist)
52  * The file docs/flatpages.txt in the Django **package**
53 ./django/contrib/flatpages/README.TXT
54  * The file django/docs/redirects.txt in the Django **package**
55 ./django/contrib/redirects/README.TXT
In conclusion
Again, I hope you liked this (brief) overview of the scriptutil.py module. In case more detailed documentation is required: the functions presented above are documented quite extensively through documentation strings. Please check these out in the syntax highlighted source.
June 4, 2007 at 5:01 am |
[...] By the way, if you are writing Python code that needs to find and manipulate files you may also want to check out the scriptutil.py module described in this article. [...]
October 9, 2007 at 3:27 pm |
May I use your scriptutil functions for enterprise/commercial use? I could not find any license information in the code.
October 9, 2007 at 3:46 pm |
If you set just one param in line 6 all files from path are returned. Try:
flist = SU.ffind('.', shellglobs=('README*'))
Workaround for one param:
flist = SU.ffind('.', shellglobs=('README*', ''))
October 9, 2007 at 7:25 pm |
@murygin: please feel free to use the scriptutil.py module for fun and/or profit.
I have added a BSD-style license to the following source: http://hrnjad.net/src/7/scriptutil.py
I am looking into the other issue that you brought up. Stay tuned.
October 9, 2007 at 7:35 pm |
@murygin: I believe the problem with your code above is that you are passing
shellglobs=('README*')
to SU.ffind(). That is not a tuple, however:
>>> shellglobs=('README*')
>>> type(shellglobs)
<type 'str'>
For single-element tuples in Python you need a trailing comma:
>>> shellglobs=('README*',)
>>> type(shellglobs)
<type 'tuple'>
See also chapter 5 of the Python tutorial: http://docs.python.org/tut/node7.html#SECTION007300000000000000000
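A possible explanation for why every file came back, sketched below under the assumption that the function simply loops over 'shellglobs' and accepts a file as soon as one glob matches: iterating over a string yields its single characters, and the last character of 'README*' is a lone '*'.

```python
import fnmatch

not_a_tuple = ('README*')   # parentheses alone only group the expression
real_tuple = ('README*',)   # the trailing comma is what makes the tuple

# Looping over the string hands out 'R', 'E', 'A', 'D', 'M', 'E' and '*'
# as if they were seven separate wildcards.
globs_seen = list(not_a_tuple)

# The lone '*' matches any file name at all.
matches_everything = fnmatch.fnmatch('setup.py', globs_seen[-1])
```

Passing a list such as ['README*'] avoids the pitfall as well, since square brackets always build a list, even with a single element.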
October 10, 2007 at 7:04 am |
I just wrote my first Python script… I like it. Thanks for the “lesson” and thanks for the scriptutils.
November 13, 2007 at 6:28 pm |
Thanks. This is a very useful utility. I will add it to my toolbox.
September 11, 2008 at 4:00 pm |
[...] Python: find files using Unix shell-style wildcards « Muharem Hrnjadovic [...]
October 23, 2008 at 11:01 pm |
Thanks, I found it very useful
About the “bug” http://muharem.wordpress.com/2007/05/20/python-find-files-using-unix-shell-style-wildcards/#comment-3224
Passing a list like
flist = SU.ffind('.', shellglobs=['README*'])
works fine
March 5, 2009 at 4:30 am |
May I know the syntax for finding variables in a source file in Python?
May 12, 2009 at 7:19 am |
Hello Muharem,
I use Python 3.0 on an XP pro machine and copied your “scriptutil” script into my ‘Python3.0\Lib’ area.
When I try to use the script I get the following error:
>>> import scriptutil as SU
Traceback (most recent call last):
File "", line 1, in
import scriptutil as SU
File "D:\Python30\lib\scriptutil.py", line 104
except Exception, e: raise ScriptError(str(e))
^
SyntaxError: invalid syntax
Any helpful suggestions from your end on what else needs to be done?
Thanks,
Suresh
March 6, 2010 at 7:34 pm |
Thanks for this blog. I am new at development and this is a big help.