Combining ack-grep and xargs

Gotta love the command line..

I use ack-grep a lot and really, really like it. Kudos to the author and to the maintainer who takes care of the Ubuntu package :-)

Sometimes, though, I missed grep's --exclude feature, which allows me to ignore certain paths while searching.

For example, I sometimes want to see the calls to a certain function in the code base but am not interested in the tests. Today I found an (embarrassingly) easy way to get that behaviour using xargs:

$ find . -name \*.py | grep -v tests/ | xargs ack-grep -C 3 -w 'Message\('

The snippet above first accumulates the paths of interest, then filters them and finally lets ack-grep loose on them.
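For comparison, the first two stages of that pipeline (collect the paths, then drop the unwanted ones) can be sketched in Python 3 as well. This is only an illustration: the function name find_excluding and its parameters are made up for this example and are not part of ack-grep or any other tool.

```python
import fnmatch
import os


def find_excluding(root, pattern="*.py", exclude="tests/"):
    """Return paths under root whose file name matches pattern,
    skipping every path that contains the exclude substring
    (the find and grep -v stages of the pipeline)."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            # Normalise separators so the check also works on Windows.
            if exclude not in path.replace(os.sep, "/"):
                selected.append(path)
    return sorted(selected)


if __name__ == "__main__":
    for path in find_excluding("."):
        print(path)
```

The resulting list could then be handed to ack-grep (for instance via the subprocess module), which is what xargs does in the shell version.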

Ta-da! There you go :-)

Turn on line numbers while searching in files

Introduction

Due to “popular demand” I have added a feature to the scriptutil.ffindgrep() function of the scriptutil.py module: you can now instruct it to display the line numbers for the lines found (similar to grep -n).

Please note: the examples below operate on the Django project source code tree as usual.

Examples

What follows is a brief demonstration of the scriptutil.ffindgrep() function showing how one can search with and without line numbers respectively.

I am first searching with line numbers turned off. The results are displayed on lines 10-15 and 18-23 respectively.

  1 Python 2.4.4 (#1, May 22 2007, 13:30:14)
  2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> import scriptutil as SU
  5  >>> flist = SU.ffindgrep('.',
  6  ...                       shellglobs=('README*', 'AUTH*'),
  7  ...                       namefs=(lambda s: '.svn' not in s,),
  8  ...                       regexl=('Django', 'doc'))
  9  >>> flist
 10 {'./django/contrib/redirects/README.TXT':
 11  '    * The file django/docs/redirects.txt in the Django distribution',
 12  './django/contrib/flatpages/README.TXT':
 13  '    * The file docs/flatpages.txt in the Django distribution',
 14  './README':
 15  '    * First, read docs/install.txt for instructions on installing Django.'}
 16 
 17  >>> SU.printr(flist)
 18     * First, read docs/install.txt for instructions on installing Django.
 19 ./README
 20     * The file docs/flatpages.txt in the Django distribution
 21 ./django/contrib/flatpages/README.TXT
 22     * The file django/docs/redirects.txt in the Django distribution
 23 ./django/contrib/redirects/README.TXT

Now I am passing an additional parameter (namely linenums) to the scriptutil.ffindgrep() function (line 28).

The added parameter (unsurprisingly) turns on the line numbers as can be seen on lines 30-35 and 37-42 respectively.

 24  >>> flist = SU.ffindgrep('.',
 25  ...                       shellglobs=('README*', 'AUTH*'),
 26  ...                       namefs=(lambda s: '.svn' not in s,),
 27  ...                       regexl=('Django', 'doc'),
 28  ...                       linenums=True)
 29  >>> flist
 30 {'./django/contrib/redirects/README.TXT':
 31  '5:    * The file django/docs/redirects.txt in the Django distribution',
 32  './django/contrib/flatpages/README.TXT':
 33  '5:    * The file docs/flatpages.txt in the Django distribution',
 34  './README':
 35  '8:    * First, read docs/install.txt for instructions on installing Django.'}
 36  >>> SU.printr(flist)
 37 8:    * First, read docs/install.txt for instructions on installing Django.
 38 ./README
 39 5:    * The file docs/flatpages.txt in the Django distribution
 40 ./django/contrib/flatpages/README.TXT
 41 5:    * The file django/docs/redirects.txt in the Django distribution
 42 ./django/contrib/redirects/README.TXT
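To make the effect of the linenums parameter concrete, here is a minimal Python 3 sketch of line-by-line matching with optional line numbers. It is not the scriptutil implementation; the function grep_lines and the sample text are assumptions made for this illustration, and only the "number, colon, line" output format is taken from the transcript above.

```python
import re


def grep_lines(text, patterns, linenums=False):
    """Return the lines of text that match every regex in patterns.

    With linenums=True each hit is prefixed with its 1-based line
    number and a colon, the way grep -n does it.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if all(rx.search(line) for rx in compiled):
            hits.append(f"{number}:{line}" if linenums else line)
    return "\n".join(hits)


sample = "Intro\n    * read docs/install.txt to install Django\nThe end"
print(grep_lines(sample, ("Django", "doc")))
print(grep_lines(sample, ("Django", "doc"), linenums=True))
```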

For more detail on the scriptutil.ffindgrep() function please see also an earlier article.

Python: find files using Unix shell-style wildcards

Introduction

In the article that follows I will show how the scriptutil.py module (syntax highlighted code here) can be used

  • to find files using Unix shell-style wildcards
  • to search inside the found files and to perform in-place search & substitute operations on them

The examples below all operate on the Django project source code tree.

Finding files

In the following example I am using the scriptutil.ffind() function to find files whose names match either of the shell patterns 'README*' or 'AUTH*', i.e. start with README or AUTH (line 6 below). On the subsequent line a helper function is invoked to pretty-print the search results which are then displayed on lines 8-17.

  1 Python 2.4.4 (#1, May  9 2007, 11:05:23)
  2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> import scriptutil as SU
  5  >>> import re
  6  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'))
  7  >>> SU.printr(flist)
  8 ./.svn/text-base/AUTHORS.svn-base
  9 ./.svn/text-base/README.svn-base
 10 ./AUTHORS
 11 ./README
 12 ./django/contrib/flatpages/.svn/text-base/README.TXT.svn-base
 13 ./django/contrib/flatpages/README.TXT
 14 ./django/contrib/redirects/.svn/text-base/README.TXT.svn-base
 15 ./django/contrib/redirects/README.TXT
 16 ./extras/.svn/text-base/README.TXT.svn-base
 17 ./extras/README.TXT

In most cases I will not be interested in any files that are internal to the subversion revision control system (lines 8, 9, 12, 14 and 16). Hence the filter function on line 19 (below) that rids me of these.

 18  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'),
 19  ...                   namefs=(lambda s: '.svn' not in s,))
 20  >>> flist
 21 ['./README', './AUTHORS', './django/contrib/flatpages/README.TXT',
 22  './django/contrib/redirects/README.TXT', './extras/README.TXT']
 23  >>> SU.printr(flist)
 24 ./AUTHORS
 25 ./README
 26 ./django/contrib/flatpages/README.TXT
 27 ./django/contrib/redirects/README.TXT
 28 ./extras/README.TXT

As we can see on lines 24-28 the subversion-internal files are not part of the result set any more.

The example above also points out how shell-style wildcards operate on file names whereas the filter functions passed through the 'namefs' parameter match on the file path.
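The difference between the two kinds of filter can be illustrated with a small Python 3 sketch built on os.walk and fnmatch. This is not the code of scriptutil.ffind(); the function find_files is a hypothetical stand-in that only borrows the parameter names shellglobs and namefs from the examples above.

```python
import fnmatch
import os


def find_files(root, shellglobs=None, namefs=()):
    """Walk root and return the sorted paths that pass both kinds of filter.

    shellglobs are matched against the bare file name; every function
    in namefs is called with the full path and must return a true value.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Shell-style wildcards only ever see the file name ...
            if shellglobs and not any(fnmatch.fnmatch(name, g) for g in shellglobs):
                continue
            # ... whereas the filter functions see the whole path.
            path = os.path.join(dirpath, name)
            if all(f(path) for f in namefs):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    print(find_files(".", shellglobs=("README*", "AUTH*"),
                     namefs=(lambda s: ".svn" not in s,)))
```

Because the lambda receives the path, it can reject a file on account of a directory name such as .svn, something a wildcard applied to the file name alone cannot do.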

Please note: this article provides slightly more detail and additional scriptutil.ffind() examples you may want to explore.

Finding files and searching inside them

The brief scriptutil.ffindgrep() example below shows how one can search inside the files found.

 29  >>> flist = SU.ffindgrep('.', shellglobs=('README*', 'AUTH*'),
 30  ...                      namefs=(lambda s: '.svn' not in s,),
 31  ...                      regexl=(('Django', re.I), 'dist'))
 32  >>> flist
 33 {'./django/contrib/redirects/README.TXT':
 34  '    * The file django/docs/redirects.txt in the Django distribution',
 35  './django/contrib/flatpages/README.TXT':
 36  '    * The file docs/flatpages.txt in the Django distribution'}
 37  >>> SU.printr(flist)
 38     * The file docs/flatpages.txt in the Django distribution
 39 ./django/contrib/flatpages/README.TXT
 40     * The file django/docs/redirects.txt in the Django distribution
 41 ./django/contrib/redirects/README.TXT

The 'regexl' parameter (see line 31 above) contains two search items:

  1. the string ‘Django’, to be searched in case insensitive fashion
  2. the string ‘dist’, to be searched as is (i.e. in lower case)

The results returned by the function are displayed on lines 33-36 and pretty-printed on lines 38-41 respectively. For more detail on the scriptutil.ffindgrep() function please see one of my previous articles.
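A mixed sequence like the one passed as 'regexl' above (a tuple with a flag next to a plain string) could be normalised roughly as follows. This is a Python 3 sketch under my own assumptions: compile_specs is a hypothetical helper, not a scriptutil function, and the sample line is made up.

```python
import re


def compile_specs(regexl):
    """Turn each entry of regexl into a compiled pattern.

    An entry is either a plain pattern string or a tuple holding the
    arguments for re.compile(), e.g. ('Django', re.I).
    """
    compiled = []
    for spec in regexl:
        if isinstance(spec, str):
            compiled.append(re.compile(spec))
        else:
            compiled.append(re.compile(*spec))
    return compiled


patterns = compile_specs((("Django", re.I), "dist"))
line = "    * The file docs/flatpages.txt in the DJANGO distribution"
# A line counts as a hit only if every pattern is found in it.
print(all(rx.search(line) for rx in patterns))
```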

In-place search/substitute on files

Last but not least here’s an example of how the scriptutil.freplace() function can be utilised to search for strings in files and substitute them.

The 'regexl' parameter (passed on line 44 below) is a sequence of 3-tuples, each having the following elements:

  • search string (Python regex syntax)
  • replace string (Python regex syntax)
  • regex compilation flags or ‘None’ (re.compile syntax)

The 'bext' parameter specified on the subsequent line is the file name suffix to be used for backup copies of the modified files.

 42  >>> flist = SU.freplace('.', shellglobs=('README*',),
 43  ...                     namefs=(lambda s: '.svn' not in s,),
 44  ...                     regexl=(('distribution', '**package**', None),),
 45  ...                     bext='.bakk')

The function call above searches for all occurrences of the string 'distribution' and replaces them with the string '**package**'. Please note that only files that pass the name filters (specified on lines 42-43) are considered.

By searching for the backup files (line 46) we can see that the function call above resulted in two modified files.

 46  >>> flist = SU.ffind('.', shellglobs=('*.bakk',))
 47  >>> SU.printr(flist)
 48 ./django/contrib/flatpages/README.TXT.bakk
 49 ./django/contrib/redirects/README.TXT.bakk

Finally, I am searching for the replacement string '**package**' (line 50) to check that the substitution worked.

 50  >>> flist = SU.ffindgrep('.', regexl=('\\*\\*package\\*\\*',))
 51  >>> SU.printr(flist)
 52     * The file docs/flatpages.txt in the Django **package**
 53 ./django/contrib/flatpages/README.TXT
 54     * The file django/docs/redirects.txt in the Django **package**
 55 ./django/contrib/redirects/README.TXT
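For readers who want to see the moving parts, here is a minimal Python 3 sketch of an in-place substitution that keeps a backup copy. It is not the scriptutil.freplace() code: replace_in_file is a hypothetical helper for a single file, and the decision to write a backup only when the content actually changed is an assumption based on the behaviour observed above.

```python
import re
import shutil


def replace_in_file(path, regexl, bext=".bak"):
    """Apply every (search, replace, flags) triple in regexl to the file.

    The file is rewritten only if its content changed, in which case
    the original is first copied to path + bext. Returns True on change.
    """
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated = original
    for search, replacement, flags in regexl:
        # flags may be None, which re.sub() does not accept.
        updated = re.sub(search, replacement, updated, flags=flags or 0)
    if updated == original:
        return False
    shutil.copy2(path, path + bext)  # keep a backup of the untouched file
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(updated)
    return True
```

Calling replace_in_file(path, (('distribution', '**package**', None),), bext='.bakk') on each file that passed the name filters would mirror the example above.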

In conclusion

Again, I hope you liked this (brief) overview of the scriptutil.py module. In case more detailed documentation is required: the functions presented above are documented quite extensively through documentation strings. Please check these out in the syntax highlighted source.

Python: find files and search inside them (find & grep)

Introduction

This is the second article in the file find, grep and in-place search/substitute series. It presents the scriptutil.ffindgrep() function that not only helps you find files but also allows you to search inside them.

Test data

In order to play with the scriptutil.py module (syntax highlighted code here) I will use the same test data as in my previous article, i.e. the following example directory tree:

bbox33:scriptutil $ find .
.
./a
./a/a.txt
./a/b
./a/b/b.txt
./a/b/c
./a/b/c/c.txt
./all.doc
./d
./d/d.txt
./d/e
./d/e/e.txt
./o
./o/o.txt
./o/p
./o/p/p.txt
./o/p/q
./o/p/q/q.txt
./o/p/q/r
./o/p/q/r/r.txt
./o/p/q/r/s
./o/p/q/r/s/s.txt

The text files in the tree above were populated with (random) content using the fortune program. The complete test data set may be viewed here.

Find & grep

Now let’s explore the scriptutil.ffindgrep() function and look at examples of how it can be put to good use.

  1 bbox33:scriptutil $ python
  2 Python 2.4.4 (#1, May  9 2007, 11:05:23)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import scriptutil as SU
  6  >>> import re
  7  >>> flist = SU.ffindgrep('.',
  8  ...                      namefs=(lambda s: s.endswith('.txt'),),
  9  ...                      regexl=('there',))
 10  >>> flist
 11 {'./a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 12  './o/p/q/q.txt': 'there."'}
 13  >>> SU.printr(flist)
 14 It's the only even prime, therefore it's odd.  QED.
 15 ./a/a.txt
 16 there."
 17 ./o/p/q/q.txt

On lines 7-9 (above) the scriptutil.ffindgrep() function is invoked with the following parameters:

  1. line 7: the path of the directory tree to be searched ('.')
  2. line 8: a tuple with functions (namefs) to use for filtering the files we want (in this instance just a single function that makes sure we’re looking at text files only)
  3. line 9: a tuple with regular expressions (regexl) to filter the contents of the files that passed the name tests (in this example just a single regex picking lines containing the string ‘there’)

The function’s return value is stored in the flist dictionary whose content is shown on lines 11-12. On the next line the scriptutil.printr() helper function is invoked to pretty-print the find results (lines 14-17).
As you can see two text files were found to contain lines with the string of interest.

Please note also how the scriptutil.ffindgrep() function returned a dictionary where each

  • key is the file name
  • value is a string with all the lines found

What if we wanted to look for lines that contain the string ‘there’ as above but do it in a case insensitive way?

This is precisely what I am doing in the example below.

 18  >>> flist = SU.ffindgrep('.',
 19  ...                      namefs=(lambda s: s.endswith('.txt'),),
 20  ...                      regexl=(('there', re.I),))
 21  >>> flist
 22 {'./o/p/q/q.txt': 'there."',
 23  './a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 24  './d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 25  >>> SU.printr(flist)
 26         It's the only even prime, therefore it's odd.  QED.
 27 ./a/a.txt
 28         There are never enough hours in a day, but always too many days
 29 ./d/e/e.txt
 30 there."
 31 ./o/p/q/q.txt

Please note how the regexl parameter on line 20 now contains a 2-tuple with a regex definition string ('there') and a regex compilation flag (re.I) respectively.

Due to the fact that we are ignoring the letter case an additional match is found and displayed on lines 28-29 above.

What follows below is a slightly more advanced example since it uses more than one match pattern (see line 34).
Because the lines we are interested in must satisfy an additional regular expression ('eve') the match shown on lines 30-31 above is gone.

 32  >>> flist = SU.ffindgrep('.',
 33  ...                      namefs=(lambda s: s.endswith('.txt'),),
 34  ...                      regexl=(('there', re.I), 'eve'))
 35  >>> flist
 36 {'./a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 37  './d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 38  >>> SU.printr(flist)
 39         It's the only even prime, therefore it's odd.  QED.
 40 ./a/a.txt
 41         There are never enough hours in a day, but always too many days
 42 ./d/e/e.txt

Please note:

  • Each entry of the regexl parameter (e.g. on line 34 above) may be either a simple string (with a regex definition) or a tuple (with parameters accepted by re.compile()).
  • The following regex compilation flags will not have any effect (since the scriptutil.ffindgrep() function matches on a line by line basis): re.S, re.M
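The remark about re.S and re.M can be checked with plain Python 3 and the standard re module. The snippet below is only a demonstration with made-up sample text; matching_lines is a hypothetical helper that imitates line-by-line matching and is not part of scriptutil.

```python
import re

text = "first line\nThere are never enough hours in a day\nlast line"


def matching_lines(pattern, flags=0):
    """Search each line separately, the way a line-by-line grep would."""
    rx = re.compile(pattern, flags)
    # splitlines() drops the newline, so no line ever spans a line break.
    return [line for line in text.splitlines() if rx.search(line)]


# Per line, the flags make no difference to the outcome ...
per_line_results = [matching_lines("^There.*day$", f)
                    for f in (0, re.M, re.S, re.M | re.S)]
print(per_line_results)

# ... whereas on the whole text re.M decides whether '^' can match
# at the start of the second line.
print(re.search("^There", text) is None)
print(re.search("^There", text, re.M) is not None)
```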

Last but not least, I would like to show an example with multiple file name and file content filters: the second function in the namefs tuple (see line 44) now rules out any files with the letter ‘a’ in their path.

 43  >>> flist = SU.ffindgrep('.',
 44  ...             namefs=(lambda s: s.endswith('.txt'), lambda s: 'a' not in s),
 45  ...             regexl=(('there', re.I), 'eve'))
 46  >>> flist
 47 {'./d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 48  >>> SU.printr(flist)
 49         There are never enough hours in a day, but always too many days
 50 ./d/e/e.txt

Since files with the letter ‘a’ in their path are not acceptable any more the match shown for the text file ./a/a.txt on lines 39-40 above has disappeared.

I hope you liked this introduction to the scriptutil.ffindgrep() function and will find the latter to be a worthy addition to your Python toolchest.

Outlook

In the next article I will present the scriptutil.freplace() function that not only helps you find files but also allows you to search and replace strings inside these.

Python: file find, grep and in-line replace tools (part 1)

Introduction

This is the first article in a series that describes file find, grep and in-place search/substitute tools for Python.

What’s this all about?

As described in my previous article I often find myself in a situation where I need to

  • find files (whose paths/names are to be filtered)
  • find files and grep through their contents
  • find files and modify their content in some way

All of this is reasonably straightforward with UNIX shell commands like find, or with Perl.

However, I want this kind of functionality available while programming in Python, my favourite programming language.

In this article I am covering the first use case (searching for files in a directory tree).

Test data

In order to play with the scriptutil.py module (syntax highlighted code here) and to demonstrate some of its capabilities I have set up the following example directory tree:

bbox33:scriptutil $ find .
.
./a
./a/a.txt
./a/b
./a/b/b.txt
./a/b/c
./a/b/c/c.txt
./all.doc
./d
./d/d.txt
./d/e
./d/e/e.txt
./o
./o/o.txt
./o/p
./o/p/p.txt
./o/p/q
./o/p/q/q.txt
./o/p/q/r
./o/p/q/r/r.txt
./o/p/q/r/s
./o/p/q/r/s/s.txt

The text files were populated with (random) content using the fortune program. The complete test data set may be viewed here.

Finding files

As a “warm-up exercise” we’ll start looking at a few usage examples of the scriptutil.ffind() function.

  1 bbox33:scriptutil $ python2.5
  2 Python 2.5.1 (r251:54863, May 14 2007, 09:23:46)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import scriptutil as SU
  6  >>> import re
  7  >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,))
  8  >>> files
  9 ['./a/a.txt', './a/b/b.txt', './a/b/c/c.txt', './d/d.txt']
 10  >>> SU.printr(files)
 11 ./a/a.txt
 12 ./a/b/b.txt
 13 ./a/b/c/c.txt
 14 ./d/d.txt

On line 7 (above) I am invoking the scriptutil.ffind() function and passing the following parameters to it:

  1. the path to the directory tree to be searched ('.')
  2. a tuple with functions (namefs) to use for filtering the files we want; in this instance I am passing just one function which merely encapsulates a regular expression.

The function’s return value is stored in the files list whose content is shown on line 9. On the next line the scriptutil.printr() helper function is invoked to pretty-print the find results (lines 11-14).

In the example below I am adding one more filter function to the namefs tuple (line 16). That second function effectively weeds out any file path that contains the letter ‘b’.

 15  >>> files = SU.ffind('.', namefs=(re.compile('[a-d]\\.txt$').search,
 16  ...                                  lambda s: s.find('b') == -1))
 17  >>> files
 18 ['./a/a.txt', './d/d.txt']
 19  >>> SU.printr(files)
 20 ./a/a.txt
 21 ./d/d.txt

Hint: when working with a source code tree I would often use the following file name filter function to ignore any files internal to the subversion versioning system:

    lambda s: s.find('.svn') == -1

Outlook

In the next article I will present the scriptutil.ffindgrep() function that not only helps you find files but also allows you to search inside them.

Python for “quick jobs”

Introduction

Every so often I need to hack together a quick “shell script”-like tool. My first impulse is to do it in Perl since this is what Perl is good for.

Since my Perl skills are quite rusty I have to look up a lot of stuff and what was supposed to be a “quick job” ends up taking a lot of time. Usually much more time than is needed to write the same thing in Python, my favourite programming language.

The issue at hand

In order to lower the threshold for programming “shell script”-like stuff in Python I hence started putting together a module with functions I need for these “quick jobs”.

Something I end up needing more often than not is e.g. a command like this:

find doc -type f -name os_\* -exec grep settings {} \; -print

It will

  • search the directory tree starting at the ‘doc’ directory for regular files whose names begin with ‘os_’ and whose content contains the string ‘settings’
  • print the found lines along with the file names

Here’s an example (I was experimenting with the help files of the vim (vi improved) editor here):

bbox33:vim62 $ pwd
/usr/share/vim/vim62
bbox33:vim62 $ ls doc/os_*
doc/os_390.txt    doc/os_mac.txt    doc/os_qnx.txt    doc/os_win32.txt
doc/os_amiga.txt  doc/os_mint.txt   doc/os_risc.txt
doc/os_beos.txt   doc/os_msdos.txt  doc/os_unix.txt
doc/os_dos.txt    doc/os_os2.txt    doc/os_vms.txt

And now the output of the command above:

bbox33:vim62 $ find ./doc -type f -name os_\* -exec grep settings {} \; -print
simply reverse the notion of foreground and background color settings. To do
settings do not seem to work properly. This has been the case since DR7 at
you use the default Mouse preference settings these names indeed correspond to
./doc/os_beos.txt
without any settings.
./doc/os_os2.txt
Following the name, you can include optional settings to control the size and
./doc/os_qnx.txt
(SYS$LOGIN) to overwrite default settings.
number, try these settings. >
settings from |diff-diffexpr| and change the call to the external diff
./doc/os_vms.txt

The output above shows that four files were found to contain the string of interest. You get the idea.

The solution

scriptutil.py is a module that provides similar functionality for Python programmers and more (syntax highlighted source code here, plain code here).

The following code uses the scriptutil.py module to perform the same example search as above.

    import re
    import scriptutil

    flist = scriptutil.findFiles(
        './doc',
        # keep only paths that contain 'os_'
        namefs=(lambda s: s.find('os_') != -1,),
        # collect every line that contains 'settings'
        contentfs=(re.compile('^.*settings.*$', re.M).findall,))

    scriptutil.printResults(flist)

The code calls the function scriptutil.findFiles and passes the following parameters to it:

  • './doc': the path to the top level directory of the tree to be searched
  • namefs: a sequence of functions to be used for file name filtering
  • contentfs: a sequence of functions to be applied to the content of the files that passed the file name tests

Subsequently the scriptutil.printResults helper function is invoked to pretty-print the results which are as follows:

bbox33:vim62 $ python /tmp/su.py
simply reverse the notion of foreground and background color settings. To do
settings do not seem to work properly. This has been the case since DR7 at
you use the default Mouse preference settings these names indeed correspond to
./doc/os_beos.txt
without any settings.
./doc/os_os2.txt
Following the name, you can include optional settings to control the size and
./doc/os_qnx.txt
(SYS$LOGIN) to overwrite default settings.
number, try these settings. >
settings from |diff-diffexpr| and change the call to the external diff
./doc/os_vms.txt
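To give an idea of how name filters and content filters can work together, here is a short Python 3 sketch. It is not the scriptutil.findFiles source: find_and_grep is a hypothetical function, and the shape of its return value (a dictionary mapping each path to a flat list of matches) is an assumption made for this illustration.

```python
import os


def find_and_grep(root, namefs=(), contentfs=()):
    """Return {path: [matches, ...]} for files passing all name and content tests.

    Every function in namefs receives the file path; every function in
    contentfs receives the whole file content and returns a (possibly
    empty) list of matches, as re.compile(...).findall does.
    """
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not all(f(path) for f in namefs):
                continue
            with open(path, encoding="utf-8", errors="replace") as handle:
                content = handle.read()
            matches = [f(content) for f in contentfs]
            if contentfs and all(matches):
                # Flatten the per-function match lists into one list.
                results[path] = [m for found in matches for m in found]
    return results
```

Passing a compiled regex's findall method as a content filter, as in the example above, works because findall returns an empty list for files without a match, and an empty list counts as false.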

In conclusion

There is no reason why Python should not be used to program “shell script”-like tools. In forthcoming weblog entries I will be presenting some more sophisticated examples of what the scriptutil.findFiles function is capable of. Stay tuned!