Roll your own server in 50 lines of code

Introduction

Just in case you wondered why there are so many frameworks in Python land, here’s a basic server (including a request dispatch mechanism) in only 50 lines of code (syntax highlighted code here).

The source code structure is as follows:

 1 bbox33:servera $ find . -type f -name \*.py
 2 ./modules/__init__.py
 3 ./modules/admin/__init__.py
 4 ./modules/admin/disk.py
 5 ./rhbase.py
 6 ./server.py

Here’s a brief demo:

  1 bbox33:servera $ python server.py
  2 ** 2007-05-30 00:15:18,628 INFO - starting server..
  3 !! please type in a req in HTTP-GET format (or 'q' to quit)
  4  >>> http://xyz.net/admin/disk/purge?path=/tmp
  5 ** 2007-05-30 00:15:34,889 INFO - do_purge() called for params: 'path=/tmp'
  6 ** 2007-05-30 00:15:34,889 INFO - S: http://xyz.net/admin/disk/purge?path=/tmp
  7 !! please type in a req in HTTP-GET format (or 'q' to quit)
  8  >>> http://xyz.net/admin/disk/list?path=/usr
  9 ** 2007-05-30 00:15:48,791 ERROR - no 'do_list' function in '<class 'modules.admin.disk.ReqH'>'
 10 ** 2007-05-30 00:15:48,791 INFO - F: http://xyz.net/admin/disk/list?path=/usr
 11 !! please type in a req in HTTP-GET format (or 'q' to quit)
 12  >>> q
 13 ** 2007-05-30 00:15:51,695 INFO - server terminated

And here’s what actually went on:

  1. the server is invoked (line 1) and is given a request URI (line 4) to serve
  2. based on the request URI path (/admin/disk/purge) an appropriate request handler is loaded dynamically (modules.admin.disk)
  3. the request handler’s dispatch() function is called
  4. by looking at the last segment of the request URI path (purge) the dispatch function guesses that the request should actually be handled by a method called do_purge()
  5. do_purge() is invoked, logs the request parameters (line 5) and returns with a value indicating success (zero)
  6. after the request was handled the server logs the URI and the outcome (line 6; ‘S’ stands for success)
  7. on line 8 the user typed in another request URI but this time no handler function could be found, resulting in the error logged on line 9 and the failure logged on line 10 (‘F’ stands for failure)

The server

The server is meant to operate on HTTP GET style requests. The serve() method (on line 9) accepts a req string parameter that contains an HTTP GET style URI.

The dfuncs dictionary (defined on line 7) is a request handler function cache. The key is the path portion of the request URI. The value holds the corresponding request handler function.
The scheme + authority portion of the URI are ignored by the dispatch mechanism. Any URIs with identical path portions map to the same request handler module and function.

For example the following URIs both map to the module modules.admin.disk, class ReqH, method dispatch():

The last segment of the URI’s path portion is taken to be the request action (i.e. sleep in the URI path above).
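
The mapping from URI path to handler module and request action can be sketched as follows. This is a minimal Python 3 illustration rather than the server's own code: split_request() is a made-up helper and both URIs are invented for the example.

```python
from urllib.parse import urlsplit  # Python 3 home of urlparse.urlsplit


def split_request(req):
    """Return (module path, action) for an HTTP GET style URI."""
    # keep only the non-empty path segments, e.g. ['admin', 'disk', 'purge']
    segments = [s for s in urlsplit(req).path.split('/') if s]
    module_path = 'modules.' + '.'.join(segments[:-1])
    action = segments[-1]
    return module_path, action


# hypothetical URIs: scheme, host and query string differ, the path is the same
first = split_request('http://xyz.net/admin/disk/purge?path=/tmp')
second = split_request('https://example.org/admin/disk/purge')
print(first, second)
```

Both calls yield the same module path and action because only the path portion of the URI is looked at.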

Serving of requests

  1 #!/usr/bin/env python
  2 # encoding: utf-8
  3 import logging, urlparse
  4 
  5 class Server(object):
  6     def __init__(self):
  7         self.dfuncs = dict()
  8         self.modules = dict()
  9     def serve(self, req):
 10         result = -1
 11         req_path = urlparse.urlsplit(req)[2]
 12         if req_path in self.dfuncs: dfunc = self.dfuncs[req_path] # in cache
 13         else: dfunc = self.load(req_path)  # handler function not in cache
 14         if dfunc: result = dfunc(req)
 15         return(result)

The server first checks whether the handler function for a request is already in its cache (line 12). If this is not the case (line 13), the load() method is called in order to load the appropriate request handler module and function.

The request handler function — if available — is invoked on line 14. Please note that a zero is returned in case of success and any non-zero return value indicates a failure.
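
The cache-then-load logic can be tried in isolation with a stand-in loader. The sketch below is written for Python 3; CachingServer, demo_loader() and the load_calls counter are assumptions made for illustration and are not part of server.py.

```python
from urllib.parse import urlsplit


class CachingServer:
    def __init__(self, loader):
        self.loader = loader    # hypothetical: maps a URI path to a handler or None
        self.dfuncs = {}        # handler cache, keyed by URI path
        self.load_calls = 0     # counts how often the loader had to be asked

    def serve(self, req):
        path = urlsplit(req).path
        dfunc = self.dfuncs.get(path)
        if dfunc is None:       # not in cache, ask the loader
            self.load_calls += 1
            dfunc = self.loader(path)
            if dfunc is not None:
                self.dfuncs[path] = dfunc
        return dfunc(req) if dfunc else -1


def demo_loader(path):
    # stand-in for load(): knows exactly one handler
    if path == '/admin/disk/purge':
        return lambda req: 0
    return None


server = CachingServer(demo_loader)
results = [server.serve('http://xyz.net/admin/disk/purge?path=/tmp'),
           server.serve('http://example.org/admin/disk/purge?path=/var'),
           server.serve('http://xyz.net/admin/disk/list?path=/usr')]
print(results, server.load_calls)
```

The second request is answered from the cache, so the loader is only consulted for the first and the third one.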

Dynamic code loading

The load method below constructs the name of the request handler module (line 19) and tries to import it (line 23).

For more detail on dynamic code loading in Python see the documentation of the built-in __import__() function.

 16     def load(self, req_path):
 17         logging.debug("load() called for '%s'" % req_path)
 18         m = dfunc = None
 19         mpath = 'modules.%s' % '.'.join(filter(None, req_path.split('/')[:-1]))
 20         # already have the module needed?
 21         if mpath in self.modules: m = self.modules[mpath] # yes!
 22         else:   # no, try import
 23             try: m = __import__(mpath, globals(), locals(), ['ReqH'])
 24             except ImportError, e: logging.error(str(e))
 25             else: self.modules[m.__name__] = m  # cache it

If the request handler module was imported successfully, the loading logic checks whether the module has a ReqH class (line 28) and whether the latter has a dispatch attribute that may be called (line 30).

If all checks succeed the dispatch attribute is returned as the request handler function (line 34).

 26         if m:
 27             # got the module, does it have a 'ReqH' class?
 28             if hasattr(m, 'ReqH'):
 29                 # yes, does the 'ReqH' class have a dispatch function?
 30                 if hasattr(m.ReqH, 'dispatch') and callable(m.ReqH.dispatch):
 31                     self.dfuncs[req_path] = dfunc = m.ReqH.dispatch
 32                 else: logging.error("no dispatch function in '%s'" % mpath)
 33             else: logging.error("no request handler class in '%s'" % mpath)
 34         return dfunc
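
For readers on Python 3, the same import-then-check idea might look roughly like the sketch below. It uses importlib.import_module() instead of __import__(), and because the modules package from this article is not available here, the standard json module serves as a stand-in. load_callable() is a hypothetical helper, not the server's load() method.

```python
import importlib
import logging


def load_callable(module_name, attr_name):
    """Import a module by name and return one of its callables, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        logging.error(str(exc))
        return None
    candidate = getattr(module, attr_name, None)
    if not callable(candidate):
        logging.error("no callable '%s' in '%s'", attr_name, module_name)
        return None
    return candidate


dumps = load_callable('json', 'dumps')               # module and callable exist
missing = load_callable('no_such_module_xyz', 'x')   # import fails
not_callable = load_callable('json', '__name__')     # attribute is a string
print(dumps([1, 2]), missing, not_callable)
```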

Logging setup

A proper server needs to log errors as well as any requests received. A StreamHandler (logging to stderr by default) is attached to the root logger. For more detail on Python logging see the documentation of the standard library's logging module.

 35 if __name__ == '__main__':
 36     logf = logging.Formatter('** %(asctime)s %(levelname)s - %(message)s')
 37     lcon = logging.StreamHandler()
 38     lcon.setFormatter(logf)
 39     logging.getLogger('').addHandler(lcon)
 40     logging.getLogger('').setLevel(logging.INFO)
 41     logging.info('starting server..')
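
To see what the formatter produces without touching the root logger, the same setup can be pointed at an in-memory stream. This is a Python 3 sketch; the logger name 'demo' and the StringIO stream are assumptions chosen so that the output can be inspected.

```python
import io
import logging

stream = io.StringIO()                  # stand-in for stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter('** %(asctime)s %(levelname)s - %(message)s'))

log = logging.getLogger('demo')         # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False                   # keep the demo away from the root logger

log.debug('not shown, below the INFO threshold')
log.info('starting server..')

output = stream.getvalue()
print(output, end='')
```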

Minimum scaffolding

The section of code below is a minimal input loop facilitating the experimentation with the server.

 42     server = Server()
 43     while 1:
 44         req = raw_input("!! please type in a req in HTTP-GET format " \
 45                         "(or 'q' to quit)\n>>> ")
 46         if req == 'q': break
 47         result = server.serve(req)
 48         if result == 0: logging.info('S: %s' % req)
 49         else: logging.info('F: %s' % req)
 50     logging.info('server terminated')

As noted above, a zero return value is taken to be an indication of success (line 48) whereas a non-zero value signifies failure (line 49).

Request handling

It is an established practice to split the back-end logic into a number of modules that are loaded by the server on demand (as shown above).

In our example each request handler module needs to provide a ReqH class with a dispatch() method that takes the URI string as its only argument.

DRY

In order to factor out the boilerplate code, a request handler base class that takes care of the request action dispatching is put into place.

Please note that the dispatch() method is a class method i.e. it receives a class object as its implicit first argument (lines 7-8 below). This is different from a static method that receives no implicit first argument whatsoever.

  1 #!/usr/bin/env python
  2 # encoding: utf-8
  3 
  4 import logging, urlparse
  5 
  6 class ReqHBase(object):
  7     @classmethod
  8     def dispatch(cls, req):
  9         result = -1
 10         targetf = 'do_%s' % urlparse.urlsplit(req)[2].split('/')[-1]
 11 
 12         try: handler = getattr(cls, targetf)
 13         except AttributeError:
 14             logging.error("no '%s' function in '%s'" % (targetf, str(cls)))
 15         else: # is it a function?
 16             if callable(handler):
 17                 try: result = handler(req)
 18                 except Exception, e: logging.exception(str(e))
 19             else:
 20                 logging.error("'%s' not callable in '%s'" % (targetf, str(cls)))
 21         return(result)

The dispatch code above takes the last segment of the URI path (i.e. the request action) and expects to find a static method called do_<last_segment> (line 10).

The expected target function must both be present and callable to be invoked. Otherwise the corresponding error messages are logged (lines 14 and 20).
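
Why a class method? Because cls is bound to the class the call was made on, the inherited dispatch code can see attributes defined by the subclass. The Python 3 sketch below illustrates this with made-up classes (Base, Disk) and a simplified dispatch() that takes the action name directly instead of a URI.

```python
class Base:
    @classmethod
    def dispatch(cls, action):
        # cls is the class the call was made on, so subclass attributes are visible
        handler = getattr(cls, 'do_' + action, None)
        if not callable(handler):
            return -1
        return handler()

    @staticmethod
    def describe():
        # a static method receives no implicit argument at all
        return 'static'


class Disk(Base):
    do_nothing = 1      # present but not callable

    @staticmethod
    def do_purge():
        return 0


print(Disk.dispatch('purge'), Disk.dispatch('nothing'),
      Disk.dispatch('list'), Base.dispatch('purge'))
```

The same action succeeds when dispatched on Disk and fails on Base, which has no do_purge attribute.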

Example request handler module

What follows is a “minimalistic” request handler module. It is derived from the ReqHBase class and hence inherits the latter’s dispatch() method.

Please note: more sophisticated request handler classes may be derived directly from object and implement their own dispatch() methods as needed.

  1 #!/usr/bin/env python
  2 # encoding: utf-8
  3 
  4 import logging, urlparse
  5 from urlparse import urlsplit
  6 from rhbase import ReqHBase
  7 
  8 class ReqH(ReqHBase):
  9     do_nothing = 1
 10     @staticmethod
 11     def do_purge(req):
 12         logging.info("do_purge() called for params: '%s'" % urlsplit(req)[3])
 13         return(0)

In conclusion

The server presented in this article is quite simple (it handles all requests in serial fashion, no threading is used etc.). Nevertheless, it clearly demonstrates the potency and the productivity of the Python programming environment.

A little code goes a long way :-)

Laziness is a virtue (in programming)

Introduction

One of the things I really appreciate about Python is that it allows me to explore different programming styles. When I started with Python (coming from a C/C++/Java background) I would mostly use procedural style constructs.

These days, however, I usually lean towards a more functional programming style.

One of the key books that swayed me towards the latter was Mark Pilgrim’s “Dive Into Python”, one of the best (technical) books I have read so far. If you haven’t come across it yet, you should definitely take a look at it.

Another Python gem you may want to check out while experimenting with functional programming ideas is the itertools module.

Example

In the brief section of code that follows I contrast the procedural and the functional programming style through an example: I am interested in the first number in a (potentially large) range whose 8th root is an integer.

  1 from math import (pow, modf)
  2 from itertools import dropwhile
  3 
  4 def calculate(x):
  5     """calculates the 8th root of the number 'x'"""
  6     return(pow(x, 0.125))
  7 
  8 def wanted(v):
  9     """integer check: returns 'True' if fractional part of a number is 0"""
 10     return(not modf(v)[0])
 11 
 12 def unwanted(v):
 13     """returns a true value if fractional part of a number is not 0"""
 14     return(modf(v)[0])

The lines 16-25 show a procedural style function using a loop and related control flow constructs to get the job done.

 16 def proceduralStyle(f, t):
 17     """
 18     finds the first in the range of numbers [f..t[ whose 8th root is an
 19     integer; procedural style
 20     """
 21     for x in xrange(f, t):
 22         v = calculate(x)
 23         if wanted(v): break
 24     else: v = None
 25     return(v)

The functional style solution (see lines 27-34 below) uses the dropwhile() function (from the itertools module) to get rid of any unwanted initial values (line 32).

Please note also how a generator expression (as opposed to normal list comprehension) is used (on line 32) to achieve lazy evaluation behaviour.

The syntactical differences are minute (parentheses versus square brackets) but the effects are huge.

A normal list comprehension will construct the entire list (and perform any calculations needed in the process) before completing. A generator expression on the other hand is comparable to an iterator: values are returned in piecemeal fashion and calculations are performed only as needed.

 27 def functionalStyle(f, t):
 28     """
 29     finds the first in the range of numbers [f..t[ whose 8th root is an
 30     integer; functional style
 31     """
 32     r8iter = dropwhile(unwanted, (calculate(x) for x in xrange(f, t)))
 33     try: return(r8iter.next())
 34     except StopIteration: return(None)
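
The difference between the two forms can be made visible by counting how often calculate() is actually invoked. The following is a Python 3 sketch (range() and next() replace xrange() and .next()); the calls list and first_integer_root() are assumptions added for the experiment, and the range is kept small so that the eager variant finishes quickly.

```python
from itertools import dropwhile
from math import modf

calls = []  # records every number handed to calculate()


def calculate(x):
    calls.append(x)
    return pow(x, 0.125)


def unwanted(v):
    return modf(v)[0] != 0


def first_integer_root(values):
    # next() with a default replaces the try/except StopIteration idiom
    return next(dropwhile(unwanted, values), None)


# generator expression: values are computed only until the first hit
lazy = first_integer_root(calculate(x) for x in range(2, 10000))
lazy_calls = len(calls)

# list comprehension: every value is computed before dropwhile() starts
calls.clear()
eager = first_integer_root([calculate(x) for x in range(2, 10000)])
eager_calls = len(calls)

print(lazy, lazy_calls, eager, eager_calls)
```

Both variants return the same value, but the list comprehension evaluates calculate() for the whole range whereas the generator expression stops at the first match.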

Finally, by experimenting with the source code we can see that lazy evaluation is in effect for the functionalStyle() function. It returns straightaway with the anticipated result: 256 is the first number in the range whose 8th root (2) is an integer.

  1 bbox33:published $ python
  2 Python 2.4.4 (#1, May 22 2007, 13:30:14)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import evaltest2
  6  >>> from time import asctime as t
  7  >>> print t(); evaltest2.functionalStyle(2, 50000000); print t()
  8 Sun May 27 10:33:18 2007
  9 2.0
 10 Sun May 27 10:33:18 2007
 11  >>> print t(); evaltest2.proceduralStyle(2, 50000000); print t()
 12 Sun May 27 10:33:30 2007
 13 2.0
 14 Sun May 27 10:33:30 2007

Turn on line numbers while searching in files

Introduction

Due to “popular demand” I have added a feature to the scriptutil.ffindgrep() function of the scriptutil.py module: you can now instruct it to display the line numbers for the lines found (similar to grep -n).

Please note: the examples below operate on the Django project source code tree as usual.

Examples

What follows is a brief demonstration of the scriptutil.ffindgrep() function showing how one can search with and without line numbers respectively.

I am first searching with line numbers turned off. The results are displayed on lines 10-15 and 18-23 respectively.

  1 Python 2.4.4 (#1, May 22 2007, 13:30:14)
  2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> import scriptutil as SU
  5  >>> flist = SU.ffindgrep('.',
  6  ...                       shellglobs=('README*', 'AUTH*'),
  7  ...                       namefs=(lambda s: '.svn' not in s,),
  8  ...                       regexl=('Django', 'doc'))
  9  >>> flist
 10 {'./django/contrib/redirects/README.TXT':
 11  '    * The file django/docs/redirects.txt in the Django distribution',
 12  './django/contrib/flatpages/README.TXT':
 13  '    * The file docs/flatpages.txt in the Django distribution',
 14  './README':
 15  '    * First, read docs/install.txt for instructions on installing Django.'}
 16 
 17  >>> SU.printr(flist)
 18     * First, read docs/install.txt for instructions on installing Django.
 19 ./README
 20     * The file docs/flatpages.txt in the Django distribution
 21 ./django/contrib/flatpages/README.TXT
 22     * The file django/docs/redirects.txt in the Django distribution
 23 ./django/contrib/redirects/README.TXT

Now I am passing an additional parameter (namely linenums) to the scriptutil.ffindgrep() function (line 28).

The added parameter (unsurprisingly) turns on the line numbers as can be seen on lines 30-35 and 37-42 respectively.

 24  >>> flist = SU.ffindgrep('.',
 25  ...                       shellglobs=('README*', 'AUTH*'),
 26  ...                       namefs=(lambda s: '.svn' not in s,),
 27  ...                       regexl=('Django', 'doc'),
 28  ...                       linenums=True)
 29  >>> flist
 30 {'./django/contrib/redirects/README.TXT':
 31  '5:    * The file django/docs/redirects.txt in the Django distribution',
 32  './django/contrib/flatpages/README.TXT':
 33  '5:    * The file docs/flatpages.txt in the Django distribution',
 34  './README':
 35  '8:    * First, read docs/install.txt for instructions on installing Django.'}
 36  >>> SU.printr(flist)
 37 8:    * First, read docs/install.txt for instructions on installing Django.
 38 ./README
 39 5:    * The file docs/flatpages.txt in the Django distribution
 40 ./django/contrib/flatpages/README.TXT
 41 5:    * The file django/docs/redirects.txt in the Django distribution
 42 ./django/contrib/redirects/README.TXT
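
How such a switch might work internally can be sketched in a few lines. The code below is a Python 3 illustration and makes no claim about the actual scriptutil implementation; grep_lines() and the sample data are hypothetical.

```python
import re


def grep_lines(lines, pattern, linenums=False):
    """Return the lines matching 'pattern', optionally prefixed 'N:' like grep -n."""
    regex = re.compile(pattern)
    found = []
    for number, line in enumerate(lines, 1):    # line numbers start at 1
        if regex.search(line):
            found.append('%d:%s' % (number, line) if linenums else line)
    return found


sample = ['intro', '    * read docs/install.txt', 'outro']
plain = grep_lines(sample, 'docs')
numbered = grep_lines(sample, 'docs', linenums=True)
print(plain, numbered)
```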

For more detail on the scriptutil.ffindgrep() function please see also an earlier article.

Learn the Python standard library

One of the big benefits of making code available publicly is the feedback received. Sometimes, the feedback points to Python standard library modules I was unaware of.

A case in point is a post on Linux Questions in which the author is pointing out the fileinput module.

Python indeed comes with the batteries included :-)

I will have a more thorough look at the fileinput module and probably refactor the scriptutil.freplace() function (described here) to make use of it.
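
To give an idea of what the fileinput module offers, here is a minimal Python 3 sketch of an in-place replacement with a backup copy. replace_in_place() is a hypothetical helper and the temporary file exists only so that the example runs anywhere; it is not the refactored scriptutil.freplace().

```python
import fileinput
import os
import tempfile


def replace_in_place(paths, old, new, backup='.bak'):
    """Replace 'old' with 'new' in each file, keeping a backup copy."""
    with fileinput.input(files=paths, inplace=True, backup=backup) as stream:
        for line in stream:
            # with inplace=True, standard output is redirected into the file
            print(line.replace(old, new), end='')


# throwaway file so the sketch can run anywhere
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'README')
with open(path, 'w') as fh:
    fh.write('part of the distribution\n')

replace_in_place([path], 'distribution', 'package')
with open(path) as fh:
    print(fh.read(), end='')
```

Because a backup extension is passed explicitly, the original content remains available in README.bak next to the modified file.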

Python: find files using Unix shell-style wildcards

Introduction

In the article that follows I will show how the scriptutil.py module (syntax highlighted code here) can be used

  • to find files using Unix shell-style wildcards
  • to search inside the found files and to perform in-place search & substitute operations on them

The examples below all operate on the Django project source code tree.

Finding files

In the following example I am using the scriptutil.ffind() function to find files whose names start with either 'README' or 'AUTH' (line 6 below). On the subsequent line a helper function is invoked to pretty-print the search results which are then displayed on lines 8-17.

  1 Python 2.4.4 (#1, May  9 2007, 11:05:23)
  2 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  3 Type "help", "copyright", "credits" or "license" for more information.
  4  >>> import scriptutil as SU
  5  >>> import re
  6  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'))
  7  >>> SU.printr(flist)
  8 ./.svn/text-base/AUTHORS.svn-base
  9 ./.svn/text-base/README.svn-base
 10 ./AUTHORS
 11 ./README
 12 ./django/contrib/flatpages/.svn/text-base/README.TXT.svn-base
 13 ./django/contrib/flatpages/README.TXT
 14 ./django/contrib/redirects/.svn/text-base/README.TXT.svn-base
 15 ./django/contrib/redirects/README.TXT
 16 ./extras/.svn/text-base/README.TXT.svn-base
 17 ./extras/README.TXT

In most cases I will not be interested in any files that are internal to the subversion revision control system (lines 8, 9, 12, 14 and 16). Hence the filter function on line 19 (below) that rids me of these.

 18  >>> flist = SU.ffind('.', shellglobs=('README*', 'AUTH*'),
 19  ...                   namefs=(lambda s: '.svn' not in s,))
 20  >>> flist
 21 ['./README', './AUTHORS', './django/contrib/flatpages/README.TXT',
 22  './django/contrib/redirects/README.TXT', './extras/README.TXT']
 23  >>> SU.printr(flist)
 24 ./AUTHORS
 25 ./README
 26 ./django/contrib/flatpages/README.TXT
 27 ./django/contrib/redirects/README.TXT
 28 ./extras/README.TXT

As we can see on lines 24-28 the subversion-internal files are not part of the result set any more.

The example above also points out how shell-style wildcards operate on file names whereas the filter functions passed through the 'namefs' parameter match on the file path.
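
The distinction between matching on the file name and matching on the file path can be reproduced with os.walk() and fnmatch. The Python 3 sketch below is an assumption about how such a finder could be built, not the scriptutil.ffind() source; find_files() and the temporary directory tree are made up for the example.

```python
import fnmatch
import os
import tempfile


def find_files(root, shellglobs=None, namefs=None):
    """Hypothetical helper: globs test the file name, namefs test the whole path."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if shellglobs and not any(fnmatch.fnmatch(name, g) for g in shellglobs):
                continue    # file name matches none of the wildcards
            if namefs and not all(f(path) for f in namefs):
                continue    # at least one path filter rejected the file
            found.append(path)
    return sorted(found)


# small throwaway tree that mimics a subversion working copy
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, '.svn'))
for rel in ('README', 'AUTHORS', 'setup.py',
            os.path.join('.svn', 'README.svn-base')):
    open(os.path.join(root, rel), 'w').close()

globs = ('README*', 'AUTH*')
all_hits = find_files(root, shellglobs=globs)
clean = find_files(root, shellglobs=globs,
                   namefs=(lambda s: '.svn' not in s,))
print([os.path.relpath(p, root) for p in clean])
```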

Please note: this article provides slightly more detail and additional scriptutil.ffind() examples you may want to explore.

Finding files and searching inside them

The brief scriptutil.ffindgrep() example below shows how one can search inside the files found.

 29  >>> flist = SU.ffindgrep('.', shellglobs=('README*', 'AUTH*'),
 30  ...                      namefs=(lambda s: '.svn' not in s,),
 31  ...                      regexl=(('Django', re.I), 'dist'))
 32  >>> flist
 33 {'./django/contrib/redirects/README.TXT':
 34  '    * The file django/docs/redirects.txt in the Django distribution',
 35  './django/contrib/flatpages/README.TXT':
 36  '    * The file docs/flatpages.txt in the Django distribution'}
 37  >>> SU.printr(flist)
 38     * The file docs/flatpages.txt in the Django distribution
 39 ./django/contrib/flatpages/README.TXT
 40     * The file django/docs/redirects.txt in the Django distribution
 41 ./django/contrib/redirects/README.TXT

The 'regexl' parameter (see line 31 above) contains two search items:

  1. the string ‘Django’, to be searched in case insensitive fashion
  2. the string ‘dist’, to be searched as is (i.e. in lower case)

The results returned by the function are displayed on lines 33-36 and pretty-printed on lines 38-41 respectively. For more detail on the scriptutil.ffindgrep() function please see one of my previous articles.

In-place search/substitute on files

Last but not least here’s an example of how the scriptutil.freplace() function can be utilised to search for strings in files and substitute them.

The 'regexl' parameter (passed on line 44 below) is a sequence of 3-tuples, each having the following elements:

  • search string (Python regex syntax)
  • replace string (Python regex syntax)
  • regex compilation flags or ‘None’ (re.compile syntax)

The 'bext' parameter specified on the subsequent line is the file name suffix to be used for backup copies of the modified files.

 42  >>> flist = SU.freplace('.', shellglobs=('README*',),
 43  ...                     namefs=(lambda s: '.svn' not in s,),
 44  ...                     regexl=(('distribution', '**package**', None),),
 45  ...                     bext='.bakk')

The function call above will search for all occurrences of the string 'distribution' and replace them with the string '**package**'. Please note that only files that passed the name filters (specified on lines 42-43) will be considered.
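
For a single file, the search/substitute step with a backup copy might be sketched as follows (Python 3). replace_in_file() is a hypothetical helper; in particular, writing a backup only for files that actually changed is an assumption of this sketch and not a documented property of scriptutil.freplace().

```python
import os
import re
import shutil
import tempfile


def replace_in_file(path, regexl, bext):
    """Hypothetical helper: apply (search, replace, flags) 3-tuples to one file."""
    with open(path) as fh:
        original = fh.read()
    text = original
    for search, replace, flags in regexl:
        text = re.sub(search, replace, text, flags=flags or 0)  # None means no flags
    if text == original:
        return False                    # nothing matched, leave the file alone
    shutil.copy2(path, path + bext)     # backup copy of the unmodified file
    with open(path, 'w') as fh:
        fh.write(text)
    return True


# two throwaway files, only one of which contains the search string
workdir = tempfile.mkdtemp()
hit = os.path.join(workdir, 'README.TXT')
miss = os.path.join(workdir, 'README')
with open(hit, 'w') as fh:
    fh.write('in the Django distribution\n')
with open(miss, 'w') as fh:
    fh.write('nothing to see here\n')

regexl = (('distribution', '**package**', None),)
changed = [replace_in_file(p, regexl, '.bakk') for p in (hit, miss)]
print(changed)
```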

By searching for the backup files (line 46) we can see that the function call above resulted in two modified files.

 46  >>> flist = SU.ffind('.', shellglobs=('*.bakk',))
 47  >>> SU.printr(flist)
 48 ./django/contrib/flatpages/README.TXT.bakk
 49 ./django/contrib/redirects/README.TXT.bakk

Finally, I am searching for the replacement string '**package**' (line 50) to check that the substitution worked.

 50  >>> flist = SU.ffindgrep('.', regexl=('\\*\\*package\\*\\*',))
 51  >>> SU.printr(flist)
 52     * The file docs/flatpages.txt in the Django **package**
 53 ./django/contrib/flatpages/README.TXT
 54     * The file django/docs/redirects.txt in the Django **package**
 55 ./django/contrib/redirects/README.TXT

In conclusion

Again, I hope you liked this (brief) overview of the scriptutil.py module. In case more detailed documentation is required, I would like to mention that the functions presented above are documented quite extensively through documentation strings. Please check these out in the syntax highlighted source.

Python: find files and search inside them (find & grep)

Introduction

This is the second article in the file find, grep and in-place search/substitute series. It presents the scriptutil.ffindgrep() function that not only helps you find files but also allows you to search inside them.

Test data

In order to play with the scriptutil.py module (syntax highlighted code here) I will use the same test data as in my previous article i.e. the following example directory tree:

bbox33:scriptutil $ find .
.
./a
./a/a.txt
./a/b
./a/b/b.txt
./a/b/c
./a/b/c/c.txt
./all.doc
./d
./d/d.txt
./d/e
./d/e/e.txt
./o
./o/o.txt
./o/p
./o/p/p.txt
./o/p/q
./o/p/q/q.txt
./o/p/q/r
./o/p/q/r/r.txt
./o/p/q/r/s
./o/p/q/r/s/s.txt

The text files in the tree above were populated with (random) content using the fortune program. The complete test data set may be viewed here.

Find & grep

Now let’s explore the scriptutil.ffindgrep() function and look at examples of how it can be put to good use.

  1 bbox33:scriptutil $ python
  2 Python 2.4.4 (#1, May  9 2007, 11:05:23)
  3 [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
  4 Type "help", "copyright", "credits" or "license" for more information.
  5  >>> import scriptutil as SU
  6  >>> import re
  7  >>> flist = SU.ffindgrep('.',
  8  ...                      namefs=(lambda s: s.endswith('.txt'),),
  9  ...                      regexl=('there',))
 10  >>> flist
 11 {'./a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 12  './o/p/q/q.txt': 'there."'}
 13  >>> SU.printr(flist)
 14 It's the only even prime, therefore it's odd.  QED.
 15 ./a/a.txt
 16 there."
 17 ./o/p/q/q.txt

On lines 7-9 (above) the scriptutil.ffindgrep() function is invoked with the following parameters:

  1. line 7: the path of the directory tree to be searched ('.')
  2. line 8: a tuple with functions (namefs) to use for filtering the files we want (in this instance just a single function that makes sure we’re looking at text files only)
  3. line 9: a tuple with regular expressions (regexl) to filter the contents of the files that passed the name tests (in this example just a single regex picking lines containing the string ‘there’)

The function’s return value is stored in the flist dictionary whose content is shown on lines 11-12. On the next line the scriptutil.printr() helper function is invoked to pretty-print the find results (lines 14-17).
As you can see two text files were found to contain lines with the string of interest.

Please note also how the scriptutil.ffindgrep() function returned a dictionary where each

  • key is the file name
  • value is a string with all the lines found

What if we wanted to look for lines that contain the string ‘there’ as above but do it in a case insensitive way?

This is precisely what I am doing in the example below.

 18  >>> flist = SU.ffindgrep('.',
 19  ...                      namefs=(lambda s: s.endswith('.txt'),),
 20  ...                      regexl=(('there', re.I),))
 21  >>> flist
 22 {'./o/p/q/q.txt': 'there."',
 23  './a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 24  './d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 25  >>> SU.printr(flist)
 26         It's the only even prime, therefore it's odd.  QED.
 27 ./a/a.txt
 28         There are never enough hours in a day, but always too many days
 29 ./d/e/e.txt
 30 there."
 31 ./o/p/q/q.txt

Please note how the regexl parameter on line 20 now contains a 2-tuple with a regex definition string ('there') and a regex compilation flag (re.I) respectively.

Due to the fact that we are ignoring the letter case an additional match is found and displayed on lines 28-29 above.

What follows below is a slightly more advanced example since it uses more than one match pattern (see line 34).
Because the lines we are interested in must satisfy an additional regular expression ('eve') the match shown on lines 30-31 above is gone.

 32  >>> flist = SU.ffindgrep('.',
 33  ...                      namefs=(lambda s: s.endswith('.txt'),),
 34  ...                      regexl=(('there', re.I), 'eve'))
 35  >>> flist
 36 {'./a/a.txt': "\tIt's the only even prime, therefore it's odd.  QED.",
 37  './d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 38  >>> SU.printr(flist)
 39         It's the only even prime, therefore it's odd.  QED.
 40 ./a/a.txt
 41         There are never enough hours in a day, but always too many days
 42 ./d/e/e.txt

Please note:

  • The regexl parameter (e.g. on line 34 above) may contain either a simple string (with a regex definition) or a tuple (with parameters accepted by re.compile()).
  • The following regex compilation flags will not have any effect (since the scriptutil.ffindgrep() function matches on a line by line basis): re.S, re.M
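
The two points above can be illustrated with a short Python 3 sketch. compile_regexl() and matching_lines() are hypothetical helpers that mimic the behaviour observed in the examples (a line is kept only if every regex matches it); they are not taken from scriptutil.py.

```python
import re


def compile_regexl(regexl):
    """Turn a mix of strings and (pattern, flags) tuples into compiled regexes."""
    return [re.compile(*item) if isinstance(item, tuple) else re.compile(item)
            for item in regexl]


def matching_lines(lines, regexl):
    """Keep the lines on which every regex finds a match."""
    regexes = compile_regexl(regexl)
    return [line for line in lines if all(r.search(line) for r in regexes)]


lines = ["It's the only even prime, therefore it's odd.  QED.",
         'There are never enough hours in a day, but always too many days',
         'there."']

case_sensitive = matching_lines(lines, ('there',))
ignore_case = matching_lines(lines, (('there', re.I),))
both = matching_lines(lines, (('there', re.I), 'eve'))
print(len(case_sensitive), len(ignore_case), len(both))
```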

Last but not least, I would like to show an example with multiple file name and file content filters: the second function in the namefs tuple (see line 44) now rules out any files with the letter ‘a’ in their path.

 43  >>> flist = SU.ffindgrep('.',
 44  ...             namefs=(lambda s: s.endswith('.txt'), lambda s: 'a' not in s),
 45  ...             regexl=(('there', re.I), 'eve'))
 46  >>> flist
 47 {'./d/e/e.txt': '\tThere are never enough hours in a day, but always too many days'}
 48  >>> SU.printr(flist)
 49         There are never enough hours in a day, but always too many days
 50 ./d/e/e.txt

Since files with the letter ‘a’ in their path are not acceptable any more the match shown for the text file ./a/a.txt on lines 39-40 above has disappeared.

I hope you liked this introduction to the scriptutil.ffindgrep() function and will find the latter to be a worthy addition to your Python toolchest.

Outlook

In the next article I will present the scriptutil.freplace() function that not only helps you find files but also allows you to search and replace strings inside these.