Power set generation – a joy of Python

by Ruslan Spivak on June 9, 2011

When I worked on name mangling in SlimIt I needed a function that would generate subsets of a character string in lexicographic order.
For example, if I had a character sequence ‘abc’ I would expect to call my function and get items in the following order: ‘a’, ‘b’, ‘c’, ‘ab’, ‘ac’, ‘bc’, ‘abc’. The produced sequence is called a power set and binary counting is a well known algorithm for generating such power sets .

The idea is that we represent our character sequence as a binary string of n bits, then start counting from zero to 2**n – 1 and if a bit k is set to 1 then we would put k-th element from the character sequence into an output set.
Let’s say we have a sequence ‘abc’, then counting to 2**3 – 1 would produce binary numbers 000, 001, 010, 011, 100, …, which correspond to ”, ‘c’, ‘b’, ‘bc’, ‘a’, …, in the character sequence.

Code for binary counting looks like this (credit for original implementation goes to Tim Peters):

def powerset(s):
    """powerset('abc') -> a b ab c ac bc abc"""
    pairs = [(2**index, char) for index, char in enumerate(s)]
    for index in xrange(1, 2**len(s)):
        yield ''.join(char for mask, char in pairs if mask & index)

And while it generates all necessary subsets they are not in lexicographic order.

So I decided to look at itertools module and check closer some properties of the combinations function available there and see if it could fit for my purposes and – wouldn’t you know it? – at the very bottom of the module’s documentation page I found a powerset recipe that did just what I needed using combinations function.
Here is a little bit adapted version of it:

import itertools

def powerset(s):
    """powerset('abc') -> a b c ab ac bc abc"""
    for chars in itertools.chain.from_iterable(
        itertools.combinations(s, r) for r in range(1, len(s)+1)
        ):
        yield ''.join(chars)

I’ve been using Python for a long time now and still continue to bump into elegant solutions in Python standard library and that’s what I call the joy of Python :)

If you enjoyed this post why not subscribe via email or my RSS feed and get the latest updates immediately. You can also follow me on GitHub or Twitter.

{ 2 comments… read them below or add one }

martin June 10, 2011 at 2:16 AM

Interesting – but lexicographic order would be (a ab abc ac b bc c) [ie. as they appear in a dictionary] – also, it seems that neither solution generates the empty set which is always a member of a power set.

Reply

Ruslan Spivak June 10, 2011 at 6:36 AM

Hi Martin,

1. As for lexicographic order I should’ve been more specific by saying: “… in lexicographic order for subsets of the same length”.
I.e. a < b < c, ab < ac < bc, and so on.
2. Because I didn't need an empty subset I started to count from 1. If you need that empty subset start counting from 0 and modify xrange and range correspondingly in the code snippets.

P.S. The details of the post definitely wouldn’t pass a rigorous academic analysis, but that was not the goal :)

Reply

Speak your mind

Previous post:

Next post: