What’s the point of os.path.commonprefix?

Monday 22 March 2010

Most of the Python standard library is great, providing functions and classes that do their jobs well, often even before you knew you needed the job done (urlsafe_b64encode FTW!).

Which makes my disappointment with os.path.commonprefix all the stronger. This function is worse than useless: it’s misleading. Although it’s in the os.path module, it knows nothing about paths, working instead character-by-character:

>>> os.path.commonprefix(['/home/ned/cog', '/home/ned/coverage'])
'/home/ned/co'      # That's not an actual path!
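Here is a minimal sketch of that character-by-character comparison, for illustration only. `char_prefix` is a hypothetical name and this is not the stdlib source, although CPython's implementation relies on the same trick of comparing just the smallest and largest strings, since their common prefix is shared by everything that sorts between them:

```python
import os


def char_prefix(strings):
    """Longest common prefix, compared one character at a time.

    Hypothetical helper mirroring what os.path.commonprefix does: it never
    looks at path separators, so it can stop in the middle of a name.
    """
    if not strings:
        return ""
    # The common prefix of the lexicographically smallest and largest
    # strings is also the common prefix of every string in between.
    lo, hi = min(strings), max(strings)
    for i, ch in enumerate(lo):
        if ch != hi[i]:
            return lo[:i]
    return lo


paths = ["/home/ned/cog", "/home/ned/coverage"]
print(char_prefix(paths))           # /home/ned/co
print(os.path.commonprefix(paths))  # /home/ned/co
```

Nothing in the loop knows what a separator is, which is why the result can stop partway through `cog` and `coverage`.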

The docs helpfully include the warning:

Note that this may return invalid paths because it works a character at a time.

But it should say:

This function is in the wrong place and has nothing to do with paths. Don’t use it if you are interested in file paths!

I accepted a patch to coverage.py which used this function, and it looked good. But eventually I turned up cases it got wrong, and had to rediscover what people seem to have understood for at least eight years. *Sigh*

Comments

Ugly one-liner to get the job done (if I understood correctly):
>>> p1 = "/home/ned/cog"
>>> p2 = "/home/ned/coverage"
>>> "/".join([p for i, p in enumerate(p1.split("/")) if p == p2.split("/")[i]])
'/home/ned'

Oh, never mind: that was silly and just happens to work in this case. :)
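The one-liner compares names position by position but never stops at the first mismatch, so later names that happen to match get glued back on, and it indexes past the end of the second path when the first has more components. A quick check with made-up paths (`one_liner` is a hypothetical wrapper around the expression above):

```python
def one_liner(p1, p2):
    # The expression from the comment above, wrapped in a function.
    return "/".join([p for i, p in enumerate(p1.split("/")) if p == p2.split("/")[i]])


print(one_liner("/home/ned/cog", "/home/ned/coverage"))  # /home/ned  (correct)
print(one_liner("/a/b/c", "/a/x/c"))                     # /a/c  (not a common directory)

try:
    one_liner("/a/b/c", "/a/b")  # the first path has more components
except IndexError as exc:
    print("IndexError:", exc)
```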

I notice that it works on lists as well as strings. It's almost as if they wrote a helper function for what they actually needed and stopped there.
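That list support is enough to get a component-wise answer: split each path into its names first, and the "characters" being compared become whole directory names. A minimal sketch, assuming forward-slash separators and already-normalized paths (no trailing slashes or `..`); `component_prefix` is a hypothetical name:

```python
import os


def component_prefix(paths, sep="/"):
    # Compare lists of names instead of strings, so the prefix can only
    # end on a boundary between directory names.
    parts = os.path.commonprefix([p.split(sep) for p in paths])
    return sep.join(parts)


print(component_prefix(["/home/ned/cog", "/home/ned/coverage"]))  # /home/ned
print(component_prefix(["/usr/lib", "/usr/local"]))               # /usr
```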

The Apache Alias command works the same way; I loathe it as well. ;)

Oddly enough, I actually needed exactly this functionality two years ago, and decided to look in os.path just in case. So while it may be annoying and broken for other people, for me it was "Batteries included" at just the right time.

The following works for your example, but testing is limited, and any true replacement should not have to rely on the current broken implementation:
>>> import os
>>> def commonprefix(*args):
	return os.path.commonprefix(*args).rpartition(os.path.sep)[0]

>>> commonprefix(['/home/ned/cog', '/home/ned/coverage'])
'/home/ned'

@Paddy3118: thanks for the implementation. As it happens, the extra annoyance (not the stdlib's fault) was that the patch shouldn't have even been trying to find a common prefix!

Wouldn't
os.path.dirname(os.path.commonprefix(
    ['/home/ned/cog', '/home/ned/coverage']))
be better?
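Taking `dirname` of the character-wise prefix is tidier than cutting it with `rpartition`, but both share a blind spot: when the character-wise prefix is already a complete path (one path is the parent of the other, or the paths are identical), stripping the last name throws away a directory that really is common. A quick check using `posixpath` directly so the result does not depend on the platform; `via_rpartition` and `via_dirname` are hypothetical wrappers around the two approaches in this thread:

```python
import posixpath


def via_rpartition(paths):
    # Cut the character-wise prefix at its last separator.
    return posixpath.commonprefix(paths).rpartition(posixpath.sep)[0]


def via_dirname(paths):
    # Take the directory part of the character-wise prefix.
    return posixpath.dirname(posixpath.commonprefix(paths))


siblings = ["/home/ned/cog", "/home/ned/coverage"]
nested = ["/home/ned", "/home/ned/cog"]

print(via_rpartition(siblings), via_dirname(siblings))  # /home/ned /home/ned
print(via_rpartition(nested), via_dirname(nested))      # /home /home  (should be /home/ned)
```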

Here is a version that does not rely on the faulty function. It compares paths by whole directory names, level by level.
>>> from os.path import sep
>>> from itertools import takewhile
>>> def allnamesequal(name):
	return all(n==name[0] for n in name[1:])

>>> def commonpaths(paths):
	bydirectorylevels = zip(*[p.split(sep) for p in paths])
	return sep.join(list(zip(*takewhile(allnamesequal, bydirectorylevels)))[0])

>>> paths = ['/home/ned/cog', '/home/ned/coverage']
>>> commonpaths(paths)
'/home/ned'
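The same level-by-level idea can be written a little more directly: `takewhile` already yields the tuples of matching names, so taking the first name from each avoids the second `zip`. It also avoids an `IndexError` when the paths share nothing at all, where `list(...)[0]` is applied to an empty list. A sketch assuming normalized paths; `common_dir` and `all_equal` are hypothetical names, and `sep` is a parameter only so the example can be run with `/` on any platform:

```python
import os
from itertools import takewhile


def all_equal(names):
    # True when every path has the same name at this directory level.
    return all(n == names[0] for n in names[1:])


def common_dir(paths, sep=os.sep):
    # zip(*...) groups names by level: ('', ''), ('home', 'home'), ...
    levels = zip(*[p.split(sep) for p in paths])
    # Keep levels while they agree, then rebuild the path from one copy of each.
    return sep.join(names[0] for names in takewhile(all_equal, levels))


print(common_dir(["/home/ned/cog", "/home/ned/coverage"], sep="/"))  # /home/ned
print(repr(common_dir(["a/b", "c/d"], sep="/")))                     # ''
```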

Yes, commonprefix is not really an os.path thing, as it simply computes the longest common prefix. Something like commonpath() or commonroot() would be better (and would probably be accepted into the stdlib).

Roger Lipscombe 1:16 PM on 29 Mar 2010
I've just found exactly the same problem with a GetRelativePath method that I wrote in C# a while back.

It does exactly what I needed at the time, but breaks horribly in almost every other case. I keep meaning to go back and write it properly, but...

Of course, I've not released it as part of a standard library.

Please, if you ever try to implement a "correct" version, take os.altsep into account, at least by calling normpath().
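On Windows, `os.sep` is a backslash and `os.altsep` is a forward slash, and a path may mix both, so splitting on `sep` alone would treat `C:/Users` as a single name. Running each path through `normpath` first rewrites `altsep` to `sep` and collapses `..` components. This sketch uses `ntpath` (the module behind `os.path` on Windows) explicitly so it runs the same anywhere; `common_windows_dir` is a hypothetical name, and it deliberately ignores Windows' case-insensitivity:

```python
import ntpath
from itertools import takewhile


def common_windows_dir(paths):
    # normpath turns altsep ("/") into sep ("\\") and removes "." and "..".
    split_paths = [ntpath.normpath(p).split(ntpath.sep) for p in paths]
    levels = zip(*split_paths)
    # Compares names case-sensitively, unlike Windows itself.
    same = takewhile(lambda names: all(n == names[0] for n in names), levels)
    return ntpath.sep.join(names[0] for names in same)


print(common_windows_dir(["C:/Users/ned/cog", r"C:\Users\ned\coverage"]))  # C:\Users\ned
```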

Fixed in Python 3.5 with the addition of os.path.commonpath.
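A minimal comparison for Python 3.5 or later, using `posixpath` directly so the output is the same on every platform (on POSIX systems `os.path` is `posixpath`). Unlike `commonprefix`, `commonpath` works on whole path components, and it raises `ValueError` for an empty sequence or for a mix of absolute and relative paths:

```python
import posixpath

siblings = ["/home/ned/cog", "/home/ned/coverage"]
print(posixpath.commonprefix(siblings))  # /home/ned/co
print(posixpath.commonpath(siblings))    # /home/ned

# One path is the parent of the other: the parent itself is common.
print(posixpath.commonpath(["/home/ned", "/home/ned/cog"]))  # /home/ned

try:
    posixpath.commonpath(["/home/ned", "ned/cog"])  # absolute mixed with relative
except ValueError as exc:
    print("ValueError:", exc)
```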
