Too dynamic?

Sunday 22 October 2006

Here’s a debate that arose recently about how far dynamic typing can be taken, and at what point it goes too far. We have a function that takes a list of ids, but it can also be told to get the ids from another well-known place. It was originally coded like this:

def insert_ids(ids):
    """ Insert the ids, or the global ids if ids is 'global'.
    """
    if ids == 'global':
        ids = get_global_ids()
    for id in ids:
        pass  # blah blah blah

# Now we can insert ids two different ways:
insert_ids([1,2,17,23])
insert_ids('global')

ids is an argument that can either be a list of ids, or the string ‘global’, meaning go off and get a list of ids from somewhere else. But this use of the same argument as either a string or a list felt funny, so we changed it to this:

def insert_ids(ids=None, use_global=False):
    """ Insert the ids, or the global ids if use_global is True.
    """
    if use_global:
        ids = get_global_ids()
    for id in ids:
        pass  # blah blah blah

# Now we can insert ids two different ways:
insert_ids([1,2,17,23])
insert_ids(use_global=True)

But now we have two arguments, both of which have to be defaultable, making it possible to call the function with no arguments, which is not a valid form of the function. Am I being too squeamish about the dynamic nature of the first form? Although Python doesn’t mind, it feels strange to me for a variable to sometimes be a string and sometimes be a list. Is this pythonic? Or just a confusing abuse of power?

Comments

[gravatar]
With Erlang, the first form is encouraged, but then Erlang has pattern matching and it is idiomatic to allow Erlang to match the proper function to the argument types at runtime. The function argument signatures help to self document what arguments are valid for a given function.

But the first form in Python buries the argument type handling in the body of the function. So for that reason alone I might say the second form is superior, because it more effectively expresses the valid input arguments for the function, which is something that needs to be communicated regardless.
[gravatar]
I get the same strange feeling whenever I create these sorts of functions, and I don't like it. I usually end up writing two separate functions which take two different arguments. This makes it clear how each function is to be used, and removes the strange feeling.

Lately, I've been looking at multimethods as an alternative way of implementing polymorphic behavior.


-Sw.
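
For what it's worth, Python's standard library offers a limited form of multimethod in functools.singledispatch, which dispatches on the type of the first argument only. A minimal sketch, assuming a hypothetical GlobalIds marker class and a stand-in get_global_ids(); neither comes from the original code, and the "insert" here just returns the ids it would have inserted:

```python
from functools import singledispatch


def get_global_ids():
    # Hypothetical stand-in for the real lookup.
    return [100, 200]


class GlobalIds:
    """Hypothetical marker type meaning 'use the global ids'."""


@singledispatch
def insert_ids(ids):
    # Fallback for any type without a registered implementation.
    raise TypeError("unsupported ids argument: %r" % (ids,))


@insert_ids.register(list)
def _insert_list(ids):
    # Placeholder for the real work: return what would be inserted.
    return list(ids)


@insert_ids.register(GlobalIds)
def _insert_global(marker):
    # Delegate to the list implementation with the global ids.
    return insert_ids(get_global_ids())
```

With this shape, a stray string such as 'global' falls through to the fallback and raises TypeError instead of being silently accepted.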
[gravatar]
why not,

def insert_ids(ids=None):
    if ids is None: ids = get_global_ids()
    ...

if you later have globals2 (another special case), then that won't work.

in common lisp, this idiom is kinda used:

(defun test (&key a b)
  ;; one should be non-NIL
  (when (and (null a) (null b)) (error .....)))
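
A rough Python counterpart of that check, applied to the two-argument form from the post, might look like the sketch below. The get_global_ids() body is a hypothetical stand-in, the function returns the ids instead of inserting them, and rejecting the "both supplied" case is an extra assumption beyond the Lisp snippet, which only guards against "neither":

```python
def get_global_ids():
    # Hypothetical stand-in for the real lookup.
    return [100, 200]


def insert_ids(ids=None, use_global=False):
    """Insert the ids, or the global ids if use_global is True."""
    # Reject a call that supplies neither source of ids.
    if ids is None and not use_global:
        raise TypeError("pass either ids or use_global=True")
    # Assumption: supplying both is also treated as a mistake.
    if ids is not None and use_global:
        raise TypeError("pass ids or use_global=True, not both")
    if use_global:
        ids = get_global_ids()
    # Placeholder for the real work: return what would be inserted.
    return list(ids)
```

This keeps both arguments defaultable but turns the invalid no-argument call into an immediate, explicit error.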
[gravatar]
To me, it seems that code using the function would be clearer if there were two functions: insert_ids(ids) and insert_global_ids(), with the latter of course delegating to the former.
[gravatar]
I thought about the insert_global_ids() idea, but the real-life function is much more complex: it actually has seven arguments, and fetching the ids is in the middle of the function, not the top. In addition, if we have to allow for defaulting of more than one argument, then there's an explosion of combinations, each of which must have a special function.

I don't know that it's the wrong way to go, but it's got its own downsides...
[gravatar]
I like the first method better in an intuitive way, but I agree with the other posters who say that it'd be neater as a multimethod.

And re: the 7 argument thing, it sounds like these parameters are almost worthy of being refactored out into a class, instead of writing "an explosion of combinations". If it's so complex, wouldn't it be the thing to do to encapsulate the complex behavior?
[gravatar]
I'd say, depending on what the real version of the function is, either:
* do as sri said above, or
* have ids be the only arg and interpret an empty ids as "use the global ids"
Good luck. I'm interested as to what you end up deciding.
[gravatar]
I tend to be a static kind of person, and the first alternative really bugs me for that reason. It seems to open up a lot of room for careless errors that trickle down into that code, not the least of which is misspelling 'global'.

In addition, using a global is something that I think should probably be an either/or proposition. Either the function always uses globals in some respect and always documents it, or it delegates that to the caller in an unambiguous way.

I would create a function that does not have the argument be default-able, and if the caller of that function requires a global argument, then the caller should themselves call the function like:

insert_ids(get_global_ids())

This way it is obvious, at the calling point, that global resources are being used here. Most of the time, it is bad to add verbosity to the point in the program where a function is invoked. But in the case of the use of global variables, I think that this verbosity is justified because of the potential pitfalls of not seeing (or not having a static analyzer see) where the uses are.

If a similar situation arises independently of global considerations, then I would favor a separate function (as above) that specifies the special default value. Or, if there were a more complex set of cases, I would pass a set of flags to a configuration object, which would automatically set up defaults according to the flags, and the user of the configuration object would set the rest of the fields manually.
[gravatar]
To expand on what Jonathon Duerig said: misspelling 'global' would be pretty bad. If I passed in 'globals', your loop would see ['g','l','o','b','a','l','s'], since strings are iterable. That's...confusing.
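
To make the failure mode concrete, here is a small sketch based on the first form of the function from the post. The get_global_ids() body is a hypothetical stand-in, and the loop just collects what would be inserted. A misspelled magic string raises no error; it is simply iterated character by character:

```python
def get_global_ids():
    # Hypothetical stand-in for the real lookup.
    return [100, 200]


def insert_ids(ids):
    """Insert the ids, or the global ids if ids is 'global'."""
    if ids == 'global':
        ids = get_global_ids()
    inserted = []
    for id in ids:
        inserted.append(id)  # placeholder for the real insert
    return inserted


print(insert_ids('global'))   # [100, 200]
print(insert_ids('globals'))  # ['g', 'l', 'o', 'b', 'a', 'l', 's']
```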

I would also go with insert_ids(get_global_ids()). I don't see this as a case of dynamic vs. static. I see it as introducing extra code paths in the name of questionable convenience. You talked about multiple defaults needing an explosion of combinations - well, in a single function, one might argue that the explosion is still there for testing and understanding. Certainly the cyclomatic complexity is higher, and I believe minimizing it is an important goal for reliable, testable software.

I'm inclined to think that a seven-argument function for which you're even tempted to add these sorts of alternatives is too complex altogether. But this discussion is too abstract for me to say so confidently.
[gravatar]
I think the following pattern is more verbose, but more maintainable and better protected against typos:

GLOBAL = object()

def insert_ids(ids=GLOBAL):
    if ids is GLOBAL:
        ids = get_global_ids()
    ...
[gravatar]
The standard Python pattern for this is to use None, not a magic string.

Using separate methods may be a better solution for some cases, but that depends on how this API is used, not what it does on the inside. Good API design is about usage patterns, not implementation details.
[gravatar]
Although I would prefer Jonathon's suggestion (or simply use None if there are no other special cases, as sri said), here is another possibility: pass the function as an argument:

insert_ids(get_global_ids)

Kind of like re.sub, where you can pass either a string or a function as the replacement argument.
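
A minimal sketch of that idea, assuming a stand-in get_global_ids() and a function that returns the ids instead of really inserting them: if the argument is callable, it is called to produce the ids, otherwise it is used as the ids directly.

```python
def get_global_ids():
    # Hypothetical stand-in for the real lookup.
    return [100, 200]


def insert_ids(ids):
    """Insert the ids; if ids is callable, call it to obtain them."""
    if callable(ids):
        ids = ids()
    # Placeholder for the real work: return what would be inserted.
    return list(ids)
```

Both insert_ids([1, 2, 17, 23]) and insert_ids(get_global_ids) then work, and any other zero-argument callable that returns ids can be passed the same way.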
[gravatar]
I'd expect None to mean "get the global values". And to be honest, I'd have been happy enough with having to use insert_ids(get_global_ids()) on the basis of explicit is better than implicit. Using a magic string seems wrong to me.

Of course, you've said that the real function is more complex, and that makes all the difference...
[gravatar]
Yeah... I wouldn't do the "global" string... I like the None and Paul's recent suggestion there of insert_ids(get_global_ids())... both of those seem solid ways to go. Otherwise, somebody will likely break the code quite easily with a bad call or not understanding what should go in the field.

The time when I think going more dynamic is cool is when you want to use a parameter in a similar way regardless of its type. For instance, say you have a function or method that uses a list for something... it would be good if it could also intelligently take a dictionary or a list of lists and handle it appropriately...
[gravatar]
I don't like either.

insert_ids should take ANY type as an argument and insert it. String, list, dictionary. If the argument is the arbitrary string "global", the first form won't work, but you might want to insert that exact string.

Now you might argue that IDs are a type of some sort, and a string is not valid. Maybe that is true today, but will it be tomorrow? Maybe not, so better not risk it: those who are assigned to make strings a valid id type will have enough work without having to refactor every place where "global" is passed in as a string.

insert_ids(True) should be read as either "Turn on inserting IDs", or Insert the ID True - if that makes sense. No programmer would read it as insert global IDs. There should be a different function insert_global_ids, which makes it clear when you read the code what is going to happen. Readability is good.

I would give careful consideration to the other comment that you might really want a class here, not a function. Without knowing your problem we cannot say for sure, but it sounds reasonable.
[gravatar]
how about:

def insert_ids(ids):
    ...

def insert_global_ids():
    ids = get_global_ids()
    return insert_ids(ids)

def insert_other_flag_ids():
    ids = get_other_flag_ids()
    return insert_ids(ids)
[gravatar]
as some people have posted there is no need for acrobatics

def insert_ids(ids):
    """ Insert the ids.
    """
    for id in ids:
        pass  # blah blah blah

# Now we can insert ids two different ways:
insert_ids([1,2,17,23])
insert_ids(get_global_ids())

is just as functional and is less error-prone
[gravatar]
If you have a special case, there are a few steps.

First, do you really need a special case, or are you just being paranoid about type safety? Let the caller take care of whether they mean what they say, and make your function do *one* job clearly and simply. (This supports the 'insert_ids(get_global_ids())' idea earlier.)

Second, do you *really* need a special case, or are you making your function too complex? Be very suspicious of functions that are written to do two different things depending on their input, and split them so that both are simple and the caller can be explicit about what they want. This doesn't preclude factoring out the code that's common to both of them, of course.

Third, if you actually need a special case, can it be None? This is the idiomatic Python "sentinel value", and it looks like the code posted by 'sri' above. Note that if you're squeamish about using None, but don't have a specific reason not to use it, use it; other programmers will thank you for following convention.

Fourth, if you have decided that a magic sentinel value is called for but None is already taken for some other purpose, don't use a string. Use a unique do-nothing object, defined at the module level so callers can easily get at it, like 'Dmitry Vasiliev' showed. You won't accidentally use it, because it's defined only in one place (you're comparing by 'is', remember) and it's not used for anything except indicating the special case.

Fifth, there is no fifth. If you've come to the end and think it's too complex, it probably is. Start at the top again.
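
As an illustration of the fourth step, here is a sketch of a module-level sentinel in a case where None is already taken. The meaning given to None here ("nothing to insert") is purely hypothetical, as are the USE_GLOBAL name and the get_global_ids() stand-in:

```python
def get_global_ids():
    # Hypothetical stand-in for the real lookup.
    return [100, 200]


# Unique do-nothing object; only an identity check can match it.
USE_GLOBAL = object()


def insert_ids(ids=USE_GLOBAL):
    """Insert the ids, or the global ids if ids is USE_GLOBAL."""
    if ids is USE_GLOBAL:
        ids = get_global_ids()
    elif ids is None:
        # Hypothetical: None already means "nothing to insert".
        ids = []
    # Placeholder for the real work: return what would be inserted.
    return list(ids)
```

Because the comparison uses `is`, no string, list, or other value a caller might pass by accident can be mistaken for the sentinel.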
[gravatar]
I do something similar:

class Resource(object):

    def __init__(self, node):
        attributes = self.ATTRIBUTES
        # Wrap a lone string in a tuple so the loop always sees a sequence.
        if not isinstance(attributes, (tuple, list)):
            attributes = (attributes,)
        for attribute in attributes:
            pass  # do stuff

class TranslateTransform(Resource):
    ATTRIBUTES = "translate"

class RotateTransform(Resource):
    ATTRIBUTES = "axis", "angle"

As you can see, ATTRIBUTES can be either a string or a sequence.

While this goes against most stuff from OOP books, I feel it is nicer.
[gravatar]
Probably does not matter, but what I would do is either use None instead of 'global' if there is only one special case, or, for multiple dispatch, pass in a callable and do: if callable(ids): ids = ids()
