Friday 1 February 2013 — This is 12 years old. Be careful.
The Rails community has had a few high-profile security issues this week. They are well-summarized, with an alarming list of what follow-ons to expect, by Patrick McKenzie: What the Rails Security Issue Means for Your Startup.
tl;dr:
- Ruby’s YAML parser will execute arbitrary Ruby code,
- YAML is parsed all over the place in Rails, including for all JSON input,
- Pretty much every Rails app is going to be compromised soon.
The Python community is in a slightly better position. True, we have pickle in the standard library, which has exactly the same problem, but it’s rare to find applications that accept pickles from untrusted sources.
Don’t ever unpickle data you don’t trust!
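To make the danger concrete, here is a minimal sketch of why unpickling is code execution. The class name Innocent is made up for illustration, and the payload is deliberately harmless; a real attacker would name something like os.system instead of a bit of arithmetic.

```python
import pickle


class Innocent:
    """Hypothetical class: looks like data, but controls its own 'rebuild'."""

    def __reduce__(self):
        # pickle records this callable and its arguments, and calls the
        # callable when the data is loaded. Here it is a harmless eval of
        # arithmetic; any importable callable could be named instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Innocent())

# Loading the bytes runs eval("6 * 7"); no Innocent object comes back.
result = pickle.loads(blob)
print(result)
```

The receiving side never needs the Innocent class at all: the bytes alone name the callable to run.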
The 3rd-party YAML parser PyYAML has the same issue as Ruby’s YAML parser. By default, it will let you create arbitrary Python objects, which means it can run arbitrary Python code. YAML isn’t nearly as pervasive in the Python world, and we don’t parse JSON with the YAML parser usually, but this can still create security holes.
PyYAML has a .load() method and a .safe_load() method. Why do serialization implementers do this? If you must extend the format with dangerous features, provide them in the non-obvious method. Provide a .load() method and a .dangerous_load() method instead. At least that way people would have to decide to do the dangerous thing. I would advocate for PyYAML to make this change now. Who cares if backward compatibility breaks? Most people using .load() never intended to deserialize arbitrary Python objects anyway, so they’ll never notice.
If you use the PyYAML library in your code, check now that you are using the .safe_load() method.
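As a rough sketch of what that check looks like in practice, assuming PyYAML is installed: the helper name parse_untrusted and the sample documents are made up for illustration. The import is guarded only so the snippet loads on machines without PyYAML.

```python
try:
    import yaml  # third-party: PyYAML
except ImportError:  # assumption: PyYAML may not be installed everywhere
    yaml = None

# A document that asks the parser to call a Python function.
HOSTILE = "!!python/object/apply:os.getcwd []"
# Ordinary configuration-style YAML.
PLAIN = "name: example\nports: [80, 443]\n"


def parse_untrusted(text):
    """Hypothetical helper: parse YAML into plain types only."""
    try:
        # safe_load builds only basic types (dicts, lists, strings, numbers)
        # and refuses Python-specific tags.
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError("rejected YAML input") from exc
```

With this, parse_untrusted(PLAIN) gives back an ordinary dict, while parse_untrusted(HOSTILE) raises instead of calling anything.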
If you want automatic serialization of your user-defined classes, take a look at Cerealizer, which works similarly to pickle, but is built to be secure from the start. I’ve never used it, but it looks promising.
BTW, this whole circus reminded me of Allen Short’s excellent lightning talk from PyCon 2010: Big Brother’s Design Rules (skip to 17:30). To summarize Allen’s pithy maxims:
- War is Peace: assume you are at war, all input is an attack, and then you can be at peace.
- Slavery is Freedom: the more you constrain your code’s behavior, the more freedom you have to act. The smaller your interface, the smaller your attack surface.
- Ignorance is Strength: the less your code knows about, the fewer things it can break. This is the principle of least authority.
Allen in particular mentions that adding “conveniences” to your interface can make your life harder later on. In Ruby’s case, there were two unneeded conveniences that combined to make things really bad: parse JSON with the YAML parser, and let the YAML parser construct arbitrary Ruby objects. Neither of these is actually needed by 99.999% of programs reading JSON, but now all of them are compromisable.
Think hard about what your program does. Stay safe.
Comments
Posts like Ned's acknowledging issues like this and taking them seriously are more helpful and do more credit to the Python community than just "closing the issue" would. Every language has problems, not just Ruby (and certainly not just Rails). If there's a way for the Python community to distinguish itself here it is by taking security seriously and getting out ahead of the issues instead of just getting defensive.
The core Python team tries hard to promote a culture of "use as much magic as you need, but no more" (often paraphrased as "magic is evil", and included in the Zen of Python in various guises like "explicit is better than implicit", "simple is better than complex", "complex is better than complicated" and "if the implementation is hard to explain, it's a bad idea"). However, it's always going to be tempting to make the powerful and flexible option the default, and the more restrictive option the exception.
As an example that was fixed in Python 3: Python 2 has "input()" which implicitly calls "eval()" on user-supplied data. The safer alternative, which allows the use of more restrictive parsing by always returning a string, is called "raw_input()". In Python 3, the input() builtin itself has been fixed to behave like Python 2's raw_input().
However, even in Python 3, the builtin eval() is still dangerous to use on user-supplied data, as it can execute *any* Python expression. For obscure technical reasons, the safer-but-more-limited alternative, "ast.literal_eval()", isn't even a builtin the way raw_input() was.
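A small sketch of the difference, using only the standard library. The wrapper name parse_literal and the sample strings are made up for illustration.

```python
import ast


def parse_literal(text):
    """Hypothetical wrapper: accept only Python literals, never expressions."""
    try:
        # literal_eval handles numbers, strings, tuples, lists, dicts, sets,
        # booleans and None. Calls, attribute access and names are refused.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError("not a plain literal: %r" % (text,)) from exc


settings = parse_literal("{'retries': 3, 'hosts': ['a', 'b']}")
print(settings)

try:
    parse_literal("__import__('os').getcwd()")
except ValueError as err:
    print("refused:", err)
```

Passing the second string to eval() would have run the call; literal_eval() refuses it.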
Only in Python 3.3 did we start shipping a comparison operation suitable for security sensitive operations (hmac.compare_digest), and there are still no suitable primitives for password hashing in the standard library (although "passlib" is just a download away on PyPI).
No Pythonista should ever feel smug about security woes in another language or runtime, whether that's Java or Ruby or something else. We have a track record of promoting "safe by default" behaviour, but our record certainly isn't perfect, and we'll almost certainly have more issues in the future. Standard library behaviours that are safe within the confines of a single system (like sharing pickled objects through a pipe) become unsafe when spanning multiple systems (like sharing pickled objects without cryptographic signatures across a network socket), and we're relying on other developers to understand that. Heck, the Rails vulnerability is overshadowing a recent MoinMoin exploit which was used to take out both Debian's main wiki and the Python wiki on python.org.
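To illustrate the point about cryptographic signatures, here is a minimal sketch of signing pickled data with an HMAC and checking it before unpickling. The function names, the sample key and the tag-then-payload layout are assumptions made for this example, not an established protocol. It only helps when both ends share a secret key that the attacker does not have.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest (Python 3.3+) avoids leaking timing information.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature, refusing to unpickle")
    return pickle.loads(payload)


key = b"shared secret for illustration only"
blob = dumps_signed({"job": 17, "args": ["a", "b"]}, key)
print(loads_signed(blob, key))
```

A blob altered in transit fails the check, so pickle.loads() never sees it.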
Looking specifically at the case of the recent Rails problems, even apps written in Python may run into trouble if a related Rails app, or an unrelated Rails app on the same network, falls to an attacker. Attackers don't stop just with the first machine compromised - every compromised machine becomes a platform for launching additional attacks, often with additional data about or privileged access to subsequent target systems.
The design space available for programming languages is enormous, and we collectively still know very little about how to write large scale software sensibly. When other languages and software are attacked, it is important to reflect on it and see what lessons can be learned for our own tools (as Ned has done here), rather than arrogantly assuming ourselves to be immune from the same kinds of error.
I like that. Nicely put.
There aren't any problems with Python.
Rails is not comparable to Python.
I don't see any amazing insight in this post.
There is only one thing that at the present time irks me in python and it is package management. I would love to see http://www.python.org/dev/peps/pep-0381/ implemented as a starting point and maybe even parts of the technical spec of TUF found at https://www.updateframework.com/ integrated into the code (perhaps in the 'pip' module).
BTW, if anyone is interested in a 'dumb search' for 'potentially' unsafe module/module function calls in their python code, I maintain a small grep script which can be found at https://github.com/d1b/python-check-script/blob/master/python_hunt.sh
* don't feed the trolls *
My half-baked plan of action would be to use GitHub's code search to dig up some real-world examples of unsafe PyYAML usage, and petition the PyYAML author to:
- Increase the major version
- Rename load() to unsafe_load()
- Rename safe_load() to load(), but keep safe_load() as an alias
This would break the API for some users, but I suspect many people are using YAML as a "prettier JSON", and should really be using safe_load anyway.
Celery by default uses pickle for sending objects through the broker. You can switch it to json, but then you need to implement json methods for any complex objects you are sending. All of my objects in celery are only ids and strings but I should go make absolutely sure.
Popular packages should get community security reviews. Maybe eyeballs are good enough.
Pip needs to be checking PGP keys and we need to all be signing our distributions when we push. That's serious and we should get on that. Gather the most paranoid dudes and fortify the castle.
https://github.com/pypa/pip/issues/425
The need for caution when using *native* serialisation seems obvious enough to me to go without saying, but perhaps the pickle/unpickle documentation should be more heavily peppered with warnings.
> exactly the same problem, but it's rare to find applications
> that accept pickles from untrusted sources.
There was a known issue with this.
http://blog.nelhage.com/2011/03/exploiting-pickle/
Ruby had a .load() method and... well, that's it. Pretty much every Ruby application that parsed untrusted YAML did so unsafely because there was no trivial way to parse it the correct, safe way, and the parser developers had been dragging their feet on adding one. That's a fairly fundamental difference.
http://www.smartfile.com/blog/python-pickle-security-problems-and-solutions/