Thursday 18 September 2008

Attribute access in format strings in Python 3.0

Here is another problem with securing Python 3.0: PEP 3101 has extended format strings with an attribute access syntax. This makes the format() method on strings too powerful: it exposes unrestricted use of getattr, so anyone who can call format() on a string can read private attributes of the arguments.

For example, the expression "{0._foo}".format(x) is equivalent to str(x._foo).
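To make the leak concrete, here is a minimal sketch. The class Account and its attribute _secret are hypothetical names chosen for illustration; they do not come from CapPython or PEP 3101.

```python
class Account:
    """Hypothetical object with an attribute that is private by convention."""

    def __init__(self, secret):
        self._secret = secret


acct = Account("hunter2")

# The format string never mentions getattr, yet it reads the private attribute.
leaked = "{0._secret}".format(acct)

# Attribute accesses can also be chained inside a single field.
class_name = "{0.__class__.__name__}".format(acct)
```

Nothing in the calling code names _secret directly, so a verifier that only inspects attribute access syntax in the source would not see the access.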

CapPython could work around this, but it would not be simple. It could block the "format" attribute (that is, treat it as a private attribute), although that is jarring because this word does not even look special. Lots of code will want to use format(), so we would have to rewrite each such method call into a call to a function that interprets format strings safely. Having to do this rewriting increases complexity. And if our safe format string interpreter is written in Python, it will be slower than the built-in version.
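As a rough idea of what such a safe interpreter might look like, here is a sketch built on the standard library's string.Formatter class. The names SafeFormatter and safe_format are hypothetical, and this is not CapPython's actual implementation. It assumes that "private" means "starts with an underscore", and it is deliberately conservative: it rejects any field containing "._", which also rejects some harmless index keys.

```python
import string


class SafeFormatter(string.Formatter):
    """Hypothetical formatter that refuses private attribute lookups."""

    def get_field(self, field_name, args, kwargs):
        # field_name is the text before any "!" or ":" in a replacement
        # field, for example "0._foo". An attribute name always follows a
        # dot, so "._" marks an access to an underscore-prefixed attribute.
        if "._" in field_name:
            raise ValueError(
                "private attribute access in format string: %r" % field_name
            )
        return super().get_field(field_name, args, kwargs)


def safe_format(template, *args, **kwargs):
    """Stand-in for template.format(*args, **kwargs) with the check applied."""
    return SafeFormatter().vformat(template, args, kwargs)


class Point:
    def __init__(self):
        self.x = 3         # public attribute
        self._hidden = 42  # private by convention


p = Point()
public_result = safe_format("x = {0.x}", p)

try:
    safe_format("{0._hidden}", p)
    blocked = False
except ValueError:
    blocked = True
```

Because string.Formatter is itself written in Python, a wrapper like this also illustrates the speed concern: every call goes through interpreted code rather than the built-in format() implementation.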

My recommendation would be to take out the getattr-in-format-strings feature before Python 3.0 is released. Once the release has happened, it will be much easier to add such a feature later than to take it out.

It is a real shame that Python 3 has not adopted E's quasi-literals feature, which has been around for a while. Not only are quasi-literals more secure than PEP 3101's format strings, they are more general (because they allow any kind of subexpression) and could be faster (because more can be done at compile time).

1 comment:

Unknown said...

Changing the format string syntax now is not feasible, as we have just entered the release candidate stage with Python 3.0. Right now your best bet would be to argue for the removal of attribute access in 3.1 somehow (but I wouldn't count on it).

And as for not using E's quasi-literals, no one proposed them so they were not considered.