My thoughts on programming and technology
After the fact, I decided it was worth a few hours to actually write a script to convert Wordpress XML dumps to blogofile posts. It has a GitHub repo where you can look at the code. It may eventually end up in the blogofile repo too.
As for the technical factors, I chose to write output as similar as possible to that of wordpress2blogofile. The post naming conventions and the sequence of YAML metadata mimic what wordpress2xml does. I chose to use lxml for HTML parsing for a few reasons:
- It's a dependency for blogofile, so it's likely that the script will work for the user with no extra effort
Anyone who's in the know about Python knows quite a bit about what Guido likes about Python and what he doesn't. Something that he's been very clear about for a number of years is that he doesn't like some of the functional programming constructs in Python. I'm talking about lambda expressions, map(), filter(), and reduce(). He said in 2005 (when he was first envisioning Python 3):
About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite the PR value, I think these features should be cut from Python 3000.
Ultimately, lambda, map, and filter stayed as builtins, and reduce got moved to functools. But why all the fuss? You might want to read the BDFL's article on the matter. For those who want a bit of a different perspective, read on.
Arguably, the big problem with functional programming constructs like lambdas and map is that they are less readable than their counterparts. Most would argue that
[x for x in y if x > 2] is more readable than
filter(lambda x: x > 2, y). I won't argue with that. But that's certainly an extreme example, isn't it? It just so happens that this example represents a case that works out very nicely in a list comprehension. In the spirit of due process, however, this and other affronts to readability serve as sufficient evidence to charge functional constructs in Python with a quite heinous crime in the Python userland: obfuscation.
On the charges against lambdas
First, lambda expressions. This is a feature that should not be used to produce any substantial routine. I think most of us can agree that this is evil:
x = lambda x, y: functioncall(x, 10) + functioncall(y, 5)
This is bad for a number of reasons:
- No way to write a docstring
- No normal function declaration -- hangs up documentation tools, makes it hard to know that function x exists by reading the source
- One-liner (in the bad way) -- there's quite a bit going on for that one line of code
Rather, it should always be written in its normal form, like so:
def x(a, b):
    '''I provide useful info about the function for IDEs and docs generators.
    I also am good for those reading source.'''
    return functioncall(a, 10) + functioncall(b, 5)
Wasn't that better? Clearly, lambda must be guilty of the crime of obfuscation.
Now to sentencing: what should we do to lambda? Many would suggest getting rid of it. I say keep it. It might be obscure and unreadable in some cases, but it's a lifesaver in cases where you need to pass a very simple callback to a function. So, in the case of
lambda v. Python, the judge finds the defendant guilty and orders him to be placed in house arrest, only to leave when used as a simple callback.
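As a minimal sketch of the "simple callback" case that the sentence allows, here is a lambda used as a sort key. The data and the names people and by_age are made up for illustration.

```python
# Hypothetical data: (name, age) pairs invented for this example.
people = [("Ada", 36), ("Guido", 54), ("Linus", 41)]

# A lambda is tolerable here: it is a one-expression callback that is
# used exactly once and would gain nothing from a name or a docstring.
by_age = sorted(people, key=lambda person: person[1])

print(by_age)
```

Anything longer than a single, obvious expression is probably better off as a def.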
On the charges against map and filter
map and filter face similar charges of obfuscation. Many would argue that powerful list comprehensions obviate the need for map and filter:
map(function, seq)
[function(x) for x in seq]  # more clear, relies on more "standard" language constructs
filter(function, seq)
[x for x in seq if function(x)]  # likewise
Personally, I don't find the map and filter implementations to be much less readable than their list comprehension counterparts. However, I also happen to like functional programming and I know a good bit about Python. Most people would have a hard time reading this, especially as complexity increases.
Along the lines of "There should be one, and only one, obvious way to do it", map and filter are guilty of crowding Python with new solutions to problems that have already been solved. In the court of Python, this is an offense punishable by death. Because of this, in the case
map and filter v. Python, the judge finds the defendants guilty and sentences them to death. The defendants should be promptly removed from the standard library, and should only leave their places on death row for code golf competitions.
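For the record, the equivalence claimed in the charges can be checked directly. This is only a sketch: square, is_even, and seq are hypothetical stand-ins for "function" and "seq".

```python
def square(n):
    return n * n


def is_even(n):
    return n % 2 == 0


seq = [1, 2, 3, 4, 5, 6]

# On Python 3, map() and filter() return lazy iterators, so list() is
# needed before comparing them with the list comprehensions.
mapped = list(map(square, seq))
comprehended = [square(x) for x in seq]

filtered = list(filter(is_even, seq))
kept = [x for x in seq if is_even(x)]

print(mapped == comprehended, filtered == kept)
```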
reduce is a fold operation; if you're familiar with other functional languages, you should recognize it. It's a recursive concept and a common idiom in functional programming that many languages encapsulate in a higher-order function (as they should). However, Python is not a very recursive language. To demonstrate to the court why reduce is guilty, take a look at this example.
x = reduce(lambda x, y: x * y, seq)  # what is this i don't even

# equivalent in human speak
product = 1
for x in seq:
    product *= x
# oh, we're just finding the product of a sequence
GUILTY! In this case reduce only serves to dramatically reduce (geddit?) the amount of code written at the expense of any clarity. This is a simple example and is, admittedly, quite understandable for anyone who has used reduce (or fold in another language). However, as code grows in complexity, loops are almost always clearer. So, in the case of
reduce v. Python, the judge finds the defendant not only guilty of obfuscation and lack of necessity, but also guilty of pure evil. For that, reduce meets the gallows (as it essentially has in Python 3).
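For anyone who wants to see the two defendants side by side on a current interpreter, here is a small sketch. It assumes Python 3, where reduce has to be imported from functools, and uses operator.mul in place of a lambda; the list seq is made up.

```python
from functools import reduce  # a builtin on Python 2, functools-only on Python 3
import operator

seq = [1, 2, 3, 4]

# The fold: combine the items pairwise, starting from the initial value 1.
folded = reduce(operator.mul, seq, 1)

# The loop: the same computation spelled out step by step.
product = 1
for x in seq:
    product *= x

print(folded, product)
```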
Syllabus (AKA tl;dr)
lambda is only acceptable in a few restricted situations.
map, filter, and reduce need to leave the Python language because they are unnecessary and they quickly grow unwieldy and unreadable as their complexity increases.
I accept functional programming as a good idea in general (what would we do without closures and first-class functions, after all), but some parts of Python are relics of Lisp hackers before us and don't really fit in with the language philosophy. Because of this, I'll take Guido's side: I don't like most of the functional programming stuff mixed into Python's global namespace and syntax. I don't mind having a module for it (
functools is useful, and when you import it it's a clear statement of intent), in fact, I happen to like that module. But in general, higher-order functions and lambdas need to be relegated to a lower place in the language.
This may very well be the last magic methods blog post, and it's fitting that I'll be addressing a method that I initially overlooked.
Python would be very hard to use without mixed-mode arithmetic: imagine what Python would be like if a type conversion were necessary to add an integer to a float. In order to make our classes behave the same way, we can define a __coerce__ method.
__coerce__
Method to implement mixed-mode arithmetic. Takes arguments self and other. Should return None if type conversion is impossible. Otherwise, it should return a pair (2-tuple) of self and other, manipulated to have the same type.
That's all for now; it seems that I'll have to find another weekly series to tackle. Keep watching the magic methods guide here.
We're nearing the end of the magic methods blog post series (and the beginning of the magic methods guide, which will be better organized, better explained, better demonstrated, and all in one place!). Thus, I'm running out of magic methods to work with. Here we head back to a few magic methods that I ignored (or forgot):
__unicode__
Takes argument self. It returns a unicode string representation of the instance.
__nonzero__
Takes argument self and returns a boolean value, True or False. This method gets called when the bool() builtin is called on an instance, or when an instance is truth-tested, e.g.
if some_instance:
    # do something
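A minimal sketch of the truth-testing hook follows. The Inbox class is hypothetical. The hook is spelled __nonzero__ on Python 2 and __bool__ on Python 3, so the sketch defines one and aliases the other.

```python
class Inbox(object):
    """Hypothetical container that is truthy only when it holds messages."""

    def __init__(self, messages=()):
        self.messages = list(messages)

    def __bool__(self):        # the name Python 3 looks for
        return len(self.messages) > 0

    __nonzero__ = __bool__     # the name Python 2 looks for


empty, full = Inbox(), Inbox(["hello"])

if full:
    print("you have mail")
if not empty:
    print("nothing here")
```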
I've uploaded a draft of the first few parts of the magic method guide. You can look at it here.
Bear in mind, it's not nearly finished. I figured I'd post it in case someone wants to critique it, find some errors (I'm sure there will be some), or just follow the progress. Watch the version number as I add new content (I'm incrementing it by 1 each time I make a substantial addition, it'll be 1.0 by the time it's "content-complete").
In Python 2.5, a new keyword was introduced in Python along with a new method for code reuse, the
with statement. The concept of context managers was hardly new in Python (it was implemented before as a part of the library, I believe), but not until PEP 343 was accepted did it achieve status as a first class language construct. Usage for the
with statement is simple:
with A() as a:
    # do something
You're probably wondering what the point of all of this is. It might look innocuous at first, but there's some magic going on behind the scenes (and, as always with Python, you can take control of that magic for yourself).
__enter__
Defines what the context manager (the with statement) should do at the beginning of the block. It takes the argument self. Whatever it returns gets bound to the target in the with statement (the name after as), so you could in fact use __enter__ to create a completely new object altogether (if you'd like).
__exit__
Gets loaded at the start of the block and executed after the block. It can commonly be used to handle exceptions, perform cleanup (closing a file or connection), or do something that is always done immediately after we're finished with an object. Unlike __enter__, __exit__ takes several arguments: self, the exception type, the exception value, and the traceback. If there's no exception, the last 3 arguments will be None. Otherwise, you can either choose to handle the exception or let it get handled by the user; if you want to handle it, make sure __exit__ returns True after all is said and done. Otherwise, let the exception happen.
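To make the protocol concrete, here is a minimal sketch of a context manager that handles one kind of exception and lets everything else through. The class name IgnoreErrors and its attributes are hypothetical, and the parameter names of __exit__ are just a common convention.

```python
class IgnoreErrors(object):
    """Hypothetical context manager that records and suppresses one exception type."""

    def __init__(self, exception_class):
        self.exception_class = exception_class
        self.caught = None

    def __enter__(self):
        # The return value is what gets bound to the name after `as`.
        return self

    def __exit__(self, exception_type, exception_value, traceback):
        # All three arguments are None when the block finished normally.
        if exception_type is not None and issubclass(exception_type, self.exception_class):
            self.caught = exception_value
            return True    # handled: the exception is swallowed
        return False       # anything else propagates to the caller


with IgnoreErrors(ZeroDivisionError) as guard:
    1 / 0

print(repr(guard.caught))
```

Returning a true value from __exit__ is the only thing that suppresses the exception; returning False or None lets it continue on its way.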
Sorry for the late post, but the past week has been a bit hectic, with Christmas and all. Today, we have two magic methods:
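__instancecheck__ and __subclasscheck__. They allow us to define custom behavior for reflection.
__instancecheck__ defines behavior for isinstance(instance, class). Takes arguments self and instance.
__subclasscheck__ defines behavior for issubclass(subclass, class). Takes arguments self and subclass.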
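One detail worth a sketch: Python looks these two hooks up on the type of the class, so they have to be defined on a metaclass. The example below is a hypothetical duck-typing check; DuckMeta, Duck, and Robot are invented names.

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a callable quack counts as a Duck."""

    def __instancecheck__(cls, instance):
        return callable(getattr(instance, "quack", None))

    def __subclasscheck__(cls, subclass):
        return callable(getattr(subclass, "quack", None))


# Calling the metaclass directly creates the class on Python 2 and Python 3 alike.
Duck = DuckMeta("Duck", (object,), {})


class Robot(object):
    def quack(self):
        return "beep"


print(isinstance(Robot(), Duck), issubclass(Robot, Duck))
print(isinstance(42, Duck), issubclass(int, Duck))
```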
I've been dreading doing this MMM for a while, but it's worth it. Python magic methods can do just about anything: make sequences that behave like language constructs, make descriptors, even make comparisons using operators like
==. Most of you have already figured out that there's a way to make classes defined by the programmer feel and work like basic numeric types. It just so happens that there are about 40 magic methods for this. Here we go:
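| __add__ | self, other | Emulates addition (+) |
| __sub__ | self, other | Emulates subtraction (-) |
| __mul__ | self, other | Emulates multiplication (*) |
| __floordiv__ | self, other | Emulates integer division (//) |
| __div__ | self, other | Emulates division (/) |
| __truediv__ | self, other | Emulates true division (/ with from __future__ import division in effect) |
| __mod__ | self, other | Emulates modulo (%) |
| __divmod__ | self, other | Emulates long division (divmod()) |
| __pow__ | self, other | Emulates exponent (**) |
| __lshift__ | self, other | Emulates left bitwise shift (<<) |
| __rshift__ | self, other | Emulates right bitwise shift (>>) |
| __and__ | self, other | Emulates bitwise and (&) |
| __or__ | self, other | Emulates bitwise or (|) |
| __xor__ | self, other | Emulates bitwise xor (^) |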
That's it for the "normal" operators. But there's more; each one of these has a version for a reflected operand (e.g. x - my_instance, where the primary operand is x rather than my_instance). These only get called when x does not support the attempted operation and x and my_instance are of different types.
| __radd__ | self, other | Emulates reflected addition (+) |
| __rsub__ | self, other | Emulates reflected subtraction (-) |
| __rmul__ | self, other | Emulates reflected multiplication (*) |
| __rfloordiv__ | self, other | Emulates reflected integer division (//) |
| __rdiv__ | self, other | Emulates reflected division (/) |
| __rtruediv__ | self, other | Emulates reflected true division (/ with from __future__ import division in effect) |
| __rmod__ | self, other | Emulates reflected modulo (%) |
| __rdivmod__ | self, other | Emulates reflected long division (divmod()) |
| __rpow__ | self, other | Emulates reflected exponent (**) |
| __rlshift__ | self, other | Emulates reflected left bitwise shift (<<) |
| __rrshift__ | self, other | Emulates reflected right bitwise shift (>>) |
| __rand__ | self, other | Emulates reflected bitwise and (&) |
| __ror__ | self, other | Emulates reflected bitwise or (|) |
| __rxor__ | self, other | Emulates reflected bitwise xor (^) |
And then, there's more. Each normal magic method has a version for augmented assignment (e.g.
my_instance += x).
| __iadd__ | self, other | Emulates augmented assignment with addition (+=) |
| __isub__ | self, other | Emulates augmented assignment with subtraction (-=) |
| __imul__ | self, other | Emulates augmented assignment with multiplication (*=) |
| __ifloordiv__ | self, other | Emulates augmented assignment with integer division (//=) |
| __idiv__ | self, other | Emulates augmented assignment with division (/=) |
| __itruediv__ | self, other | Emulates augmented assignment with true division (/= with from __future__ import division in effect) |
| __imod__ | self, other | Emulates augmented assignment with modulo (%=) |
| __ipow__ | self, other | Emulates augmented assignment with exponent (**=) |
| __ilshift__ | self, other | Emulates augmented assignment with left bitwise shift (<<=) |
| __irshift__ | self, other | Emulates augmented assignment with right bitwise shift (>>=) |
| __iand__ | self, other | Emulates augmented assignment with bitwise and (&=) |
| __ior__ | self, other | Emulates augmented assignment with bitwise or (|=) |
| __ixor__ | self, other | Emulates augmented assignment with bitwise xor (^=) |
You thought it was over? It's not. We still need unary arithmetic operators.
| __pos__ | self | Emulates unary positive (+x) |
| __abs__ | self | Emulates absolute value (abs()) |
Now, we have to be able to change types:
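| __float__ | self | Converts to float (float()) |
| __int__ | self | Converts to int (int()) |
| __long__ | self | Converts to long (long()) |
| __complex__ | self | Converts to complex (complex()) |
| __oct__ | self | Converts to octal (oct()) |
| __hex__ | self | Converts to hexadecimal (hex()) |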
That's basically it. I don't want to implement a class to demonstrate any of this 1.) because it's fairly self explanatory and 2.) that would be one monster of a class.
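A full numeric class would indeed be a monster, but a small sketch covering one method from each family may still help. Everything here is hypothetical: the Money class, its cents attribute, and the choice to accept plain integers as the other operand.

```python
class Money(object):
    """Hypothetical value type holding an integer number of cents."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):        # Money + x
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented        # lets Python try the other operand

    def __radd__(self, other):       # x + Money, tried after x's own __add__ gives up
        return self.__add__(other)

    def __iadd__(self, other):       # Money += x, mutating in place
        if isinstance(other, Money):
            self.cents += other.cents
        elif isinstance(other, int):
            self.cents += other
        else:
            return NotImplemented
        return self

    def __abs__(self):               # abs(Money)
        return Money(abs(self.cents))

    def __int__(self):               # int(Money)
        return self.cents


total = 5 + Money(10)    # int cannot add a Money, so Money.__radd__ runs
wallet = Money(100)
same_object = wallet
wallet += Money(50)      # __iadd__ returns self, so the name still refers to the same object
```

Returning NotImplemented (rather than raising) from the normal method is what gives the reflected method on the other operand a chance to run.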
It came to my attention that I left out a few magic methods when I covered sequences a few months ago. I know it's not Monday, but I don't feel like waiting to tackle this, and I have a free minute now.
__reversed__ defines behavior for when you call
reversed() on your sequence. Takes self.
__contains__ defines behavior for when we use
in with a custom sequence, e.g.
x in y or
x not in y. It takes arguments self and item to test for membership. Note that this doesn't need to be defined for
in to work with a custom sequence; the default behavior for
x in y is to iterate over y and return True if any of the values in y is equal to x.
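Here is a minimal sketch of both methods on a custom sequence. The Countdown class is hypothetical; it defines __reversed__ and __contains__ mainly to avoid the slower default behavior of indexing or scanning item by item.

```python
class Countdown(object):
    """Hypothetical sequence counting n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return self.n - index

    def __reversed__(self):
        # Count upwards directly instead of indexing from the end.
        return iter(range(1, self.n + 1))

    def __contains__(self, item):
        # Constant-time membership test instead of the default linear scan.
        return isinstance(item, int) and 1 <= item <= self.n


print(list(Countdown(3)), list(reversed(Countdown(3))))
print(2 in Countdown(3), 5 not in Countdown(3))
```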
That's all for today, happy holidays everyone.
Descriptors are classes which, when accessed through either getting, setting, or deleting, can also alter other objects. Descriptors aren't meant to stand alone; rather, they're meant to be held by an owner class. Descriptors can be useful when building object-oriented databases or classes that have attributes whose values are dependent on each other.
To be a descriptor, a class must have at least one of __get__, __set__, and __delete__ implemented. Let's take a look at those magic methods:
__get__ defines behavior for when the descriptor's value is retrieved. __get__ takes three arguments: self, the instance of the owner class (instance), and the owner class itself (owner).
__set__ gets called when the value of the descriptor is changed. It takes three arguments also:
self, the instance of the owner class, and the value to set the descriptor to.
__delete__ gets called when the descriptor attribute is deleted from an instance of the owner with a del statement (e.g. del instance.attribute), not when an object is garbage collected, and it takes self and the instance of the owner class.
If one were defining a class to represent distance (or any other measurement), it might be useful to have multiple units of measurement represented. An example could look like this:
class Meter(object):
    def __init__(self, value=0.0):
        # Note: the value lives on the descriptor itself, so it is shared
        # by every instance of the owner class.
        self.value = float(value)
    def __get__(self, instance, owner):
        return self.value
    def __set__(self, instance, value):
        self.value = float(value)

class Foot(object):
    def __get__(self, instance, owner):
        # meters to feet
        return instance.meter / .3048
    def __set__(self, instance, value):
        # feet to meters (1 foot = 0.3048 meters)
        instance.meter = float(value) * .3048

class Distance(object):
    meter = Meter()
    foot = Foot()
In this case, Meter and Foot are descriptors, and Distance is the owner class. Descriptors are useful because they can tie values and other aspects of their state to other descriptors, making them an excellent way to simplify and beautify code. Without descriptors in this example, the implementation of the conversions would have to be contained in Distance, which would not only bloat the class definition for Distance, but make Foot and Meter specific to Distance. By defining __get__, __set__, and __delete__, however, we can reuse Foot and Meter in any class that has some attribute for distance. All in all, it's a great way to write reusable, effortless code that can add functionality and beauty to classes that hold them.
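As a usage sketch, here is a variant of the same idea that keeps the stored value on each owner instance, so that two distances do not overwrite each other. The names Meters, Feet, and the _meters key are hypothetical, and storing the value in instance.__dict__ is just one of several possible approaches.

```python
class Meters(object):
    """Hypothetical descriptor that stores the value on each owner instance."""

    def __get__(self, instance, owner):
        if instance is None:          # accessed on the class, not an instance
            return self
        return instance.__dict__.get("_meters", 0.0)

    def __set__(self, instance, value):
        instance.__dict__["_meters"] = float(value)


class Feet(object):
    """Hypothetical descriptor that converts to and from the meters value."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.meters / 0.3048

    def __set__(self, instance, value):
        instance.meters = float(value) * 0.3048   # 1 foot = 0.3048 meters


class Distance(object):
    meters = Meters()
    feet = Feet()


a, b = Distance(), Distance()
a.feet = 10      # stored as 3.048 meters on a
b.meters = 1     # b keeps its own value

print(a.meters, a.feet, b.meters)
```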