November 18, 2010

Free Software license onion

There are tons of Free Software licenses out there, and choosing one for your Lisp project can be confusing. I'm writing this guide to clear up some of that confusion, but be aware that I'm not a lawyer, and you should probably consult one if you plan to act on this advice for something you hope to commercialize.

A good strategy for choosing a license is to consider how the project would fit into a software stack. Is it a general-purpose utility library? A library that interfaces to specific services or hardware? A web server that can be used to build applications? A content management system that you hope to commercialize one day? These projects all have different potential uses, and ideally their licensing terms would reflect those uses: the content management system should be able to use the web server, the web server should be able to use the service interface library, and the service interface library should be able to use utilities from the general-purpose library.

I think general-purpose utility libraries should be in the public domain (consider using the Unlicense, which is a copyright waiver with a warranty disclaimer). One thing people often want to do with such libraries is copy and paste one or two functions into unrelated projects. Other good candidates for the public domain include benchmark code and test suites (you can release just those parts of your project into the public domain).

Next in the hierarchy are permissive licenses. There are dozens of these, but I think only those that are compatible with the LLGPLv2.1 should be used for Free Software written in Lisp (Wikipedia has a chart summarizing Free Software license compatibility).

There are a few of these (check the GNU license list to see whether a particular license is GPL-compatible), and most of them resemble the BSD license. I recommend the ISC license, which is very concise and is used by OpenBSD. Good candidates for a permissive license are parsers for data formats and interfaces to C libraries or web services. Permissive-licensed software can include public domain code and code from most other permissive-licensed software.

Projects you do not want to see forked into a closed-source product without anything being given back to the community should be licensed under the LLGPLv2.1 (version 2.1 specifically, as LGPLv3 code cannot be used in a GPLv2 project). Why the LLGPL and not the LGPL or the GPL? The GPL makes it impossible to use a Lisp library as part of a closed-source product (even if you only use it as a library and make no modifications), and the wording of the LGPL has the same effect, because its "linking clause" essentially presumes you're using C.

LLGPL software can incorporate other LLGPL code, public domain code, and code under LLGPL-compatible permissive licenses. The reverse does not hold: LLGPL code can't be put into an ISC-licensed project, for example (the whole project would have to be relicensed under the LLGPL).
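The layering described above is the onion of the title: code can move outward through the layers but never inward. A toy Python model of just that ordering might look like the following. It assumes every license collapses into one of four layers, with "permissive" standing only for the GPL/LLGPL-compatible permissive licenses recommended here; it ignores the "most other permissive licenses" caveat and real-world license terms.

```python
# Toy model of the license "onion" argued for in this post.
# Assumption: every license is reduced to one of four layers, innermost
# first; "permissive" means a GPL/LLGPL-compatible permissive license.
LAYERS = ["public domain", "permissive", "LLGPLv2.1", "GPL"]


def can_incorporate(project_license: str, code_license: str) -> bool:
    """True if code under code_license may go into a project under
    project_license: code moves outward through the onion, never inward."""
    return LAYERS.index(code_license) <= LAYERS.index(project_license)


print(can_incorporate("LLGPLv2.1", "permissive"))  # True
print(can_incorporate("permissive", "LLGPLv2.1"))  # False
```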

I think it's pretty obvious that you shouldn't license your Lisp library under the GPL if you want other people to actually use it, but how do you decide between a permissive license and the LLGPL? Practical considerations aside, I think it often comes down to a moral choice. The LLGPL imposes reciprocal sharing on others. I believe copyright should be abolished or severely reformed, and until those changes start taking place, the LLGPL lets you achieve similar effects through the works whose copyright you own (the real pirates aren't the people downloading MP3s, they're the people violating the GPL).

I think the most interesting licensing situation is when you want to develop commercial software but allow proprietary use and extension only by certain parties (hopefully ones that will pay you lots of money). The general strategy for this is to dual-license your code under the GPL and a commercial license. The most prominent example of a project using this strategy is MySQL.

One implication of this is that contributors must assign the copyright on their patches to you or your company before those patches can go into your official, dual-licensed repository.

Which version of the GPL should you choose for dual-licensing? I think either v2 or v3 is fine (one important thing v3 adds is the anti-Tivoization clause, which stops distributors from shipping the software on consumer devices that refuse to run modified versions).

One thing the GPL doesn't cover is running proprietary modifications of GPLed software as a service, where no distribution takes place (for example, Google is rumored to run Linux kernel patches that it hasn't released). The AGPL addresses this issue. I don't know of any software dual-licensed under the AGPL, but I think it could be a promising strategy for a variety of projects.

November 16, 2010

Character encoding is about algorithms, not data structures

You may already be aware that both SBCL and Clozure represent characters using 4 bytes. There has already been significant discussion about this, but I hope I can offer more insight into how it applies to your Lisp applications.

First, the one thing most people seem to agree on is that UTF-16 is evil (it should be noted that both CLISP and CMUCL use UTF-16 as their internal character representation).

The important thing about choosing between UTF-32 and UTF-8 or UTF-16 is that it is not primarily a question of string size, but of algorithms.

Variable-length encodings work perfectly fine for stream algorithms. But string algorithms are written on the assumption of constant-time random access and of being able to set parts of a string to new values without consing. Variable-length encodings can't satisfy these assumptions: finding the nth character means scanning from the start, and overwriting a character with one of a different encoded length means shifting everything after it. Most string algorithms would not run with acceptable performance under those conditions.
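To make the random-access point concrete, here is a small Python sketch (Python is used only for illustration, not because this is how SBCL or Clozure work internally) that finds the byte offset of the nth character. With UTF-8 it has to walk the lead byte of every preceding character; with UTF-32 it is a single multiplication. It assumes well-formed UTF-8 input.

```python
def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th character (0-based) in well-formed UTF-8.

    A character's length is only known from its lead byte, so every
    preceding character must be visited: O(n).
    """
    offset = 0
    for _ in range(n):
        lead = data[offset]
        if lead < 0x80:      # 0xxxxxxx: 1-byte sequence (ASCII)
            offset += 1
        elif lead < 0xE0:    # 110xxxxx: 2-byte sequence
            offset += 2
        elif lead < 0xF0:    # 1110xxxx: 3-byte sequence
            offset += 3
        else:                # 11110xxx: 4-byte sequence
            offset += 4
    return offset


def utf32_char_offset(n: int) -> int:
    """Every UTF-32 character is 4 bytes, so the offset is O(1)."""
    return 4 * n


text = "añ€𝄞x"                       # 1-, 2-, 3-, 4- and 1-byte characters in UTF-8
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")    # little-endian, no byte-order mark
print(utf8_char_offset(utf8, 4))    # 10
print(utf32_char_offset(4))         # 16
```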

What about immutable strings? Random access is still not constant-time with variable-length encodings, and all the mutator operations are gone. In effect, most immutable string algorithms end up being stream algorithms that cons a lot.

Chances are, you're already using stream algorithms on your strings without realizing it. Any kind of search over a string really treats that string as a stream. If you're doing string concatenation, you're really treating your strings as though they were immutable; consider using with-output-to-string instead, both to cut down on consing and to simplify your code.
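The same contrast can be sketched in Python, where io.StringIO plays roughly the role of with-output-to-string. This is an analogy rather than Lisp code: the first function builds the result by repeated concatenation, the second writes every piece to one output stream and extracts the string once at the end.

```python
import io
from typing import Iterable


def join_by_concatenation(words: Iterable[str]) -> str:
    # Treats strings as immutable values: each += may allocate a new
    # string and copy everything accumulated so far.
    result = ""
    for i, word in enumerate(words):
        if i:
            result += ", "
        result += word
    return result


def join_by_stream(words: Iterable[str]) -> str:
    # Writes every piece to a single output stream and takes the string
    # once at the end, much like CL's with-output-to-string.
    out = io.StringIO()
    for i, word in enumerate(words):
        if i:
            out.write(", ")
        out.write(word)
    return out.getvalue()


print(join_by_stream(["car", "cdr", "cons"]))  # car, cdr, cons
```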

One thing that is special about UTF-8 is that it is the de facto character encoding standard of the web. When you read UTF-8 into a Lisp string, what you're really doing is decoding and copying each character.

Most web application patterns are based on searching for and extracting some string from the HTTP request, and then either using that string as a retrieval key in a database or splicing it together with other strings to form the reply. All of these actions can be modeled in terms of streams.

One of the things that makes John Fremlin's tpd2 fast is that it dispenses with the step of decoding and copying the incoming UTF-8 data into a Lisp string (this is also what antiweb does). Using some compiler macros and the cl-irregsexp library, all the searching and templating is done on byte arrays that hold the UTF-8 encoded data (Lisp string literals are converted to UTF-8 byte arrays by compiler macros). The result is a UTF-8 byte array that can be sent directly back to the web browser, bypassing another character encoding and copying step.
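A rough Python sketch of the same idea follows; it is not tpd2's or antiweb's API, and the PAGES table and respond function are hypothetical. The request is searched as bytes, the path is used directly as a lookup key, and the reply is assembled from pre-encoded UTF-8 pieces, so no character is ever decoded or re-encoded. Splitting on ASCII delimiters such as space and CR LF is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above.

```python
# Hypothetical page table: keys and bodies are stored as UTF-8 bytes,
# so the request path never has to be decoded to look anything up.
PAGES = {
    b"/": "<h1>Главная страница</h1>".encode("utf-8"),
    b"/about": "<h1>About</h1>".encode("utf-8"),
}
NOT_FOUND_BODY = "<h1>Не найдено</h1>".encode("utf-8")

# Reply fragments are encoded once, up front (tpd2 does this step with
# compiler macros; here it is plain module-level code).
STATUS_OK = b"HTTP/1.1 200 OK\r\n"
STATUS_NOT_FOUND = b"HTTP/1.1 404 Not Found\r\n"
CONTENT_HEADERS = b"Content-Type: text/html; charset=utf-8\r\nContent-Length: "


def respond(request: bytes) -> bytes:
    """Build a reply to a raw HTTP request without decoding any text.

    Assumes a well-formed request line such as b"GET /about HTTP/1.1".
    """
    request_line = request.split(b"\r\n", 1)[0]
    _method, path, _version = request_line.split(b" ", 2)
    body = PAGES.get(path)
    status = STATUS_OK
    if body is None:
        status, body = STATUS_NOT_FOUND, NOT_FOUND_BODY
    # The body is already bytes, so its length is the Content-Length.
    length = str(len(body)).encode("ascii")
    return status + CONTENT_HEADERS + length + b"\r\n\r\n" + body


print(respond(b"GET /about HTTP/1.1\r\nHost: example.com\r\n\r\n"))
```

Because the body never leaves its encoded form, Content-Length is simply len(body); a decoded Lisp or Python string would have to be re-encoded before its byte length could be known.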

I read somewhere that the Exokernel operating system permitted extending this concept further down the software stack by allowing applications to send back pre-formatted TCP/IP packets, although I don't know if that's actually possible or how much of a speedup it would give.

In addition to skipping the overhead of copying and re-encoding characters, working on UTF-8 byte sequences directly means you can use algorithms that rely on a small alphabet to achieve greater performance (an example of this is the Boyer–Moore–Horspool string searching algorithm).
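Here is a minimal Boyer–Moore–Horspool sketch in Python that searches UTF-8 bytes directly. Its bad-character shift table has exactly 256 entries, one per byte value; over decoded characters it would have to cover the whole Unicode range or fall back to a hash table. Because UTF-8 is self-synchronizing, a match of a well-formed needle always starts on a character boundary, so the byte offset it returns is safe to slice at.

```python
def horspool_search(haystack: bytes, needle: bytes) -> int:
    """Index of the first occurrence of needle in haystack, or -1."""
    m = len(needle)
    if m == 0:
        return 0
    # Bad-character table: one slot per possible byte value. A byte that
    # does not occur in the needle lets the window skip the whole needle.
    shift = [m] * 256
    # For every needle byte except the last, record the distance from its
    # rightmost occurrence to the end of the needle.
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= len(haystack) - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Advance by the entry for the byte under the window's last slot.
        pos += shift[haystack[pos + m - 1]]
    return -1


text = "Déjà vu: searching UTF-8 bytes".encode("utf-8")
needle = "jà vu".encode("utf-8")
i = horspool_search(text, needle)
print(text[i:i + len(needle)].decode("utf-8"))  # jà vu
```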

November 7, 2010

uri-template, Eager Future and Ruby, plus a bit more raving about a Lisp operating system

I've put uri-template and Eager Future up on GitHub:

https://github.com/vsedach/uri-template
https://github.com/vsedach/Eager-Future

uri-template now uses named-readtables, which lets you manage readtables and reader macros the same way you manage packages. I recommend named-readtables to everyone who uses reader macros, and even to those who don't, because of readtable-case :invert (the most promising way to get camelCase-style mixed case in symbols).
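The :invert rule is simple enough to state as code. Per the Common Lisp standard, when the readtable case is :invert, the reader flips the case of a token whose letters are all one case and leaves mixed-case tokens alone (the printer applies the mirror-image rule). Ordinary lowercase code therefore still reaches the usual uppercase symbols, while camelCase names survive intact. A Python sketch of the rule, assuming ASCII tokens and ignoring escape characters, which the real reader leaves untouched:

```python
def invert_readtable_case(token: str) -> str:
    """Symbol name the reader would produce for `token` under :invert
    (sketch: ASCII only, escape characters not handled)."""
    cased = [c for c in token if c.isupper() or c.islower()]
    if cased and all(c.islower() for c in cased):
        return token.upper()   # all lowercase: read as uppercase
    if cased and all(c.isupper() for c in cased):
        return token.lower()   # all uppercase: read as lowercase
    return token               # mixed case (or no letters): unchanged


for tok in ["defun", "camelCase", "XML", "1+"]:
    print(tok, "->", invert_readtable_case(tok))
# defun -> DEFUN
# camelCase -> camelCase
# XML -> xml
# 1+ -> 1+
```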

The plan is to relicense uri-template under the LLGPL (it was licensed under the BSD license), and to rewrite Eager Future as a completely new project with fairly unique properties among concurrency libraries (also under the LLGPL, but first I have to figure out how to get around SBCL, which has something like a bug with weak references; details (in English)).

Justin Grant showed how a Ruby-to-Lisp compiler can be implemented. I've finally lost my mind and want to write a compiler to Lisp too, only from C. Why? To take NetBSD, compile its drivers to Lisp, and get a Lisp operating system (I did say I've lost my mind). Details are on Hacker News. The project is here; as a starting point I took Zeta-C, a C compiler for Lisp Machines, but it's already clear that almost everything will have to be rewritten. To begin with, does anyone know whether a working C parser written in Lisp exists? (Zeta-C has nasty junk generated by a Soviet-era yacc and translated into Zetalisp.)