Friday, March 23, 2012

[Scheme-reports] Just a load of.. well..

J. A. "Biep" Durieux has posted a highly interesting message to Scheme-reports, that should resonate with the "oh my, Scheme is much too big" crowd. (Which does exist, and does have a point.)

I have excerpted some highlights, and recommend to read it all:



Naming the Scheme languages
Therefore I propose "Schemer" for small Scheme, and "Schemest" for large Scheme. The one name clearly indicates its roots, and the other the fact that it is the maximal Scheme. The name "Scheme" would remain as the family name, and the report could be subtitled "Scheme: Schemer, Schemest".

The asymmmetry between multiple inputs and multiple outputs for procedures
This is inherent in Scheme syntax: the receiver of the output of an expression is indicated by the place where this expression occurs - which is necessarily singular: (get-some g1 (provide p1 p2 p3) g2). Repeating the expression would lead to repeated execution: (get-some g1 (provide p1 p2 p3) (provide p1 p2 p3) g2) - apart from being cumbersome.
Prolog has a symmetry in its syntax here that languages with nested expressions simply don't (and I think can't) have. Mark that this is independent of unification: it would be possible to have an unidirectional prolog-like notation, with a group of input variables and a group of output variables, say separated by a hyphen:
(lambda (a b - result rem quot) (quotient+remainder a b - quot rem) (* quot rem - result) (display result -))
Barely possible is "splicing in" multiple results - which would require them to be adjacent. A splicing construct (+ 1 ,@(quotient+remainder 25 4) 6) might do - see Lua for an even more constricting solution.

Tables and environments
This would lead to mutable first-class environments.
Let identifiers be hygienic - translated to locations (or location indicators, as the case may be) at the moment a procedure is defined. So (+ x 1) doesn't change when another x is inserted in a more local environment - or when it is deleted from the original environment. In the latter case the location is not garbage-collected, because this code still refers to it. It is also possible to write (+ (get-location-of 'a) 1), and *that* code is sensitive to environment manipulation.
(There might well be a flag indicating that each mention of an identifier is meant as a call to get-location - some kind of "don't inline" pragma.)
Oh, and environments shouldn't be different from other tables, and should have a way to indicate where to find a location that is not locally available. In that way various schemes of multiple inheritance would become trivially implementable.

Multi-pass compilation
This is like solving a consistency maintenance system (Jon Doyle called that "truth maintenance", some others "reason maintenance"), which is more for constraint satisfaction systems than for Scheme. Single pass with maximal deferment is the Scheme way: variables inside lambdas don't need to be resolved yet, resolution can be deferred until they are called. Without very strong reasons, one should not request different behaviour from syntax transformers. Ease of compilation, or speed of the resulting system are no strong reasons.

Sublanguages
Once there was Lisp/Scheme, which was about lambda. Then a completely different language for macro rewriting was added, and now a (hidden) constraint satisfaction system. I don't like that, it is feature on top of feature. A rewriter can be written in Scheme and then used in macro expansion - great, but let the rewriter then be generally available (because it has other possible uses), and be optional in writing syntax (because there are other ways that sometimes may be better). The same is true for a consistency maintanance or constraint satisfaction language - let them be first-class, explicit languages that one may use or not according to ones liking. That's what libraries are for.

Minimality
Something else is lost here. Scheme was about the right thing, but also about the minimal thing. Scheme provided a basis which was like a semantic assembly language: it provided primitives with which to build. Yes, certain complex systems ease programming - so let's make sure such systems can be written in Scheme, but let's not make them the basis of Scheme! It sounds like old-school anthropology: let's give primitives the right to remain primitive - while making sure they are NOBLE primitives.

Efficiency
Implementability is an issue for Scheme language design, but ease of implementation, or efficiency of implementation ought not to be. (I realise there are fringe issues, such as a theoretical lower limit of 2^2^n in space or time..) Compilability is nice, but never a basis for design decisions.

Macros
The principle of least surprise is already violated with macros. Macros must be bound before use; there is no equivalent to anonymous lambdas. Imagine ((if freevar ) arg ...) - this cannot be in Scheme, because Scheme wants to evaluate the arguments together with the head, whereas the head stipulates whether the arguments need to be evaluated in the first place. I think this evaluation rule reflects a thought error: a seeming symmetry was taken for a real one. Anonymous macros would be a boon, as would be runtime evaluation of expression heads, even if they are or may evaluate to macros. Right now a real symmetry (between procedures and macros) is lost for the sake of a false one.
This is about removing restrictions in the Clinger-sense, I think - having first-class syntactic transformers. Currently they have a status comparable to CL's functions - without the possibility of anonymous transformers or SYNCALL.
There is also no syntactic equivalent of "direct code" as opposed to code stored in a lambda form. Direct syntactic code would of course be immune to later redifinitions of transformers - as the transformation has taken place at the moment the single evaluation pass dealt with that bit of code.

Strings
Wouldn't a Schemish string be a sequence of character objects (soco), where a character object can be complex (base + modifiers)? Just as s-expressions respect the grammatical complexity of expressions, a soco would respect the grammatical complexity of text. If the sequence is a vector, string indexing and string-set! would be O(1). Yes, socos take more space than classical strings, but the same is true for s-expressions as opposed to flat program code.
For Schemes with only ASCII, the set of character objects would be the ASCII set, and the pointers could be 8-bit, i.e. the strings could be the classical string that we all know and love/hate. There's backwards compatibility.
Those who want size-changing substitutions can use some kind of tree representation of the sequence instead of vectors (lists being an extreme kind of unbalanced trees).
The interesting thing would be that people can choose their sequencing level: if you want to reason about code points, use sequences of code points; if you want to reason on the syllable level, make sequences of syllables. It is possible to have several levels: a sequence of words, each of which is a sequence of characters, each of which is a sequence of code points.
In other words: Scheme should not prescribe one crippling string format, but rather a set of specifiers which which people can define the strings they need. A classical ASCII-like one is obligatory, the others are optional.

No comments: