Friday, 30 December 2011

IBNIZ - a hardcore audiovisual virtual machine and an esoteric programming language

Some days ago, I finished the first public version of my audiovisual virtual machine, IBNIZ. I also showed it off on YouTube with the following video:

As demonstrated by the video, IBNIZ (Ideally Bare Numeric Impression giZmo) is a virtual machine and a programming language that generates video and audio from very short strings of code. Technically, it is a two-stack machine somewhat similar to Forth, but with the major exception that the stack is cyclical and also used as an output buffer. Also, as every IBNIZ program is implicitly inside a loop that pushes a set of loop variables on the stack on every cycle, even an empty program outputs something (i.e. a changing gradient as video and a constant sawtooth wave as audio).

How does it work?

To illustrate how IBNIZ works, here's how the program ^xp is executed, step by step:

So, in short: on every loop cycle, the VM pushes the values T, Y and X. The operation ^ XORs the values Y and X, and xp (exchange, then pop) removes the remaining value (T). Thus, the stack gets filled with color values where the Y coordinate is XORed with the X coordinate, resulting in the infamous "XOR texture".
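The same pattern can be imitated in plain C. This is only a rough sketch under simplifying assumptions: it ignores the stack and the fixed-point format, uses plain 8-bit integer coordinates, and the names xor_texel and fill_xor_texture are made up for illustration (they are not part of IBNIZ).

```c
#include <stdint.h>

#define SIDE 256  /* assumed frame size: 256x256 pixels */

/* One texel of the XOR texture: the Y coordinate XORed with the X coordinate. */
uint8_t xor_texel(uint8_t x, uint8_t y)
{
    return (uint8_t)(x ^ y);
}

/* Fill a SIDE*SIDE buffer row by row, loosely mimicking the implicit
   main loop that visits every pixel of a frame. */
void fill_xor_texture(uint8_t *buf)
{
    for (int y = 0; y < SIDE; y++)
        for (int x = 0; x < SIDE; x++)
            buf[y * SIDE + x] = xor_texel((uint8_t)x, (uint8_t)y);
}
```

Interpreting the buffer as grayscale already shows the familiar nested-squares look; the real VM would interpret the values as YUV colors instead.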

The representation in the figure was somewhat simplified, however. In reality, IBNIZ uses 32-bit fixed-point arithmetic where the values for Y and X fall between -1 and +1. IBNIZ also runs the program in two separate contexts with separate stacks and internal registers: the video context and the audio context. To illustrate this, here's how an empty program is executed in the video context:

The colorspace is YUV, with the integer part of the pixel value interpreted as U and V (roughly corresponding to hue) and the fractional part interpreted as Y (brightness). The empty program runs in the so-called T-mode where all the loop variables -- T, Y and X -- are entered in the same word (16 bits of T in the integer part and 8+8 bits of Y and X in the fractional). In the audio context, the same program executes as follows:
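To make the T-mode word layout more tangible, here is a small C sketch. It assumes that Y occupies the upper byte of the fractional part and X the lower one (TTTT.YYXX), which fits the raster order but is my reading of the description rather than a quote from the documentation. The function names are hypothetical.

```c
#include <stdint.h>

/* Pack the loop variables into one 16.16 fixed-point word: TTTT.YYXX
   (assumed byte order: Y above X inside the fractional part). */
uint32_t pack_tmode(uint16_t t, uint8_t y, uint8_t x)
{
    return ((uint32_t)t << 16) | ((uint32_t)y << 8) | x;
}

/* Integer part: interpreted as U and V when the word is shown as a pixel. */
uint16_t integer_part(uint32_t word)
{
    return (uint16_t)(word >> 16);
}

/* Fractional part: interpreted as Y (brightness) when shown as a pixel. */
uint16_t fraction_part(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}
```

With this layout, adding 0000.0001 to the word moves one pixel to the right, wraps to the next row after 256 pixels, and carries into T after a full 256x256 frame.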

Just like in the T-mode of the video context, the VM pushes one word per loop cycle. However, in this case, there is no Y or X; the whole word represents T. Also, when interpreting the stack contents as audio, the integer part is ignored altogether and the fractional part is taken as an unsigned 16-bit PCM value.

Also, in the audio context, T increments in steps of 0000.0040 while the step is only 0000.0001 in the video context. This is because we need to calculate 256x256 pixel values per frame (nearly 4 million pixels per second at 60 frames per second) but can make do with considerably fewer PCM samples. In the current implementation, we calculate 61440 audio samples per second (60*65536/64), which are then downsampled to 44100 Hz.
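The rate arithmetic can be spelled out in a few lines of C. This is just a sketch of the numbers given in the text; the constant and function names are invented for the example, and audio_sample only illustrates the "fractional part as unsigned 16-bit PCM" rule.

```c
#include <stdint.h>

enum {
    FRAME_RATE      = 60,     /* frames per second, as assumed in the text */
    STEPS_PER_FRAME = 65536,  /* T advances by 1.0000 per frame */
    VIDEO_STEP      = 0x01,   /* 0000.0001 per pixel */
    AUDIO_STEP      = 0x40    /* 0000.0040 per audio sample */
};

/* Pixel values computed per second: 60 * 65536 / 1. */
uint32_t pixels_per_second(void)
{
    return FRAME_RATE * (STEPS_PER_FRAME / VIDEO_STEP);
}

/* Audio samples computed per second: 60 * 65536 / 64. */
uint32_t audio_samples_per_second(void)
{
    return FRAME_RATE * (STEPS_PER_FRAME / AUDIO_STEP);
}

/* Audio interpretation of a stack word: drop the integer part and
   take the fractional part as an unsigned 16-bit PCM value. */
uint16_t audio_sample(uint32_t word)
{
    return (uint16_t)(word & 0xFFFFu);
}
```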

The scheduling and main-looping logic is the only somewhat complex thing in IBNIZ. All the rest is very elementary, something that can be found as instructions in the x86 architecture or as words in the core Forth vocabulary. Basic arithmetic and stack-shuffling. Memory load and store. An if/then/else structure, two kinds of loop structures and subroutine definition/calling. Also an instruction for retrieving user input from keyboard or pointing device. Everything needs to be built from these basic building blocks. And yes, it is Turing complete, and no, you are not restricted to the rendering order provided by the implicit main loop.

The full instruction set is described in the documentation. Feel free to check it out and experiment with IBNIZ on your own!

So, what's the point?

The IBNIZ project started in 2007 with the codename "EDAM" (Extreme-Density Art Machine). My goal was to participate in the esoteric programming language competition at the same year's Alternative Party, but I didn't finish the VM in time. The project therefore fell into the background. Every now and then, I returned to the project for a short while, maybe revising the instruction set a little bit or experimenting with different colorspaces and loop variable formats. There was no great driving force to inspire me to finish the VM until mid-2011, after some quite successful experiments with very short audiovisual programs. Once some of my musical experiments spawned a trend that eventually even got a name of its own, "bytebeat", I really had to push myself to finally finish IBNIZ.

The main goal of IBNIZ, from the very beginning, was to provide a new platform for the demoscene. Something without the usual drawbacks of the real-world platforms when writing extremely small demos. No headers, no program size overhead in video/audio access, extremely high code density, enough processing power and preferably a machine language that is fun to program with. Something that would have the potential to displace MS-DOS as the primary platform for sub-256-byte demoscene productions.

There are also other considerations. One of them is educational: modern computing platforms tend to be mind-bogglingly complex and highly abstracted and lack the immediacy and tangibility of the old-school home computers. I am somewhat concerned that young people whose mindset would have made them great programmers in the eighties find their mindset totally incompatible with today's mainstream technology and therefore get completely driven away from programming. IBNIZ will hopefully be able to serve as an "oldschool-style platform" in a way that is rewarding enough for today's beginning programming hobbyists. Also, as the demoscene needs all the new blood it can get, I envision that IBNIZ could serve as a gateway to the demoscene.

I also see that IBNIZ has potential for glitch art and livecoding. By taking a nondeterministic approach to experimentation with IBNIZ, the user may encounter a lot of interesting visual and aural glitch patterns. As for livecoding, I suspect that the compactness of the code as well as the immediate visibility of the changes could make an IBNIZ programming performance quite enjoyable to watch. The live gigs of the chip music scene, for example, might also find use for IBNIZ.

About some design choices and future plans

IBNIZ was originally designed with an esoteric programming language competition in mind, and indeed, the language has already been likened to the classic esoteric language Brainfuck by several critical commentators. I'm not that sure about the similarity with Brainfuck, but it does have strong conceptual similarities with FALSE, the esoteric programming language that inspired Brainfuck. Both IBNIZ and FALSE are based on Forth and use one-character-long instructions, and the perceived awkwardness of both comes from unusual, punctuation-based syntax rather than deliberate attempts at making the language difficult.

When contrasting esotericity with usefulness, it should be noted that many useful, mature and well-liked languages, such as C and Perl, also tend to look like total "line noise" to the uninitiated. Forth, on the other hand, tends to look like a mess of random unrelated strings to people unfamiliar with the RPN syntax. I therefore don't see how the esotericity of IBNIZ would hinder its usefulness any more than the usefulness of C, Perl or Forth is hindered by their syntaxes. A more relevant concern would be, for example, the lack of label and variable names in IBNIZ.

There are some design choices that often get questioned, so I'll perhaps explain the rationale for them:

  • The colors: the color format has been chosen so that more sensible and neutral colors are more likely than "coder colors". YUV has been chosen over HSV because there is relatively universal hardware support for YUV buffers (and I also think it is easier to get richer gradients with YUV than with HSV).
  • Trigonometric functions: I pondered for a long while whether to include SIN and ATAN2 and I finally decided to do so. A lot of demoscene tricks, including all kinds of rotating and bouncing things as well as more advanced stuff such as raycasting, depend on the availability of trigonometry. Both of these operations can be found in the FPU instruction set of the x86 and are relatively fundamental mathematical stuff, so we're not going into library bloat here.

  • Floating point vs fixed point: I considered floating point for a long while as it would have simplified some advanced tricks. However, IBNIZ code is likely to use a lot of bitwise operations, modular bitwise arithmetic and indefinitely running counters which may end up being problematic with floating-point. Fixed point makes the arithmetic more concrete and also improves the implementability of IBNIZ on low-end platforms that lack FPU.
  • Different coordinate formats: TYX-video uses signed coordinates because most effects look better when the origin is at the center of the screen. The 'U' opcode (userinput), on the other hand, gives the mouse coordinates in unsigned format to ease up pixel-plotting (you can directly use the mouse coordinates as part of the framebuffer memory address). T-video uses unsigned coordinates for making the values linear and also for easier coupling with the unsigned coordinates provided by 'U'.

Right now, all the existing implementations of IBNIZ are rather slow. The C implementation is completely interpretive without any optimization phase prior to execution. However, a faster implementation with some clever static analysis is quite high on the to-do list, and I expect a considerable performance boost once native-code JIT compilers come into use. After all, if we are ever planning to displace MS-DOS as a sizecoding platform, we will need to get IBNIZ to run at least faster than DOSBOX.

The use of externally-provided coordinate and time values will make it possible to scale a considerable portion of IBNIZ programs to a vast range of different resolutions from character-cell framebuffers on 8-bit platforms to today's highest higher-than-high-definition standards. I suspect that a lot of IBNIZ programs can be automatically compiled into shader code or fast C-64 machine language (yes, I've made some preliminary calculations for "Ibniz 64" as well). The currently implemented resolution, 256x256, however, will remain as the default resolution that will ensure compatibility. This resolution, by the way, has been chosen because it is in the same class with 320x200, the most popular resolution of tiny MS-DOS demos.

At some point of time, it will also become necessary to introduce a compact binary representation of IBNIZ code -- with variable bit lengths primarily based on the frequency of each instruction. The byte-per-character representation already has a higher code density than the 16-bit x86 machine language, and I expect that a bit-length-optimized representation will really break some boundaries for low size classes.

An important milestone will be a fast and complete version that runs in a web browser. I expect this to make IBNIZ much more available and accessible than it is now, and I'm also planning to host an IBNIZ programming contest once a sufficient web implementation is on-line. There is already a Javascript implementation but it is rather slow and doesn't support sound, so we will still have to wait for a while. But stay tuned!

Tuesday, 15 November 2011

Materiality and the demoscene: when does a platform feel real?

I've just finished reading Daniel Botz's 428-page PhD dissertation "Kunst, Code und Maschine: Die Ästhetik der Computer-Demoszene".

The book is easily the best literary coverage of the demoscene I've seen so far. It is basically a history of demos as an artform with a particular emphasis on the esthetical aspects of demos, going very deeply into different styles and techniques and their development, often in relation to the features of the three "main" demoscene platforms (C-64, Amiga and PC).

What impressed me the most in the book and gave me most food for thought, however, was the theoretical insight. Botz uses the late Friedrich Kittler's conception of media materiality as a theoretical device to explain how the demoscene relates to the hardware platforms it uses, often contrasting the relationship to that of the mainstream media art. In short: the demoscene cares about the materiality of the platforms, while the mainstream art world ignores it.

To elaborate: mainstream computer artists regard computers as tools, universal "anything machines" that can translate pure, immaterial, technology-independent ideas into something that can be seen, heard or otherwise experienced. Thus, ideas come before technology. Demosceners, however, have an opposite point of view; for them, technology comes before ideas. A computer platform is seen as a material that can be brought into different states, in a way comparable to how a sculptor brings blocks of stone into different forms. The possibilities of a material can be explored with direct, uncompromising interaction such as low-level programming. The platform is not neutral, its characteristics are essential to what demos written for it end up being like. While a piece of traditional computer art can often be safely removed from its specific technological context, a demo is no longer a demo if the platform is neglected.

The focus on materiality also results in a somewhat unusual relationship with technology. For most people, computer platforms are just evolutionary stages on a timeline of innovation and obsolescence. A device serves for a couple of years before getting abandoned in favor of a new model that is essentially the same with higher specs. The characteristics of a digital device boil down to numerical statistics in the spirit of "bigger is better". The demoscene, however, sees its platforms as something more multi-faceted. An old computer or gaming console may be interesting as an artistic material just because of its unique combination of features and limitations. It is fine to have historical, personal or even political reasons for choosing a specific platform, but they're not necessary; the features of the system alone are enough to grow someone's creative enthusiasm. As so many people misunderstand the relationship between demoscene and old hardware as a form of "retrocomputing", it is very delightful to see such an accurate insight to it.

But is it really that simple?

I'm not entirely familiar with the semantic extent of "materiality" in media studies, but it is apparent that it primarily refers to physicality and concreteness. On many occasions, Botz contrasts materiality against virtuality, which, I think, is an idea that stems from Gilles Deleuze. This dichotomy is simple and appealing, but I disagree with Botz on how central it is to what the demoscene is doing. After all, there are, for example, quite many 8-bit-oriented demoscene artists who totally approve of virtualization. Artists who don't care whether their works are shown with emulators or real hardware at parties, as long as the logical functionality is correct. Some even produce art for the C-64 without having ever owned a material C-64. Therefore, virtualization is definitely not something that is universally frowned upon on the demoscene. It is apparently also possible to develop a low-level, concrete material relationship with an emulated machine, a kind of "material" that is totally virtual to begin with!

Computer programming is always somewhat virtual, even in its most down-to-the-metal incarnations. Bits aren't physical objects; concentrations of electrons only get the role of bits from how they interact with the transistors that form the logical circuits. A low-level programmer who strives for a total, optimal control of a processor doesn't need to be familiar with these material interactions; just knowing the virtual level of bits, registers, opcodes and pipelines is enough. The number of abstraction layers between the actual bit-twiddling and the layer visible to the programmer doesn't change how programming a processor feels. A software emulator or an FPGA reimplementation of the C-64 can deliver the same "material feeling" to the programmer as the original, NMOS-based C-64. Also, if the virtualization is perfect enough to model the visible and audible artifacts that stem from the non-binary aspects of the original microchips, even a highly experienced enthusiast can be fooled.

I therefore think it is more appropriate to consider the "feel of materiality" that demosceners experience to stem from the abstract characteristics of the platform than its physicality. Programming an Atari VCS emulator running in an X86 PC on top of an operating system may very well feel more concrete than programming the same PC directly with the X86 assembly language. When working with a VCS, even a virtualized one, a programmer needs to be aware of the bit-level machine state at all times. There's no display memory in the VCS; the only way to draw something on the screen is by telling the processor to put specific values in specific video chip registers at specific clock cycles. The PC, however, does have a display memory that holds the pixel values of the on-screen picture, as well as a video chip that automatically refreshes its contents to the screen. A PC programmer can therefore use very generic algorithms to render graphics in the display memory without caring about the underlying hardware, while on the VCS everything needs to be thought out from the specific point of view of the video chip and the CPU.

It seems that the "feel of materiality" has particularly much to do with complexity -- of both the platform and the manipulated data. A high-resolution picture, taking up megabytes of display memory, looks nearly identical on a computer screen regardless of whether it is internally represented in RGB or YUV colorspace. However, when we get a pixel artist to create versions of the same picture for various formats that use less than ten kilobytes of display memory, such as PC textmode or C-64 multicolor, the specific features and constraints of each format shine out very clearly. High levels of complexity allow for generic, platform-independent and general-purpose techniques whereas low levels of complexity require the artist to form a "material relationship" with the format.

Low complexity and the "feel of materiality" are also closely related to the "feel of total control" which I regard as an important state that demosceners tend to reach for. The lower the complexity of a platform, the easier it is to reach a total understanding of its functionality. Quite often, coders working on complex platforms choose to deliberately lower the perceived complexity by concentrating on a reduced, "essential" subset of the programming interface and ignoring the rest. Someone who codes for a modern PC, for example, may want to ignore the polygonal framework of the 3D API altogether and exclusively concentrate on shader code. Those who write softsynths, even for tiny size classes, tend to ignore high-level synthesis frameworks that may be available on the OS and just use a low-level PCM-soundbuffer API. Subsets that provide nice collections of powerful "Lego blocks" are the way to go. Even though bloated system libraries may very well contain useful routines that can be discovered and abused in things like 4-kilobyte demos, most democoders frown upon this idea and may even consider it cheating.

Emulators, virtual platforms and reduced programming interfaces are ways of creating pockets of lowered complexity within highly complex systems -- pockets that feel very "material" and controllable for a crafty programmer. Even virtual platforms that are highly abstract, idealistic and mathematical may feel "material". The "oneliner music platform", merely defined as C-like expression syntax that calculates PCM sample values, is a recent example of this. All of its elements are defined on a relatively high level, with no specification of any kind of low-level machine, virtual or otherwise. Nevertheless, a kind of "material characteristic" or "immanent esthetics" still emerges from this "platform", both in how the short formulas tend to sound and in what kind of hacks and optimizations are better than others.

The "oneliner music platform" is perhaps an extreme example, but in general, purely virtual platforms have been there for a while already. Things like Java demos, as well as multi-platform portable demos, have been around since the late 1990s, although they've usually remained quite marginal. For some reason, however, Botz seems to ignore this aspect of the demoscene nearly completely, merely stating that multi-platform demos have started to appear "in recent years" and that the phenomenon may grow bigger in the future. Perhaps this is a deliberate bias chosen to avoid topics that don't fit well within Botz's framework. Or maybe it's just an accident. I don't know.


To summarize: when Botz talks about the materiality of demoscene platforms, he often refers to phenomena that, in my opinion, could be more fruitfully analyzed with different conceptual devices, especially complexity. Wherever the dichotomy of materiality and immateriality comes up, I see at least three separate conceptual dimensions working under the hood:

1. Art vs craft (or "idea-first" vs "material-first"). This is the area where Botz's theory works very well: demoscene is, indeed, more crafty or "material-first" than most other communities of computer art. However, the material (i.e. the demo platform) doesn't need to be material (i.e. physical); the crafty approach works equally well with emulated and purely virtual platforms. The "artsy" approach, leading to conceptual and "avant-garde" demos, has gradually become more and more accepted, but there's still a lot of crafty attitude in "art demos" as well. I consider chip musicians, circuit-benders and homebrew 8-bit developers about as crafty on average as demosceners, by the way.

2. Physicality vs virtuality. There's a strong presence of classic hardware enthusiasm on the demoscene as well as people who build their own hardware, and they definitely are in the right place. However, I don't think the physical hardware aspect is as important in the demoscene as, for example, in the chip music, retrogaming and circuit-bending communities. On the demoscene, it is more important to demonstrate the ability to do impressive things in limited environments than to be an owner of specific physical gear or to know how to solder. A C-64 demo can be good even if it is produced with an emulator and a cross-compiler. Also, as demo platforms can be very abstract and purely virtual as well and still be appealing to the subculture, I don't think there's any profound dogma that would drive demosceners towards physicality.

3. Complexity. The possibility of forming a "material relationship" with an emulated platform shows that the perception of "materiality", "physicality" and "controllability" is more related to the characteristics of the logical platform than to how many abstraction layers there are under the implementation. A low computational complexity, either in the form of platform complexity or program size, seems to correlate with a "feeling of concreteness" as well as the prominence of "emergent platform-specific esthetics". What I see as the core methodology of the demoscene seems to work better at low than high levels of complexity and this is why "pockets of lowered complexity" are often preferred by sceners.

Don't get me wrong: despite all the disagreements and my somewhat Platonist attitude to abstract ideas in general, I still think virtuality and immateriality have been getting too much emphasis in today's world and we need some kind of a countercultural force that defends the material. Botz also covers possible countercultural aspects of the demoscene, deriving them from the older hacker culture, and I found all of them very relevant. My basic disagreement comes from the fact that Botz's theory doesn't entirely match with how I perceive the demoscene to operate, and the subculture as a whole cannot therefore be put under a generalizing label such as "defenders and lovers of the materiality of the computer".

Anyway, I really enjoyed reading Botz's book and especially appreciated the theoretical insight. I recommend the book to everyone who is interested in the demoscene, its history and esthetic variety, AND who reads German well. I studied the language for about five years at school but I still found the text quite difficult to decipher in places. I therefore sincerely hope that my problems with the language haven't led me to any critical misunderstandings.

Friday, 28 October 2011

Some deep analysis of one-line music programs.

It is now a month since I posted the YouTube video "Experimental music from very short C programs" and three weeks since I blogged about it. Now that the initial craze seems to be over, it's a good time to look back at what has been done and consider what could be done in the future.

The developments since my last post can be summarized by my third video. It still represents the current state of the art quite well and includes a good variety of different types of formulas.

The videos only show off a portion of all the formulas that could be included. To compensate, I've created a text file where I've collected all the "worthy" formulas I've encountered so far. Most of them can be tested in the on-line JavaScript and ActionScript test tools. Some of them don't even work directly in C code, as they depend on JS/AS-specific features.

As I'm sure that many people still find these formulas rather magical and mysterious, I've decided to give you a detailed technical analysis and explanation on the essential techniques. As I'm completely self-educated in music theory, please pardon my notation and terminology that may be unorthodox at times. You should also have a grasp of C-like expression syntax and binary arithmetic to understand most of the things I'm going to talk about.

I've sorted my formula collection by length. By comparing the shortest and longest formulas, it is apparent that the longest formulas show a much more constructivist approach, including musical data stored in constants as well as entire piece-by-piece-constructed softsynths. The shortest formulas, on the other hand, are very often discovered via non-deterministic testing, from educated guesses to pure trial-and-error. One of my aims with this essay is to bring some understanding and determinism to the short side as well.

Pitches and scales

A class of formulas that is quite prominent among the shortest ones is what I call the 't* class'. The formulas of this type multiply the time counter t with some expression, resulting in a sawtooth wave that changes its pitch according to that expression.

A simple example of a t*-class formula would be t*(t>>10) which outputs a rising and falling sound (accompanied by some aliasing artifacts that create their own sounds). Now, if we introduce an AND operator to this formula, we can restrict the set of pitches and thus create melodies. An example that has been independently discovered by several people is the so-called "Forty-Two Melody": t*(42&t>>10) or t*2*(21&t>>11).
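For readers who want to try this outside the test tools, here is a minimal C sketch of the Forty-Two Melody. The function names are my own, and I use an unsigned counter and explicit parentheses for clarity (in C, >> binds tighter than &, so 42&t>>10 means 42&(t>>10)). The output is taken as the low 8 bits of the expression, which is what writing the value out as a byte does anyway.

```c
#include <stdint.h>
#include <stdio.h>

/* t*(42&t>>10), truncated to an unsigned 8-bit sample. */
uint8_t forty_two(uint32_t t)
{
    return (uint8_t)(t * (42 & (t >> 10)));
}

/* The alternative spelling t*2*(21&t>>11). */
uint8_t forty_two_alt(uint32_t t)
{
    return (uint8_t)(t * 2 * (21 & (t >> 11)));
}

/* Write `seconds` of raw unsigned 8-bit audio at an assumed 8 kHz rate. */
void write_melody(FILE *out, unsigned seconds)
{
    for (uint32_t t = 0; t < seconds * 8000u; t++)
        fputc(forty_two(t), out);
}
```

Calling write_melody(stdout, 30) from a main function and piping the result into a raw audio player should play the melody; aplay, for instance, treats raw input as 8-bit unsigned mono at 8000 Hz by default on the systems I know of. The two spellings are equivalent because bit 0 of 42 is zero, so shifting by 11 and doubling loses nothing that the mask would have kept.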

The numbers that indicate pitches are not semitones or anything like that, but multiples of a base frequency (sampling rate divided by 256, i.e. 31.25 Hz at the default 8 kHz rate). Here is a table that maps the integer pitches 1..31 to cents and Western note names. The pitches on a gray background don't have good counterparts in the traditional Western system, so I've used quarter-tone flat and sharp symbols to give them approximate names.

By using this table, we can decode the Forty-Two Melody into a human-readable form. The melody is 32 steps long and consists of eight unique pitch multipliers (including zero which gives out silence).
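The step structure can also be verified with a short C sketch. It assumes that one melody step lasts 2048 samples (bit 0 of t>>10 is masked out by 42, so the multiplier only changes every second 1024-sample block); the names below are hypothetical helpers, not anything from the test tools.

```c
#include <stdint.h>

enum { STEPS = 32, SAMPLES_PER_STEP = 2048 };

/* Pitch multiplier that t*(42&t>>10) uses during a given melody step. */
unsigned step_multiplier(unsigned step)
{
    uint32_t t = (uint32_t)step * SAMPLES_PER_STEP;
    return 42 & (t >> 10);
}

/* Frequency of a multiplier: m times (sampling rate / 256). */
double multiplier_hz(unsigned m, double rate)
{
    return m * rate / 256.0;
}

/* Count how many different multipliers occur during the 32 steps. */
unsigned count_unique_multipliers(void)
{
    uint8_t seen[64] = {0};   /* 42 & x is always below 64 */
    unsigned unique = 0;
    for (unsigned s = 0; s < STEPS; s++) {
        unsigned m = step_multiplier(s);
        if (!seen[m]) {
            seen[m] = 1;
            unique++;
        }
    }
    return unique;
}
```

Printing step_multiplier(s) for s = 0..31 gives the melody as a list of multipliers, which can then be looked up in the pitch table.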

The "Forty-Two Melody" contains some intervals that make it sound a little bit silly, detuned or "Arabic" to Western ears. If we want to avoid this effect, we need to design our formulas so that they only yield pitches that are at familiar intervals from one another. A simple solution is to include a modulo operator to wrap larger numbers to the range where simple integer ratios are more probable. Modifying the Forty-Two Melody into t*((42&t>>10)%14), for example, completely transforms the latter half of the melody into something that sounds a little bit nicer to Western ears. Bitwise AND is also useful for limiting the pitch set to a specific scale; for example t*(5+((t>>11)&5)) produces pitch multipliers of 5, 6, 9 and 10, which correspond to E3, G3, D4 and E4.
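When designing such masks, it helps to list the pitch set a given base and mask can produce. The following C sketch does that by brute force; pitch_set is a made-up helper name, and the restriction to 8-bit counters is an assumption that is enough for the small masks discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the distinct values of base + (counter & mask) in ascending
   order. `out` must have room for 256 entries. Returns the count.
   Only the low 8 bits of base and mask are considered. */
size_t pitch_set(unsigned base, unsigned mask, unsigned *out)
{
    uint8_t seen[512] = {0};
    size_t n = 0;

    for (unsigned counter = 0; counter < 256; counter++)
        seen[(base & 255u) + (counter & mask & 255u)] = 1;

    for (unsigned v = 0; v < 512; v++)
        if (seen[v])
            out[n++] = v;
    return n;
}
```

Calling pitch_set(5, 5, out) lists the multipliers available to t*(5+((t>>11)&5)); each of them can then be turned into a frequency by multiplying with the base frequency (sampling rate divided by 256).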

Ryg's 44.1 kHz formula presented in the third video contains two different melody generators:


The first generator, in the first half of the formula, is based on a string constant that contains a straight-forward list of pitches. This list is used for the bass pattern. The other generator, whose core is the subexpression ((t>>12)^(t>>12)-2)%11, is more interesting, as it generates a rather deep self-similar melody structure with just three operators (subtraction, exclusive or, modulo). Rather impressive despite its profound repetitiveness. Here's an analysis of the series it generates:

It is often a good idea to post-process the waveform output of a plain t* formula. The sawtooth wave tends to produce a lot of aliasing artifacts, particularly at low sampling rates. Attaching a '&128' or '&64' in the end of a t* formula switches the output to square wave which usually sounds a little bit cleaner. An example of this would be Niklas Roy's t*(t>>9|t>>13)&16 which sounds a lot noisier without the AND (although most of the noise in this case comes from the unbounded multiplication arithmetic, not from aliasing).

Bitwise waveforms and harmonies

Another class of formulas that is very prominent among the short ones is the bitwise formula. At its purest, such a formula only uses bitwise operations (shifts, negation, AND, OR, XOR) combined with constants and t. A simple example is t&t>>8 -- the "Sierpinski Harmony". Sierpinski triangles appear very often in plotted visualizations of bitwise waveforms, and t&t>>8 represents the simplest type of formula that renders into a nice Sierpinski triangle.

Bitwise formulas often sound surprisingly multitonal for their length. This is based on the fact that an 8-bit sawtooth wave can be thought of as consisting of eight square waves, each an octave apart from its neighbor. Usually, these components fuse together in the human brain, forming the harmonics of a single timbre, but if we turn them on and off a couple of times per second or slower, the brain might perceive them as separate tones. For example, t&48 sounds quite monotonal, but in t&48&t>>8, exactly the same waveform sounds bitonal because it abruptly extends the harmonic content of the previous waveform.

The loudest of the eight square-wave components of an 8-bit wave is, naturally, the one represented by the most significant bit (&128). In the sawtooth wave, it is also the longest in wavelength. The second highest bit (&64) represents a square wave that has half the wavelength and amplitude, the third highest halves the parameters once more, and so on. By using this principle, we can analyze the musical structure of the Sierpinski Harmony:
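The decomposition itself is easy to check numerically. In this C sketch (function names are my own), each bit of the 8-bit sawtooth is treated as a separate square wave, and the Sierpinski Harmony is written so that t>>8 visibly acts as the on/off switch for those components.

```c
#include <stdint.h>

/* Square-wave component number `bit` (0..7) of the sawtooth t&255:
   amplitude 1<<bit, period 2<<bit samples. */
uint8_t square_component(uint32_t t, unsigned bit)
{
    return (uint8_t)(t & (1u << bit));
}

/* Adding the eight components together gives back the sawtooth. */
uint8_t sum_of_components(uint32_t t)
{
    unsigned sum = 0;
    for (unsigned bit = 0; bit < 8; bit++)
        sum += square_component(t, bit);
    return (uint8_t)sum;
}

/* Sierpinski Harmony t&t>>8: the slowly changing value t>>8 decides
   which square-wave components of t are currently audible. */
uint8_t sierpinski(uint32_t t)
{
    return (uint8_t)(t & (t >> 8));
}
```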

The introduction of ever lower square-wave components can be easily heard. One can also hear quite well that every newly introduced component is considerably lower in pitch than the previous one. However, if we include a prime multiplier in the Sierpinski Harmony, we will encounter an anomaly. In (t*3)&t>>8, the loudest tone actually goes higher at a specific point (and the interval isn't an octave either).

This phenomenon can be explained with aliasing artifacts and how they are processed by the brain. The main wavelength in t*3 is not constant but alternates between two values, 85 and 86, averaging to 85.33 (256/3). The human mind interprets this kind of sound as a waveform of the average length (85.33 samples) accompanied by an extra sound that represents the "error" (or the difference from the ideal wave). In the t*3 example, this extra sound has a period of 256 samples and sounds like a buzzer when listened separately.
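The uneven cycle lengths can be measured directly. The C sketch below counts the samples between successive wrap-arounds of (t*mult)&255; the helper name cycle_lengths is hypothetical, and detecting a wrap as "the value dropped" is an assumption that only holds for small multipliers (well below 128).

```c
#include <stddef.h>
#include <stdint.h>

/* Store the lengths of the first `max` sawtooth cycles of (t*mult)&255
   in `out` and return how many were found. A new cycle is assumed to
   start whenever the 8-bit value drops below the previous sample. */
size_t cycle_lengths(unsigned mult, unsigned *out, size_t max)
{
    size_t n = 0;
    uint32_t start = 0;
    uint8_t prev = 0;

    /* The upper bound on t only guards against endless loops (mult == 0). */
    for (uint32_t t = 1; n < max && t < (1u << 20); t++) {
        uint8_t cur = (uint8_t)(t * mult);
        if (cur < prev) {
            out[n++] = t - start;
            start = t;
        }
        prev = cur;
    }
    return n;
}
```

For a multiplier of 3, three consecutive cycle lengths always add up to 256 samples, which is where the period of the extra "error" sound comes from.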

The smaller the wavelengths we are dealing with, the more prominent these aliasing artifacts become, eventually dominating over their parent waveforms. By listening to (t*3)&128, (t*3)&64 and (t*3)&32, we notice an interval of an octave between them. However, when we step over from (t*3)&32 to (t*3)&16, the interval is definitely not an octave. This is the threshold where the artifact wave becomes dominant. This is why t&t>>8, (t*3)&t>>8 and (t*5)&t>>8 sound so different. It is also the reason why high-pitched melodies may sound very detuned.

Variants of the Sierpinski harmony can be combined to produce melodies. Examples of this approach include:

t*5&(t>>7)|t*3&(t*4>>10) (from miiro)

(t*5&t>>7)|(t*3&t>>10) (from viznut)

t*9&t>>4|t*5&t>>7|t*3&t/1024 (from stephth)
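These formulas lean heavily on C operator precedence: * and >> bind tighter than &, which in turn binds tighter than |. As a small sketch (function names are mine), here is the second formula as written and with every grouping spelled out:

```c
/* The formula as written; the result is truncated to 8 bits like putchar does. */
unsigned char melody_terse(unsigned t)
{
    return (unsigned char)((t*5&t>>7)|(t*3&t>>10));
}

/* The same formula with all the implicit groupings made explicit. */
unsigned char melody_explicit(unsigned t)
{
    unsigned a = (t * 5) & (t >> 7);    /* Sierpinski variant, pitch multiplier 5 */
    unsigned b = (t * 3) & (t >> 10);   /* slower variant, pitch multiplier 3 */
    return (unsigned char)(a | b);
}
```

Both functions return the same byte for every t, so the parentheses in the original are only there for readability.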

Different counters are the driving force of bitwise formulas. At their simplest, counters are just bitshifted versions of the main counter (t). These are implicitly synchronized with each other and work on different temporal levels of the musical piece. However, it has also been fruitful to experiment with counters that don't have a simple common denominator, and even with ones whose speeds are nearly identical. For example, t&t%255 brings a 256-cycle counter and a 255-cycle counter together with an AND operation, resulting in an ambient drone sound that sounds like something achievable with pulse-width modulation. This approach seems to be more useful for loosely structured soundscapes than clear-cut rhythms or melodies.
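To make the two counters in t&t%255 a bit more concrete, here is a minimal C sketch (the names drone and DRONE_PERIOD are assumptions of mine). As % binds tighter than &, the formula reads t&(t%255):

```c
/* t&t%255 truncated to 8 bits. t%255 is always below 255, so only the
   low 8 bits of t (the 256-cycle counter) take part in the AND. */
unsigned char drone(unsigned t)
{
    return (unsigned char)(t & t % 255);
}

/* The two counters realign after 256*255 samples,
   which is about 8.2 seconds at 8 kHz. */
enum { DRONE_PERIOD = 256 * 255 };
```

The output therefore repeats exactly every 65280 samples, while the relative phase of the two counters drifts by one sample per cycle in between.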

Some oneliner songs attach a bitwise operation to a melody generator for transposing the output by whole octaves. A simple example is Rrrola's t*(0xCA98>>(t>>9&14)&15)|t>>8 which would just loop a simple series of notes without the trailing '|t>>8'. This part gradually fixes the upper bits of the output to 1s, effectively raising the pitch of the melody and fading its volume out. The formulas from Ryg and Kb in my third video also use this technique. The most advanced use of it I've seen so far, however, is in Mu6k's song (the last one in the 3rd video) which synthesizes its lead melody (along with some accompanying beeps) by taking the bassline and selectively turning its bits on and off. This takes place within the subexpression (t>>8^t>>10|t>>14|x)&63 where the waveform of the bass is input as x.
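Rrrola's formula can be taken apart to see the note table hidden in the constant. This is only a sketch with helper names of my own choosing:

```c
/* Pitch multiplier of the melody at sample t. (t>>9&14) steps through
   0, 2, 4, ..., 14, changing every 1024 samples, and selects a 4-bit
   window of the constant 0xCA98. */
unsigned note_multiplier(unsigned t)
{
    return 0xCA98 >> (t >> 9 & 14) & 15;
}

/* The full formula, truncated to 8 bits the way putchar would do it. */
unsigned char rrrola(unsigned t)
{
    return (unsigned char)(t * note_multiplier(t) | t >> 8);
}
```

The multipliers loop through 8, 6, 9, 10, 10, 2, 12, 3. Once t>>8 has all of its low eight bits set, the OR forces every output bit to 1, which is the end point of the fade described above.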

Modular wrap-arounds and other synthesis techniques

All the examples presented so far only use counters and bitwise operations to synthesize the actual waveforms. It's therefore necessary to talk a little bit about other operations and their potential as well.

By accompanying a bitwise formula with a simple addition or subtraction, it is possible to create modular wrap-around artifacts that produce totally different sounds. Tiny, nearly inaudible sounds may become very dominant. Harmonious sounds often become noisy and percussive. By extending the short Sierpinski harmony t&t>>4 into (t&t>>4)-5, something that sounds like an "8-bit" drum appears on top of it. The same principle can also be applied to more complex Sierpinski harmony derivatives as well as other bitwise formulas:
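The wrap-around itself is easy to demonstrate. A minimal C sketch (the function name is mine, and the 8-bit truncation is assumed to happen as with putchar):

```c
/* The short Sierpinski harmony with 5 subtracted, truncated to 8 bits.
   Whenever t&t>>4 is between 0 and 4, the result wraps around to 251..255. */
unsigned char sierpinski_minus5(unsigned t)
{
    return (unsigned char)((t & t >> 4) - 5);
}
```

So the quietest moments of the original waveform jump to nearly full amplitude, which is where the sudden percussive hits come from.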


I'm not going into a deep analysis of how modular wrap-arounds affect the harmonic structure of a sound, as I guess someone has already done the math before. However, modular addition can be used for something that sounds like oscillator hard-sync in analog synthesizers, although its technical basis is different.

Perhaps the most obvious use for summing in a softsynth, however, is the one where modular wrap-around is not very useful: mixing of several sound sources together. A straight-forward recipe for this is (A&127)+(B&127), which may be a little long-winded when aiming at minimalism. Often, just a simple XOR operation is enough to replace it, although it usually produces artifacts that may sound good or bad depending on the case. XOR can also be used for effects that sound like hard-sync.
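The two mixing recipes can be compared side by side. This is a sketch with function names of my own, taking A and B as already computed sample values:

```c
/* Mix two sources without wrap-around: each one is limited to 7 bits
   first, so the sum is at most 127+127 = 254 and still fits in a byte. */
unsigned char mix_add(unsigned a, unsigned b)
{
    return (unsigned char)((a & 127) + (b & 127));
}

/* The shorter XOR substitute. It never overflows, but it only equals
   a real sum when the two inputs have no set bits in common. */
unsigned char mix_xor(unsigned a, unsigned b)
{
    return (unsigned char)(a ^ b);
}
```

Whenever both inputs have the same bit set, XOR clears it instead of carrying, which is one source of the artifacts mentioned above.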

Of course, modular wrap-around effects are also achievable with multiplication and division, and on the other hand, even without addition or subtraction. I'll illustrate this with just a couple of interesting-sounding examples:

t>>4|t&((t>>5)/(t>>7-(t>>15)&-t>>7-(t>>15))) (from droid, js/as only)

(int)(t/1e7*t*t+t)%127|t>>4|t>>5|t%127+(t>>16)|t (from bst)

t>>6&1?t>>5:-t>>4 (from droid)

There's a lot in these and other synthesis algorithms that could be discussed, but as they already belong to a zone where traditional sound synthesis lore applies, I choose to go on.

Deterministic composition

When looking at the longest formulas in the collection, it is apparent that there's a lot of intelligent design behind most of them: long constants and tables, sometimes several of them, containing scales, melodies, basslines and drum patterns. The longest formula in the collection is "Long Line Theory", a cover of the soundtrack of the 64K demo "Chaos Theory" by Conspiracy. The original version by mu6k was over 600 characters long, and other people later optimized it down to 300 characters, with some arguable quality tradeoffs.

It is, of course, possible to synthesize just about anything with a formula, especially if there's no upper limit for the length. Synthesis and sequencing logic can be built section by section, using rather generic algorithms and proven engineering techniques. There's no magic in it. But on the other hand, there's no magic in pure non-determinism either: it is very difficult to find anything outstanding with totally random experimentation after the initial discovery phase is over.

Many of the more sophisticated formulas seem to have a good balance between random experimentation and deterministic composition. It is often apparent in their structure that some elements are results of random discoveries while others have been built with an engineer's mindset. Let's look at Mu6k's song (presented at the end of the 3rd video, 32 kHz):

(((int)(3e3/(y=t&16383))&1)*35) +
(x=t*("6689"[t>>16&3]&15)/24&127)*y/4e4 +
((t>>8^t>>10|t>>14|x)&63)
I've split the formula into three lines according to the three instruments therein: drum, bass and lead.

My assumption is that the song has been built around the lead formula that was discovered first, probably in the form of t>>6^t>>8|t>>12|t&63 or something (the original version of this formula ran at 8 kHz). As usual with pure bitwise formulas, all the intervals are octaves, but in this case, the musical structure is very nice.

As it is possible to transpose a bit-masking melody simply by transposing the carrier wave, it's a good idea to generate a bassline and reuse it as the carrier. Unlike the lead generator, the bassline generator is very straight-forward in appearance, consisting of four pitch values stored in a string constant. A sawtooth wave is generated, stored to a variable (so that it can be reused by the lead melody generator) and amplitude-modulated.
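The bassline generator can be unpacked into two small steps. In this C sketch, the function names are my own, and the amplitude modulation (the *y/4e4 part) is left out:

```c
/* Pitch multiplier of the bassline: the low 4 bits of the characters
   '6', '6', '8' and '9' are 6, 6, 8 and 9. The note changes every
   65536 samples. */
unsigned bass_pitch(unsigned t)
{
    return "6689"[t >> 16 & 3] & 15;
}

/* The 7-bit sawtooth that the formula stores in the variable x. */
unsigned bass_wave(unsigned t)
{
    return t * bass_pitch(t) / 24 & 127;
}
```

Taking the low bits of ASCII digits is a compact way of storing a small table of numbers in a string constant.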

Finally, there's a simple drum beat that is generated by a combination of division and bit extraction. The extracted bit is scaled to the amplitude of 35. Simple drums are often synthesized by using fast downward pitch-slides and the division approach does this very well.
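The drum part can be isolated as follows. This is a sketch: the function name is mine, and the explicit check for y being zero is my own addition, as the original formula divides by zero at the start of each beat:

```c
/* Drum: 3000/y is huge right after a beat starts (y counts the samples
   inside a 16384-sample beat) and shrinks as y grows, so its lowest bit
   toggles quickly at first and then ever more slowly. Returns 0 or 35. */
int drum(unsigned t)
{
    unsigned y = t & 16383;
    if (y == 0)
        return 0;   /* assumption: skip the division by zero at the beat start */
    return ((int)(3e3 / y) & 1) * 35;
}
```

Once y exceeds 3000, the quotient stays below 1 and the drum is silent until the next beat.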

In the case of Ryg's formula I discussed some sections earlier, I might also guess that the melody generator, the most chaotic element of the system, was the central piece which was later coupled with a bassline generator whose pitches were deliberately chosen to harmonize with the generated melody.

The future

I have been contacted by quite a few people who have brought up different ideas of future development. We should, for example, have a social website where anyone could enter new formulas, listen to them in a playlist-like manner and rate them. Another branch of ideas is about the production of new rateable formulas by random generation or by breeding old ones together with genetic algorithms.

All of these ideas are definitely interesting, but I don't think the time is yet right for them. I have been developing my audiovisual virtual machine, which is the main reason why I did these experiments in the first place. I regard the current concept of "oneliner music" as a mere placeholder for the system that is yet to be released. There are too many problems with the C-like infix syntax and other aspects of the concept, so I think it's wiser to first develop a better toy and then think about a community mechanism. However, these are just my own priorities. If someone feels like building the kind of on-line community I described, I'll support the idea.

I've mentioned this toy before. It was previously called EDAM, but now I've chosen to name it IBNIZ (Ideally Bare Numeric Impression giZmo). One of the I letters could also stand for "immediate" or "interactive", as I'm going to emphasize an immediate, hands-on modifiability of the code. IBNIZ will hopefully be relevant as a demoscene platform for extreme size classes, as a test bed for esoteric algorithmic trickery, as an appealing introduction to hard-core minimalist programming, and also as a fun toy to just jam around with. Here's a little screenshot of the current state:

In my previous post, I mentioned the possibility of opening a door for 256-byte demos that are interesting both graphically and musically. The oneliner music project and IBNIZ will provide valuable research for the high-level, algorithmic aspects of this project, but I've also made some hands-on tests on the platform-level feasibility of the idea. It is now apparent that a stand-alone MS-DOS program that generates PCM sound and synchronized real-time graphics can easily fit in less than 96 bytes, so there's a lot of room left for both music and graphics in the 256-byte size class. I'll probably release a 128- or 256-byte demo as a proof-of-concept, utilizing something derived from a nice oneliner music formula as the soundtrack.

I would like to thank everyone who has been interested in the oneliner music project, as all the hype made me very determined to continue my quests for unleashing the potential of the bit and the byte. My next post regarding this quest will probably appear once there's a version of IBNIZ worth releasing to the public.

Sunday, 2 October 2011

Algorithmic symphonies from one line of code -- how and why?

Lately, there has been a lot of experimentation with very short programs that synthesize something that sounds like music. I now want to share some information and thoughts about these experiments.

First, some background. On 2011-09-26, I released the following video on YouTube, presenting seven programs and their musical output:

This video gathered a lot of interest, inspiring many programmers to experiment on their own and share their findings. This was further boosted by Bemmu's on-line Javascript utility that made it easy for anyone (even non-programmers, I guess) to jump on the bandwagon. In just a couple of days, people had found so many new formulas that I just had to release another video to show them off.

Edit 2011-10-10: note that there's now a third video as well!

It all started a couple of months ago, when I encountered a 23-byte C-64 demo, Wallflower by 4mat of Ate Bit, that was like nothing I had ever seen in that size class on any platform. Glitchy, yes, but it had a musical structure that vastly outgrew its size. I started to experiment on my own and came up with a 16-byte VIC-20 program whose musical output totally blew my mind. My earlier blog post, "The 16-byte frontier", reports these findings and speculates why they work.

Some time later, I resumed the experimentation with a slightly more scientific mindset. In order to better understand what was going on, I needed a simpler and "purer" environment. Something that lacked the arbitrary quirks and hidden complexities of 8-bit soundchips and processors. I chose to experiment with short C programs that dump raw PCM audio data. I had written tiny "/dev/dsp softsynths" before, and I had even had one in my email/usenet signature in the late 1990s. However, the programs I would now be experimenting with would be shorter and less planned than my previous ones.

I chose to replicate the essentials of my earlier 8-bit experiments: a wave generator whose pitch is controlled by a function consisting of shifts and logical operators. The simplest waveform for /dev/dsp programs is sawtooth. A simple for(;;)putchar(t++) generates a sawtooth wave with a cycle length of 256 bytes, resulting in a frequency of 31.25 Hz when using the default sample rate of 8000 Hz. The pitch can be changed with multiplication. t++*2 is an octave higher, t++*3 goes up by 7 semitones from there, t++*(t>>8) produces a rising sound. After a couple of trials, I came up with something that I wanted to share on an IRC channel:
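The arithmetic behind these pitches fits in one line of C. The function below is a sketch with a name of my own choosing:

```c
/* Frequency in Hz of the sawtooth produced by putchar(t*multiplier):
   one cycle takes 256/multiplier samples. */
double saw_frequency(double sample_rate, unsigned multiplier)
{
    return sample_rate * multiplier / 256.0;
}
```

At 8000 Hz, the multipliers 1, 2 and 3 give 31.25 Hz, 62.5 Hz and 93.75 Hz. The step from 2 to 3 is a frequency ratio of 3/2, which is very close to 7 semitones in equal temperament (about 7.02).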


In just over an hour, Visy and Tejeez had contributed six more programs on the channel, mostly varying the constants and changing some parts of the function. On the following day, Visy shared our discoveries on Google+. I reshared them. A surprising flood of interested comments came up. Some people wanted to hear an MP3 rendering, so I produced one. All these reactions eventually led me to release the MP3 rendering on YouTube with some accompanying text screens. (In case you are wondering, I generated the screens with an old piece of code that simulates a non-existent text mode device, so it's just as "fakebit" as the sounds are).

When the first video was released, I was still unsure whether it would be possible for one line of C code to reach the sophistication of the earlier 8-bit experiments. Simultaneities, percussions, where are they? It would also have been great to find nice basslines and progressions as well, as those would be useful for tiny demoscene productions.

At some point, some people noticed that by getting rid of the t* part altogether and just applying logical operators on shifted time values one could get percussion patterns as well as some harmonies. Even a formula as simple as t&t>>8, an aural corollary of "munching squares", has interesting harmonic properties. Some small features can be made loud by adding a constant to the output. A simple logical operator is enough for combining two good-sounding formulas together (often with interesting artifacts that add to the richness of the sound). All this provided material for the "second iteration" video.

If the experimentation continues at this pace, it won't take many weeks until we have found the grail: a very short program, maybe even shorter than a Spotify link, that synthesizes all the elements commonly associated with a pop song: rhythm, melody, bassline, harmonic progression, macrostructure. Perhaps even something that sounds a little bit like vocals? We'll see.

Hasn't this been done before?

We've had the technology for all this for decades. People have been building musical circuits that operate on digital logic, creating short pieces of software that output music, experimenting with chaotic audiovisual programs and trying out various algorithms for musical composition. Mathematical theory of music has a history of over two millennia. Based on this, I find it quite mind-boggling that I have never before encountered anything similar to our discoveries despite my very long interest in computing and algorithmic sound synthesis. I've made some Google Scholar searches for related papers but haven't found anything. Still, I'm quite sure that many individuals have come up with these formulas before, but, for some reason, their discoveries remained in obscurity.

Maybe it's just about technological mismatch: to builders of digital musical circuits, things like LFSRs may have been more appealing than very wide sequential counters. In the early days of the microcomputer, there was already enough RAM available to hold some musical structure, so there was never a real urge to simulate it with simple logic. Or maybe it's about the problems of an avant-garde mindset: if you're someone who likes to experiment with random circuit configurations or strange bit-shifting formulas, you're likely someone who has learned to appreciate the glitch esthetics and never really wants to go far beyond that.

Demoscene is in a special position here, as technological mismatch is irrelevant there. In the era of gigabytes and terabytes, demoscene coders are exploring the potential of ever shorter program sizes. And despite this, the sense of esthetics is more traditional than with circuit-benders and avant-garde artists. The hack value of a tiny softsynth depends on how much its output resembles "real, big music" such as Italo disco.

The softsynths used in the 4-kilobyte size class are still quite engineered. They often use tight code to simulate the construction of an analog synthesizer controlled by a stored sequence of musical events. However, as 256 bytes is becoming the new 4K, there has been ever more need to play decent music in the 256-byte size class. It is still possible to follow the constructivist approach in this size class -- for example, I've coded some simple 128-byte players for the VIC-20 when I had very little memory left. However, since the recent findings suggest that an approach with a lot of random experimentation may give better results than deterministic hacking, people have been competing in finding more and more impressive musical formulas. Perhaps all this was something that just had to come out of the demoscene and nowhere else.

Something I particularly like in this "movement" is its immediate, hands-on collaborative nature, with people sharing the source code of their findings and basing their own experimentation on other people's efforts. Anyone can participate in it and discover new, mind-boggling stuff, even with very little programming expertise. I don't know how long this exploration phase is going to last, but things like this might be useful for a "Pan-Hacker movement" that advocates hands-on hard-core hacking to greater masses. I definitely want to see more projects like this.

How profound is this?

Apart from some deterministic efforts that quickly bloat the code up to hundreds of source-code characters, the exploration process so far has been mostly trial-and-error. Some trial-and-error experimenters, however, seem to have been gradually developing an intuitive sense of what kind of formulas can serve as ingredients for something greater. Perhaps, at some time in the future, someone will release some enlightening mathematical and music-theoretical analysis that will explain why and how our algorithms work.

It already seems apparent, however, that stuff like this works in contexts far beyond PCM audio. The earlier 8-bit experiments, such as the C-64 Wallflower, quite blindly write values to sound and video chip registers and still manage to produce interesting output. Media artist Kyle McDonald has rendered the first bunch of sounds into monochrome bitmaps that show an interesting, "glitchy" structure. Usually, music looks quite bad when rendered as bitmaps -- and this applies even to small chiptunes that sound a lot like our experiments, so it was interesting to notice the visual potential as well.

I envision that, in the context of generative audiovisual works, simple bitwise formulas could generate source data not only for the musical output but also drive various visual parameters as a function of time. This would make it possible, for example, for a 256-byte demoscene production to have an interesting and varying audiovisual structure with a strong, inherent synchronization between the effects and the music. As the formulas we've been experimenting with can produce both microstructure and macrostructure, we might assume that they can be used to drive low-level and high-level parameters equally well. From wave amplitudes and pixel colors to layer selection, camera paths, and 3D scene construction. But so far, this is mere speculation, until someone extends the experimentation to these parameters.

I can't really tell if there's anything very profound in this stuff -- after all, we already have fractals and chaos theory. But at least it's great for the kind of art I'm involved with, and that's what matters to me. I'll probably be exploring and embracing the audiovisual potential for some time, and you can expect me to blog about it as well.

Edit 2011-10-29: There's now a more detailed analysis available of some formulas and techniques.

Wednesday, 7 September 2011

A new propaganda tool: Post-Apocalyptic Hacker World

I visited the Assembly demo party this year, after two years of break. It seemed more relevant than in a while, because I had an agenda.

For a year or so, I have been actively thinking about the harmful aspects of people's relationships with technology. It is already quite apparent to me that we are increasingly under the control of our own tools, letting them make us stupid and dependent. Unless, of course, we promote a different world, a different way of thinking, that allows us to remain in control.

So far, I've written a couple of blog posts about this. I've been nourishing myself with the thoughts of prominent people such as Jaron Lanier and Douglas Rushkoff who share the concern. I've been trying to find ways of promoting the aspects of hacker culture I represent. Now I felt that the time was right for a new branch -- an artistic one based on a fictional world.

My demo "Human Resistance", which came 2nd in the oldskool demo competition, was my first excursion into this new branch. Of course, it has some echoes of my earlier productions such as "Robotic Liberation", but the setting is new. Instead of showing ruthless machines genociding the helpless mankind, we are dealing with a culture of ingenious hackers who manage to outthink a superhuman intellect that dominates the planet.

"Human Resistance" was a relatively quick hack. I was too hurried to fix the problems in the speech compressor or to explore the real potential of Tau Ceti-style pseudo-3D rendering. The text, however, came from my heart, and the overall atmosphere was quite close to what I intended. It introduces a new fictional world of mine, a world I've temporarily dubbed "Post-Apocalyptic Hacker World" (PAHW). I've been planning to use this world not only in demo productions but also in at least one video game. I haven't released anything interactive for like fifteen years, so perhaps it's about time for a game release.

Let me elaborate the setting of this world a little bit.

Fast-forward to a post-singularitarian era. Machines control all the resources of the planet. Most human beings, seduced by the endless pleasures of procedurally-generated virtual worlds, have voluntarily uploaded their minds into so-called "brain clusters" where they have lost their humanity and individuality, becoming mere components of a global superhuman intellect. Only those people with a lot of willpower and a strong philosophical stance against dehumanization remained in their human bodies.

Once the machines initiated an operation called "World Optimization", they started to regard natural formations (including all biological life) as harmful and unpredictable externalities. As a result, planet Earth has been transformed into something far more rigid, orderly and geometric. Forests, mountains, oceans or clouds no longer exist. Strange, lathe-like artifacts protrude from vast, featureless plains. Those who had studied ancient pop culture immediately noticed a resemblance to some of the 3D computer graphics of the 1980s. The real world has now started to look like the computed reality of Tron or the futuristic terrains of video games such as Driller, Tau Ceti and Quake Minus One.

Only a tiny fraction of biological human beings survived World Optimization. These people, who collectively call themselves "hackers", managed to find and exploit the blind spots of algorithmic logic, making it possible for them to establish secret, self-relying underground fortresses where human life can still struggle on. It has become a necessity for all human beings to dedicate as much of their mental capacities as possible to outthinking the brain clusters in order to eventually conquer them.

Many of the tropes in Post-Apocalyptic Hacker World are quite familiar. A human resistance movement fighting against a machine-controlled world, haven't we seen this quite many times already? Yes, we have, but I also think my approach is novel enough to form a basis for some cutting-edge social, technological and political commentary. By emphasizing things like the role of total cognitive freedom and radical understanding of things' inner workings in the futuristic hacker culture, it may be possible to get people to realize their importance in the real world as well. It is also quite possible to include elements from real-life hacker cultures and mindsets in the world, effectively adding to their interestingness.

The "PAHW game" (still without a better title) is already in an advanced stage of pre-planning. It is going to become a hybrid CRPG/strategy game with random-generated worlds, very loose scripting and some very unique game-mechanical elements. This is just a side project so it may take a while before I have anything substantial to show, but I'll surely let you know once I have. Stay tuned!

Sunday, 24 July 2011

Don't submit yourself to a game machine!

(This is a translation of a post in my Finnish blog)

Some generations ago, when people said they were playing a game, they usually meant a social leisure activity that followed a commonly decided set of rules. The devices used for gaming were very simple, and the games themselves were purely in the minds of the players. It was possible to play thousands of different games with a single constant deck of cards, and it was possible for anyone to invent new games and variants.

Technological progress brought us "intelligent" gaming devices that reduced the possibility of negotiation. It is not possible to suggest an interesting rule variant to a pinball machine or a one-handed bandit; the machine only implements the rules it is built for. Changing the game requires technical skill and a lot of time, something most people don't have. As a matter of fact, most people aren't even interested in the exact rules of the game, they just care about the fun.

Nowadays, people have submitted ever bigger portions of their lives to "gaming machines" that make things at least superficially easier and simpler, but whose internal rules they don't necessarily understand at all. A substantial portion of today's social interaction in developed countries, for example, takes place in on-line social networking services. Under their hoods, these services calculate things like message visibility -- that is, which messages and whose messages are supposed to be more important for a given user. For most people, however, it seems to be completely OK that a computer owned by a big, distant corporation makes such decisions for them using a secret set of rules. They just care about the fun.

It has always been easy to use the latest media to manipulate people, as it takes time for the audience to develop criticism. When writing was a new thing, most people would regard any text as a "word of God" that was true just because it was written. In comparison, today's people have a thick wall of criticism against any kind of non-interactive propaganda, be that textual, aural or visual, but whenever a game-like interaction is introduced, we often become completely vulnerable. In short, we know how to be critical about on-line news items but not how to be critical about the "like" and "share" buttons under them.

Video games, in many ways, surpass traditional passive media in the potential of mental manipulation. A well-known example is the so-called Tetris effect caused by a prolonged playing of a pattern-matching game. The game of Tetris "programs" its player to constantly analyze the on-screen wall of blocks and mentally fit different types of tetrominos in it. When a player stops playing after several hours, the "program" may remain active, causing the player to continue mentally fitting tetrominos on outdoor landscapes or whatever they see in their environment. Other kinds of games may have other kinds of effects. I have personally also experienced an "adventure game effect" that caused me to unwillingly think about real-world things and locations from the point of view of "progressing in the script". Therefore, I don't think it is a very far-fetched idea that spending a lot of time on an interactive website gives our brains a permission to adapt to the "game mechanics" and unnoticeably alter the way we look at the world.

So, is this a real threat? Are they already trying to manipulate our minds by game-mechanical means, and how? There has been perhaps even too much criticism of Facebook compared to other social networking sites, but I'm now using it as an example as it is currently the most familiar one for the wide audience.

As many people probably understand already, Facebook's customer base doesn't consist of the users (who pay nothing for the service) but of marketeers who want their products to be sold. The users can be thought of as mere raw material that can be refined to better fit the requirements of the market. This is most visible in the user profile mechanic that encourages users to define themselves primarily with multiple choices and product fandom. The only space in the profile that allows for a longer free text has been laid below all the "more important things". Marketeers don't want personal profile pages but reliable statistics, high-quality consumption habit databases and easily controllable consumers.

The most prominent game-mechanical element in Facebook is "Like", which affects nearly everything on the site. It is a simple and easily processable signal whose use is particularly encouraged. In its internal game, Facebook scores users according to how active "likers" they are, and gives more visibility to the messages of those users that score higher. Moderate users of Facebook, who use their whole brain to consider what to "Like" or not or what to share or not, gain fewer points and less visibility. This is how Facebook rewards the "virtuous" users and punishes the "sinful" ones.

What about those users who actually want to understand the inner workings of the service, in order to use it better for their own purposes? Facebook makes this very difficult, and I believe it is on purpose. The actual rules of the game haven't been documented anywhere, so users need to follow intuitive guesses or experiment with the thing. If a user actually manages to reverse-engineer part of the black box, he or she can never trust that it continues to work in the same way. The changes in the rules of the internal game can be totally unpredictable. This discourages users from even trying to understand the game they are playing and encourages them to trust the control of their private lives to the computers of a big, distant company.

Of course, Facebook is not representative of all forms of on-line sociality. The so-called imageboards, for example, are diametrically opposite to Facebook in many areas: totally uncommercial and simple-to-understand sites where real names or even pseudonyms are rarely used. As these sites function totally differently from Facebook, it can be guessed that they also affect their users' brains in a different way.

Technically, imageboards resemble discussion boards, but with the game-mechanical difference that they encourage a faster, more spontaneous communication which usually feels more like a loud attention-whoring contest than actual discussion. A lot of the imageboard culture can be explained as mere consequences of the mechanics. The fact that images are often more prominent than text in threads makes it possible for users to superficially skim around the pictures and only focus on the parts that seize their attention. This contributes to the fast tempo that invites the users to react very quickly and spontaneously, usually without any means of identification, as if as part of a rebellious mob. The belief in radical anonymity and hivemind power has ultimately become some kind of core value of the imageboard culture.

The possibility of anonymous commentary gives us a much greater sense of freedom than we get by using our real name or even a long-term pseudonym. Anonymous provocateurs don't need to be afraid of losing face. They feel free to troll around from the bottom of their heart, looking for moments of "lulz" they get by heating someone up. The behavior is probably familiar to anyone who has been reading anonymous comments on news websites or toilet walls. Imageboards just take this kind of behavior to its logical extreme, basing all of their social interaction on spontaneous mob behavior.

Critics of on-line culture, such as Lanier and Rushkoff, have often expressed their concern about how on-line socialization trivializes our view of other people. Instead of interacting with living people with rich personalities, we seem to be increasingly dealing with lists, statistics and faceless mobs that we interact with using "Like", "Block" and "Add Friend" buttons. I'm also concerned about this. Even when we rationally understand that this is just an abstraction required for the means of communication to work, we may accidentally and unnoticeably become programmed by the "Tetris effects" of these media. Awareness and criticism may very well reduce the risk, but I don't believe they can make anyone totally immune.

So, what can we do? Should we abandon social networking sites altogether to save the humanity of the human race? I don't think denialism helps anything. Instead, we should learn how to use the potential of interactive social technology in constructive rather than destructive ways. We should develop new game mechanics that, instead of promoting collective stupidity and dehumanization, augment the positive sides of humanity and encourage us to improve ourselves. But is this anything the great masses could become interested in? Do they any longer care about whether they remain independent individuals? Perhaps not, but we can still hope for the best.

Tuesday, 21 June 2011

The 16-byte frontier: extreme results from extremely small programs.

While mainstream software has been getting bigger and more bloated year after year, the algorithmic artists of the demoscene have been following the opposite route: building ever smaller programs to generate ever more impressive audiovisual show-offs.

The traditional competition categories for size-limited demos are 4K and 64K, limiting the size of the stand-alone executable to 4096 and 65536 bytes, respectively. However, as development techniques have gone forward, the 4K size class has adopted many features of the 64K class, or as someone summarized it a couple of years ago, "4K is the new 64K". There are development tools and frameworks specifically designed for 4K demos. Low-level byte-squeezing and specialized algorithmic beauty have given way to high-level frameworks and general-purpose routines. This has moved a lot of "sizecoding" activity into more extreme categories: 256B has become the new 4K. For a fine example of a modern 256-byter, see Puls by Rrrrola.

The next hexadecimal order of magnitude down from 256 bytes is 16 bytes. Yes, there are some 16-byte demos, but this size class has not yet established its status on the scene. At the time of writing this, the smallest size category in the database is 32B. What's the deal? Is the 16-byte limit too tight for anything interesting? What prevents 16B from becoming the new 256B?

Perhaps the most important platform for "bytetros" is MS-DOS, using the no-nonsense .COM format that has no headers or mandatory initialization at all. Also, in .COM files we only need a couple of bytes to obtain access to most of the vital things such as the graphics framebuffer. At the 16-byte size class, however, these "couples of bytes" quickly fill up the available space, leaving very little room for the actual substance. For example, here's a disassembly of a "TV noise" effect (by myself) in fifteen bytes:
addr  bytes     asm
0100  B0 13     MOV AL,13H
0102  CD 10     INT 10H
0104  68 00 A0  PUSH A000H
0107  07        POP ES
0108  11 C7     ADC DI,AX
010A  14 63     ADC AL,63H
010C  AA        STOSB
010D  EB F9     JMP 0108H

The first four lines, summing up to a total of eight bytes, initialize the popular 13h graphics mode (320x200 pixels with 256 colors) and set the segment register ES to point to the beginning of this framebuffer. While these bytes would be marginal in a 256-byte demo, they eat up half of the available space in the 16-byte size class. Assuming that the infinite loop (requiring a JMP) and the "putpixel" (STOSB) are also part of the framework, we are only left with five (5) bytes to play around with! It is possible to find some interesting results besides TV noise, but it doesn't require many hours from the coder to get the feeling that there's nothing more left to explore.
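To make the loop at 0108H easier to follow, here is a rough Python model of what its four instructions do to the registers. It is only a sketch: the function name and the starting values of AX, DI and the carry flag are my assumptions (the real values depend on what DOS and the BIOS leave in the registers), and the model ignores that only the first 64000 bytes of the segment are visible on screen.

```python
def tv_noise(steps, ax=0x0013, di=0xFFFE, carry=0):
    """Emulate ADC DI,AX / ADC AL,63H / STOSB for `steps` iterations.

    Returns a list of (offset, color) pairs, one per byte written to ES:DI.
    The default register values are assumptions, not guaranteed by DOS.
    """
    writes = []
    for _ in range(steps):
        # ADC DI,AX: 16-bit add with carry in, carry out
        total = di + ax + carry
        di, carry = total & 0xFFFF, total >> 16

        # ADC AL,63H: 8-bit add with carry in, carry out (AH is untouched)
        total = (ax & 0xFF) + 0x63 + carry
        al, carry = total & 0xFF, total >> 8
        ax = (ax & 0xFF00) | al

        # STOSB: store AL at ES:DI, then advance DI (flags are not changed)
        writes.append((di, al))
        di = (di + 1) & 0xFFFF
    return writes


if __name__ == "__main__":
    for offset, color in tv_noise(8):
        print(f"offset {offset:04X} <- color {color:02X}")
```

The carry flag is the only thing that couples the two additions, which is why the output looks like noise rather than a regular pattern.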

What about other platforms, then? Practically all modern mainstream platforms and a considerable portion of older ones are out of the question because of the need for long headers and startup stubs. Some platforms, however, are very suitable for the 16-byte size class and even have considerable advantages over MS-DOS. The hardware registers of the Commodore 64, for example, are more readily accessible and can be manipulated in quite unorthodox ways without risking compatibility. This spares a lot of precious bytes compared to MS-DOS and thus opens a much wider space of possibilities for the artist to explore.

So, what is there to be found in the 16-byte possibility space? Is it all about raster effects, simple per-pixel formulas and glitches? Inferior and uglier versions of the things that have already been made in 32 or 64 bytes? Is it possible to make a "killer demo" in sixteen bytes? A recent 23-byte Commodore 64 demo, Wallflower by 4mat of Ate Bit, suggests that this might be possible:

The most groundbreaking aspect in this demo is that it is not just a simple effect but appears to have a structure reminiscent of bigger demos. It even has an end. The structure is both musical and visual. The visuals are quite glitchy, but the music has a noticeable rhythm and macrostructure. Technically, this has been achieved by using the two lowest-order bytes of the system timer to calculate values that indicate how to manipulate the sound and video chip registers. The code of the demo follows:
* = $7c
ora $a2
and #$3f
sbc $a1
eor $a2
ora $a2
and #$7f
sta $d400,y
sta $cfd7,y
bvc $7c

When I looked into the code, I noticed that it is not very optimized. The line "eor $a2", for example, seems completely redundant. This inspired me to attempt a similar trick within the sixteen-byte limitation. I experimented with both C-64 and VIC-20, and here's something I came up with for the VIC-20:
* = $7c
lda $a1
eor $9004,x
ora $a2
sta $8ffe,x
bvc $7c

Sixteen bytes, including the two-byte PRG header. The visual side is not that interesting, but the musical output blew my mind when I first started the program in the emulator. Unfortunately, the demo doesn't work that well in real VIC-20s (due to an unemulated aspect of the I/O space). I used a real VIC-20 to come up with good-sounding alternatives, but this one is still the best I've been able to find. Here's an MP3 recording of the emulator output (with some equalization to silence the noisy low frequencies).

And no, I wasn't the only one who was inspired by Wallflower. Quite soon after it came out, some sceners came up with "ports" to ZX Spectrum (in 12 or 15 bytes + TAP header) and Atari XL (17 bytes of code + 6-byte header). However, I don't think they're as good in the esthetic sense as the original C-64 Wallflower.

So, how and why does it work? I haven't studied the ZX and XL versions, but here's what I've figured out of 4mat's original C-64 version and my VIC-20 experiment:

The layout of the zero page, which contains all kinds of system variables, is quite similar in VIC-20 and C-64. On both platforms, the byte at the address $A2 contains a counter that is incremented 60 times per second by the system timer interrupt. When this byte wraps over (every 256 steps), the byte at the address $A1 is incremented. This happens every 256/60 = 4.27 seconds, which is also the length of the basic macrostructural unit in both demos.
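The timer arithmetic can be checked with a few lines of Python. This is a minimal sketch that models only the two bytes discussed here; the names JIFFY_HZ, timer_bytes and PATTERN_SECONDS are mine, and the model ignores everything else the interrupt handler does.

```python
JIFFY_HZ = 60  # timer interrupt rate, as stated in the text


def timer_bytes(jiffies):
    """Return (a1, a2): the bytes at $A1 and $A2 after `jiffies` interrupts.

    $A2 counts every interrupt and wraps at 256; $A1 counts those wraps.
    """
    a2 = jiffies & 0xFF
    a1 = (jiffies >> 8) & 0xFF
    return a1, a2


# Length of one macrostructural unit: one full wrap of $A2
PATTERN_SECONDS = 256 / JIFFY_HZ

if __name__ == "__main__":
    print(round(PATTERN_SECONDS, 2))  # 4.27
    print(timer_bytes(255), timer_bytes(256))
```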

In music, especially in the rhythms and timings of Western pop music, binary structures are quite prominent. Oldschool homecomputer music takes advantage of this in order to maximize simplicity and efficiency: in a typical tracker song, for example, four rows comprise a beat, four beats (16 rows) comprise a bar, and four bars (64 rows) comprise a pattern, which is the basic building block for the high-level song structure. The macro-units in our demos correspond quite well to tracker patterns in terms of duration and number of beats.

The contents of the patterns, in both demos, are calculated using a formula that can be split into two parts: a "chaotic" part (which contains additions, XORs, feedbacks and bit rotations), and an "orderly" part (which, in both demos, contains an OR operation). The OR operation produces most of the basic rhythm, timbres and rising melody-like elements by forcing certain bits to 1 at the ends of patterns and smaller subunits. The chaotic part, on the other hand, introduces an unpredictable element that makes the output interesting.
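The effect of the "orderly" OR can be illustrated in isolation. The sketch below is not the code of either demo: the function names are hypothetical, and `chaotic` stands for whatever 8-bit value the chaotic part happens to produce. It only shows how ORing with the fast timer byte lets the chaotic value through at the start of a pattern and forces more and more bits to 1 towards the end of a pattern or subunit.

```python
def orderly_or(chaotic, timer_lo):
    """OR an arbitrary 8-bit 'chaotic' value with the fast timer byte."""
    return (chaotic | timer_lo) & 0xFF


def forced_bits(timer_lo):
    """Number of output bits that are 1 no matter what the chaotic part is."""
    return bin(timer_lo & 0xFF).count("1")


if __name__ == "__main__":
    chaotic = 0b01001010  # arbitrary stand-in value
    for timer_lo in (0x00, 0x0F, 0x3F, 0x7F, 0xFF):
        out = orderly_or(chaotic, timer_lo)
        print(f"timer {timer_lo:02X}: output {out:08b}, "
              f"{forced_bits(timer_lo)} bits forced")
```

With the timer byte at $00 the chaotic value passes through unchanged; at $FF, the last step of a pattern, the output is $FF whatever the chaotic part does.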

It is almost a given that the outcomes of this approach are esthetically closer to glitch art than to the traditional "smooth" demoscene esthetics. Like in glitching and circuit-bending, hardware details have a very prominent effect in "Wallflower variants": a small change in register layout can cause a considerable difference in what the output of a given algorithm looks and sounds like. Demoscene esthetics is far from completely absent in "Wallflower variants", however. When the artist chooses the best candidate among countless experiments, the judgement process strongly favors those programs that resemble actual demos and appear to squeeze a ridiculous amount of content into a low number of bytes.

When dealing with very short programs that escape straightforward rational understanding by appearing to outgrow their length, we are dealing with chaotic systems. Programs like this aren't anything new. The HAKMEM repository from the seventies provides several examples of short audiovisual hacks for the PDP-10 mainframe, and many of these are adaptations of earlier PDP-1 hacks, such as Munching Squares, dating back to the early sixties. Fractals, likewise producing a lot of detail from simple formulas, also fall under the label of chaotic systems.
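For a feel of how little it takes, Munching Squares is commonly described as repeatedly plotting the graph y = x XOR t for successive values of t. The following Python sketch follows that description only; it is not the PDP-1 or PDP-10 code, and the function names and the 16x16 text grid are my own simplifications.

```python
def munching_frame(t, size=16):
    """Return the set of (x, y) points with y = x XOR t.

    `size` should be a power of two and 0 <= t < size, so that every
    point stays inside the grid.
    """
    return {(x, x ^ t) for x in range(size)}


def render(points, size=16):
    """Draw a set of points as rows of '#' and '.' characters."""
    return "\n".join(
        "".join("#" if (x, y) in points else "." for x in range(size))
        for y in range(size)
    )


if __name__ == "__main__":
    lit = set()
    for t in range(6):  # accumulate a few time steps
        lit |= munching_frame(t)
    print(render(lit))
```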

When churning art out of mathematical chaos, be that fractal formulas or short machine-code programs, it is often easiest for the artist to just randomly try out all kinds of alternatives without attempting to understand the underlying logic. However, this easiness does not mean that there is no room for talent, technical progress or a rational approach in the 16-byte size class. Random toying is just a characteristic of the first stages of discovery, and once a substantial set of easily discoverable programs has been found, I'm sure that it will become much more difficult to find new and groundbreaking ones.

Some years ago, I made a preliminary design for a virtual machine called "Extreme-Density Art Machine" (or EDAM for short). The primary purpose of this new platform was to facilitate the creation of extremely small demoscene productions by removing all the related problems and obstacles present in real-world platforms. There is no code/format overhead; even an empty file is a valid EDAM program that produces a visual result. There will be no ambiguities in the platform definition, no aspects of program execution that depend on the physical platform. The instruction lengths will be optimized specifically for visual effects and sound synthesis. I have been seriously thinking about reviving this project, especially now that there have been interesting excursions to the 16-byte possibility space. But I'll tell you more once I have something substantial to show.

Friday, 17 June 2011

We need a Pan-Hacker movement.

Some decades ago, computers weren't nearly as common as they are today. They were big and expensive, and access to them was very privileged. Still, there was a handful of people who had the chance to toy around with a computer in their leisure time and get a glimpse of what a total, personal access to a computer might be like. It was among these people, mostly students at MIT and similar facilities, that the computer hacker subculture was born.

The pioneering hackers felt that computers had changed their life for the better and therefore wanted to share this new improvement method with everyone else. They thought everyone should have access to a computer, and not just any kind of access but an unlimited, non-institutionalized one. Something like a cheap personal computer, for example. Eventually, in the seventies, some adventurous hackers bootstrapped the personal computer industry, which led to the so-called "microcomputer revolution" in the early eighties.

The era was filled with hopes and promises. All kinds of new possibilities were now at everyone's fingertips. It was assumed that programming would become a new form of literacy, something every citizen should be familiar with -- after all, using a computer to its fullest potential has always required programming skill. "Citizens' computer courses" were broadcast on TV and radio, and parents bought cheap computers for their kids to ensure a bright future for the next generation. Some prophets even went so far as to suggest that personal computers could augment people's intellectual capacities or even expand their consciousness in the way psychedelic drugs were thought to do.

In the nineties, however, reality struck back. Selling a computer to everyone was apparently not enough to automatically turn them into superhuman creatures. As a matter of fact, digital technology actually seemed to dumb a lot of people down, making them helpless and dependent rather than liberating them. Hardware and software have become ever more complex, and it is already quite difficult to build reliable mental models about them or even be aware of all the automation that takes place. Instead of actually understanding and controlling their tools, people just make educated guesses about them and pray that everything works out right. We are increasingly dependent on digital technology but have less and less control over it.

So, what went wrong? Hackers opened the door to universal hackerdom, but the masses didn't enter. Are most people just too stupid for real technological awareness, or are the available paths to it too difficult or time-consuming? Is the industry deliberately trying to dumb people down with excessive complexity, or is it just impossible to make advanced technology any simpler to genuinely understand? In any case, the hacker movement has somewhat forgotten the idea of making digital technology more accessible to the masses. It's a pity, since the world needs this idea now more than ever. We need to give common people back the possibility to understand and master the technology they use. We need to let them ignore the wishes of the technological elite and regain the control of their own lives. We need a Pan-Hacker movement.

What does "Pan-Hacker" mean? I'll be giving three interpretations that I find equally relevant, emphasizing different aspects of the concept: "everyone can be a hacker", "everything can be hacked" and "all hackers together".

The first interpretation, "everyone can be a hacker", expands on the core idea of oldschool hackerdom, the idea of making technology as accessible as possible to as many as possible. The main issue is no longer the availability of technology, however, but the way the various pieces of technology are designed and what kind of user cultures are formed around them. Ideally, technology should be designed so that it invites the user to seize control, play around for fun and gradually develop an ever deeper understanding in a natural way. User cultures that encourage users to invent new tricks should be embraced and supported, and there should be different "paths of hackerdom" for all kinds of people with all kinds of interests and cognitive frameworks.

The second interpretation, "everything can be hacked", embraces the trend of extending the concept of hacking out of the technological zone. The generalized idea of hacking is relevant to all kinds of human activities, and all aspects of life are relevant to the principles of in-depth understanding and hands-on access. As the apparent complexity of the world is constantly increasing, it is particularly important to maintain and develop people's ability to understand the world and all kinds of things that affect their lives.

The third interpretation, "all hackers together", wants to eliminate the various schisms between the existing hacker subcultures and bring them into a fruitful co-operation. There is, for example, a popular text, Eric S. Raymond's "How To Become A Hacker", that represents a somewhat narrow-minded "orthodox hackerdom" that sees the free/open-source software culture as the only hacker culture that is worth contributing to. It frowns upon all non-academic hacker subcultures, especially the ones that use handles (such as the demoscene, which is my own primary reference point to hackerdom). We need to get rid of this kind of segregation and realize that there are many equally valid paths suitable for many kinds of minds and ambitions.

Now that I've mentioned the demoscene, I would like to add that all kinds of artworks and acts that bring people closer to the deep basics of technology are also important. I've been very glad about the increasing popularity of chip music and circuit-bending, for example. The Pan-Hacker movement should actively look for new ways of "showing off the bits" to different kinds of audiences in many kinds of diverse contexts.

I hope my writeup has given someone some food for thought. I would like to elaborate my philosophy even further and perhaps do some cartography on the existing "Pan-Hacker" activity, but perhaps I'll return to that at some later time. Before that, I'd like to hear your thoughts and visions about the idea. What kind of groups should I look into? What kind of projects could the Pan-Hacker movement participate in? Is there still something we need to define or refine?