7.0 Goes Into Testing

LiveCode 7.0 has just gone into its first public test release. Why should you be excited?

The Unicorn is Tamed

There are lots of reasons. If you've been with us for a while, and especially if you use a language that is not "plain" English, then the news that the unicorn of Unicode has finally been tamed will be the best news you've heard this year. Right to left text, seamless handling of accents, Cyrillic or Chinese scripts, and everything else implied by Unicode means LiveCode is ready for a big expansion into Asian, Arabic, Eastern and other markets - not to mention making French, Spanish, Italian and many other European languages far easier to support.

The Monolith is Broken

But the hidden news in this release is that it represents, if not the full "Next Generation" release promised by our Kickstarter campaign, at least the biggest chunk of it. The engine has been refactored to a very large degree, beginning the modularisation and modernisation we need to build for the future. You may not immediately see any dramatic benefit. LiveCode still works "just as before". Only, it doesn't. It's working in brand new ways to produce the same effect, and this means that now we - and you - can really start to deliver. The engine is no longer a 20-year-old monolith. It is in the process of being rejuvenated.

Goodbye, Butterfly

The big idea of modularization is that an Open Source community can work on self-contained parts of the engine without the "butterfly effect" causing unexpected and unwanted behavior in other areas. This should mean that the rate of progress can increase dramatically, both from you, the community, working on the engine, and from our own dedicated team working on all the features you want. We're not quite there yet, but we've come a long, long way. Future releases will build on the huge amount of work that has already been done here.

Enough preamble... what exactly is new for you in this test release of 7.0?

A Warning: The file format has changed. Back up your stacks and expect the unexpected. This is an early test release and if you don't encounter any issues, you're not really trying.

Unicode for You

In LiveCode 7.0, the engine has been extensively re-written to be able to handle Unicode text transparently throughout. The standard text manipulation operations work on Unicode text without any additional effort on your part; Unicode text can now be used to name controls, stacks and other objects; menus containing Unicode selections no longer require tags to be usable - anywhere text is used, Unicode should work.

New Syntax and Functions

We're bringing you quite a few new terms to deal with the new Unicode abilities. I don't want to reprint the release notes here, but I do highly recommend you read them. There is a nice overview of what Unicode is, and why and how LiveCode had to change its handling of it, followed by detailed notes on exactly how to use it. Here is a sample:

Chunk expressions: byte, char, codepoint, codeunit

byte x to y of text -- Returns bytes from a binary string
char x to y of text -- As a series of graphical units
codepoint x to y of text -- As a series of Unicode codepoints
codeunit x to y of text -- As a series of UTF-16 codeunits

The key change is that byte and char are no longer synonyms - a byte is strictly an 8-bit unit and can only be reliably used with binary data. For backwards compatibility, it returns the corresponding native character from Unicode text (or a '?' if not representable) but this behaviour is deprecated and should not be used in new code.
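The "native character or '?'" fallback described above can be illustrated outside LiveCode. This is a minimal Python sketch, not LiveCode syntax. It assumes Latin-1 as a stand-in for the platform's native encoding, and the helper name to_native_bytes is made up for illustration.

```python
# Assumption: Latin-1 stands in for the platform's "native" 8-bit encoding.
NATIVE = "latin-1"


def to_native_bytes(text: str) -> bytes:
    """Map each code point to one native byte, or '?' if it has no native form."""
    return text.encode(NATIVE, errors="replace")


# U+00E9 (e with acute) exists in Latin-1, so it survives as a single byte.
print(to_native_bytes("caf\u00e9"))
# Cyrillic letters do not exist in Latin-1, so each one becomes '?'.
print(to_native_bytes("\u0416\u0443\u043a"))
```

The second call shows why byte chunks are only dependable on binary data: once text falls outside the native set, the original characters cannot be recovered from the bytes.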

The char chunk type no longer means an 8-bit unit but instead refers to what would naturally be thought of as a single graphical character (even if it is composed of multiple sub-units, as in some accented text or Korean Hangul syllables). Because of this change, it is inappropriate to use this type of chunk expression on binary data.
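To make the idea of a "graphical character" concrete, here is a rough Python sketch (again, not LiveCode syntax). It groups each base code point with any combining marks that follow it. That is a deliberate simplification: full grapheme segmentation, as specified in Unicode Standard Annex #29, has more rules, and the engine's actual rules are not reproduced here. The function name rough_chars is hypothetical.

```python
import unicodedata
from typing import List


def rough_chars(text: str) -> List[str]:
    """Group each base code point with the combining marks that follow it.

    Simplification: only combining marks are handled, not the full
    grapheme cluster rules.
    """
    clusters: List[str] = []
    for cp in text:
        if clusters and unicodedata.combining(cp) != 0:
            clusters[-1] += cp  # attach the mark to the previous base
        else:
            clusters.append(cp)  # start a new user-visible character
    return clusters


decomposed = "a\u0301bc"  # 'a' + COMBINING ACUTE ACCENT, then 'b' and 'c'
print(len(decomposed))               # 4 code points
print(len(rough_chars(decomposed)))  # 3 user-visible characters
```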

The codepoint chunk type allows access to the sequence of Unicode codepoints which make up the string. This allows direct access to the components that make up a character. For example, á can be encoded as (a, combining-acute-accent), so it is one character but two codepoints.

The codeunit chunk type allows direct access to the UTF-16 codeunits which notionally make up the internal storage of strings. The codeunit and codepoint chunks are the same if a string only contains Unicode codepoints from the Basic Multilingual Plane. If, however, the string contains Unicode codepoints from the Supplementary Planes, then each such codepoint is represented as two codeunits (via the surrogate pair mechanism).
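The codepoint versus codeunit distinction is a property of Unicode and UTF-16 rather than of LiveCode alone, so it can be checked in any language. The Python sketch below (not LiveCode syntax, and the two helper names are made up for illustration) counts both for three sample strings.

```python
def count_codepoints(text: str) -> int:
    """Python strings are sequences of code points, so len() counts them."""
    return len(text)


def count_codeunits(text: str) -> int:
    """Each UTF-16 code unit is two bytes; 'utf-16-le' adds no byte-order mark."""
    return len(text.encode("utf-16-le")) // 2


samples = {
    "accented, decomposed": "a\u0301",       # 'a' + combining acute accent
    "Cyrillic, in the BMP": "\u0416",        # one code point, one code unit
    "emoji, Supplementary Plane": "\U0001F600",  # one code point, two code units
}

for label, text in samples.items():
    print(label, count_codepoints(text), count_codeunits(text))
```

Only the last sample differs between the two counts, because U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair.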

The most important feature of the 'codeunit' chunk is that it guarantees constant-time indexed access into a string (just as char did in previous engines). However, it is not of general utility and should be reserved for scripts which need greater speed but either do not need to process Supplementary Plane characters or are able to do such processing themselves.

The hierarchy of these new and altered chunk types is as follows: byte w of codeunit x of codepoint y of char z of word...
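What "doing such processing themselves" might involve can be sketched as follows. This is Python, not LiveCode syntax, and the names utf16_units and codepoint_at are hypothetical. Indexing into the list of code units is constant time, and the script combines a surrogate pair by hand when it meets one.

```python
from typing import List, Tuple


def utf16_units(text: str) -> List[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def codepoint_at(units: List[int], i: int) -> Tuple[int, int]:
    """Return (code point, code units consumed) starting at code unit index i."""
    high = units[i]
    if 0xD800 <= high <= 0xDBFF and i + 1 < len(units):
        low = units[i + 1]
        if 0xDC00 <= low <= 0xDFFF:
            # Combine the surrogate pair into one Supplementary Plane code point.
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00), 2
    return high, 1  # BMP code point (or an unpaired surrogate, returned as-is)


units = utf16_units("A\U0001F600")
print([hex(u) for u in units])       # three code units for two code points
print(codepoint_at(units, 0))        # plain BMP character
print(hex(codepoint_at(units, 1)[0]))  # pair combined back into U+1F600
```

The trade-off is the one described above: index access is cheap, but every consumer has to remember that one index is not always one codepoint.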

Read the rest of the release notes here (and don't miss the important information about numToChar and charToNum).

Don't miss Ali Lloyd's excellent blog post on all the new chunk types in 7. Along with these chunks he talks about the new sentence, paragraph and trueWord chunks.

We really need your involvement at this stage. We need you, yes you, to download 7.0 dp1 and test it. Then tell us what doesn't work by reporting all bugs in the Quality Center. We want this release to be solid, stable, and deliver everything it promises. To do that it needs as much real world testing as it can get. We've been testing it for some months now. A tiny group of early outside testers have had their go at it. Now it's your turn.