LiveCloud Part 2
Introduction to Database Design

In LiveCloud, each database you create is assigned a schema during the creation process. This schema is a set of keys which serve as the template for how data is stored in your database. You then use these keys when storing data, performing queries, or making any other API call that interacts with your data.

 

During the creation process, you can create as many keys as you like, in as many databases as you like. How you break your data up, and decide which keys should be in which database, is the core of database design. Your database design can make your eventual implementation much simpler, or it can make it much, much harder.

 

LiveCloud Architecture

LiveCloud is designed with a high degree of flexibility when it comes to data modeling and database design. Like LiveCode itself, keys are not explicitly typed, and you can store whatever you like in any key. LiveCode arrays form the core of all LiveCloud data interaction. Whether you're making an API call, looking up a value, or storing a record, you'll be using arrays to do so.

 

In addition to the keys that you define yourself, there are some other built-in keys that you should familiarize yourself with. The most important of these is the cdbRecordID, which is a unique identifier assigned to every record. You can use it to access a record directly, without having to query for it first. It is also used to update existing records, delete specific records, and perform various other common tasks.
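As a rough sketch of how this works (using the cdb_ReadLocalRecord call shown later in this article; the tRecordID variable and the "Customers" database name are hypothetical):

//look up a record directly by its built-in cdbRecordID
put cdb_ReadLocalRecord(tRecordID, "Customers") into tRecordA

//the ID travels with the record itself
put tRecordA["cdbRecordID"] into tSameID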

 

All LiveCloud data is stored on disk, and then cached into RAM on startup. When you run the SDK loading code, this takes place automatically. This allows for fast reads and writes to your database, with changes then flushed to disk atomically as necessary. Of course, all data on disk is encrypted, as are all communications with the cloud database. This keeps your data secure, whether it's in the cloud, in your office, or on your clients' machines.

 

LiveCloud is also a hybrid database; this means that data can be stored locally, in the cloud, or both. These decisions are completely up to you, and may even vary from database to database. Keep accounting data secure on your local machine, while syncing your customer data to the cloud for your clients to access.

 

LiveCloud uses industry-standard technology like Apache, PHP, and OpenSSL to provide a robust and reliable cloud solution. By combining the best parts of LiveCode with rock-solid server technologies, LiveCloud offers a service which is stable, secure, and fast.

 

Data Modeling

Basic - Single Database

For applications that only require a basic level of database storage, a single database is often sufficient. A good example of this is Dropper, an example project for LiveCloud. It stores a player name and a score, without any kind of verification. This is conceptually similar to the score screen on a pinball machine.

 

Let's see if we can recreate this behavior in our own application. First, we'll go into the LCM and create a new database named HighScores with 3 keys: PlayerName, Score, and Date (we'll use the Date to break Score ties). Now, in our application, when a player finishes a game, we would use the APIs to create a new record with those three values:

 

//gather the player's name, score, and date into an array, then save it as a new record
put fld "player name" into tScoreA["PlayerName"]
put fld "score" into tScoreA["Score"]
put the internet date into tScoreA["Date"]
get cdb_SaveRecord(tScoreA, "HighScores")

It's as simple as that! Now, when you're creating a database, one of the most important things to consider is the speed with which your data will grow. In our example, we will generate one new record for each game played. Since we want all our players to be ranked together, a single database is ideal for this situation.

 

However, we still have room for optimization elsewhere in our program. For example, let's say that our high scores list only has room for 50 entries. If we don't have a local copy of the high scores, we would do a cdb_SyncDatabase to get those records. Once we have them, we can examine them using a cdb_BasicLocalQuery, and use the results to optimize further.

 

For example, if the user plays a game and their score is lower than the 50th highest score, we don't even need to upload it, because it will never be seen. This keeps our database much smaller than it otherwise would be. It will still grow somewhat beyond 50 records, because borderline scores may be uploaded anyway when the decision algorithm doesn't have the latest data, but it will stay close. We could always sync the data before we upload just to be sure, but then we wouldn't be saving any work!
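That check might look something like this. This is only a sketch: it assumes cdb_BuildQuery accepts a numeric ">" operator on the Score key, mirroring the "date>" operator used later in this article, and it assumes tScoreA has been populated as shown above.

//count how many cached scores beat the new one
put cdb_BuildQuery("Score",">",tScoreA["Score"]) into tQueryA
put cdb_BasicLocalQuery(tQueryA,"HighScores","Basic") into tBetterScoresA

//only upload if the new score would make the top 50
if the number of elements of tBetterScoresA < 50 then
   get cdb_SaveRecord(tScoreA, "HighScores")
end if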

 

Or, consider if we wanted to implement a second high score list with personal scores only. We could save those to the cloud, but if they are only ever meant to be viewed by a single user, we can choose to save them locally to save time. Our main High Score screen would show the cloud data (incorporating the user's data if they were in the top 50), and the Personal High Score screen would only use local data.

 

With these choices, we can tune for responsiveness and efficiency, while still offering our users more functionality as well. This is just one example of how LiveCloud's flexibility puts you in complete control of your data and, more importantly, of the performance your users experience.

 

If you'd like to learn more, you can download the full Dropper source here.

 

Advanced - Multiple Databases, Linked Keys

 

 

While a single database can be adequate for some applications, most projects that require management of larger or more diverse sets of data will require some extra organizational planning. For our example, let's consider a mobile application for conference attendees. It will allow our users to sign up for an account, view a schedule of events, and post text and image comments to a social feed viewable to all attendees.

 

First, we need to determine how we can best store this data. We'll need to break our app down into its main modules, and then for each module, ask ourselves these questions:
1. At what rate will this module generate data?
2. Is the data produced by this module used by a different module?
3. How frequently will we need the data created by this module?

 

Let's start by looking at our Account module, which will cover registration, login, and authentication. First, at what rate will this module generate data? Well, it will generate at most one record for each user of our application. This is relatively slow-growing, which is nice. What about login? Well, each user will log in at least once, and possibly several times, but no more than a small constant number of times per user. Again, this is pretty good.

 

Second, are we going to use this data elsewhere? Yes: we are going to use our Account data to associate text and image posts in the Comments section with an account. How will that module access this data? We'll need the current user's data whenever they make a new post, and we'll also need to access the account data of other posters when our user reads their posts, so we can display information about each author.

 

Finally, how frequently will we need the data created by this module? Well, we'll need to access the records of the people posting comments as often as people read comments. We'll examine that in more detail later.

 

So, we can see that there are interconnections between our first module, Account, and our second module, Comments. In fact, Comments may require quite frequent access of the Account database. Keeping this in mind, let's repeat our analysis for the Comments module.

At what rate will Comments generate data? Each user might not post anything, but they could each post an unbounded number of comments. In fact, this data is probably going to grow more quickly than any other set of data in our application.

 

Is the Comments data used elsewhere? Not really - the Comments show our user Account data, but we don't have a feature for people to look up comments by user account. This will save us some work - our data only links modules in one direction.

 

How frequently will we need the Comments data? That depends on how often our users view the Comments - probably quite often, since the rest of the data is mostly static. Also, since we need to access Account info when reading comments, the Accounts data will also be read quite frequently.

Finally, we have our last module, Schedule. How often does Schedule create data? Very infrequently; in fact, it will probably be static for most of the conference, barring minor changes. Is this data used in a different module? No, not really - Comments and Account exist independently of the schedule.

 

So, based on our analysis, what conclusions can we draw?
1. Schedule will need its own database, because it is highly separate from the other data.
2. We probably don't want to put Comments and Accounts in the same database, because they grow at different speeds.
3. We'll still need to link Comments and Accounts somehow.

 

So, our first step is pretty simple - we just create three new databases, one for each of our features. Our next move is slightly more complex; we need to conceptually link our Comments and Accounts database. We'll want to make sure we have a "PostedBy" key in our Comments database, which will contain a cdbRecordID pointing to a record in our Accounts database.

This key is how we will conceptually link our two databases. It is similar to, although not quite the same as, a foreign key in SQL. We could use the key to store a single cdbRecordID, for a 1-to-1 relationship between Accounts and Comments, or we could store multiple cdbRecordIDs if that made sense in our application. For example, if we had Conversations instead of Comments, a 1-to-many relationship might make more sense.
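As a rough sketch of what a 1-to-many link might look like (the "Conversations" database, the "Participants" key, and the tAliceID and tBobID variables are all hypothetical; cdb_SaveRecord and cdb_ReadLocalRecord are the calls used elsewhere in this article):

//store several Accounts cdbRecordIDs in one linking key,
//here as a return-delimited list
put tAliceID & return & tBobID into tConvA["Participants"]
get cdb_SaveRecord(tConvA, "Conversations")

//later, resolve each participant back to an Accounts record
repeat for each line tAccountID in tConvA["Participants"]
   put cdb_ReadLocalRecord(tAccountID, "Accounts") into tUserDataA
   //...display tUserDataA["Name"], etc.
end repeat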

 

So, now that we have this special key, how do we use it? Let's take a look:

 

//user logs in, and populates a global with their account data
//...
//user posts a new comment
put fld "comment input" into tCommentA["Content"]
put gUserDataA["cdbRecordID"] into tCommentA["PostedBy"]
put the internet date into tCommentA["TimePosted"]
get cdb_SaveRecord(tCommentA, "Comments")

 

We gather the comment itself, then include the cdbRecordID of our user, which we gathered earlier at login, in our PostedBy key. Then, when we want to display a comment:

//cache comments & user data locally
cdb_SyncDatabase "Comments"
cdb_SyncDatabase "Accounts"

//find all comments posted within the last hour
put cdb_BuildQuery("TimePosted","date>",(the seconds - 3600)) \
      into tQueryA
put cdb_BasicLocalQuery(tQueryA,"Comments","Basic") into \
      tAllCommentsA

//display the comments
repeat for each element tCommentA in tAllCommentsA
   put tCommentA["Content"] into fld "comment"
   put tCommentA["TimePosted"] into fld "date"
   
   //use our linking key to fetch the poster's data
   put cdb_ReadLocalRecord(tCommentA["PostedBy"],"Accounts") \
         into tUserDataA
   put tUserDataA["Name"] into fld "author" //... and display it
end repeat

Notice that we had to fetch our linked data explicitly; this is the default behavior of LiveCloud. The biggest benefit of explicit access is a large performance improvement over automatic links and lookups. The downside is that you need to decide which data to cache locally, or else accept a larger number of individual cloud calls when not caching.

 

This functionality provides you with the flexibility to create extremely sophisticated and interconnected databases, without burdening you with the overhead of maintaining those links that other, more traditional relational databases bring with them.

About The Author
Nick Pulido
Nick Pulido works for Canela Software, Inc., a LiveCode based software development company.

 

© LiveCode 2014