Random Set of Records (optimized)

September 29, 2011

Dice

I noticed that one of the articles updated in the official FileMaker Knowledge Base on September 23, 2011 was explaining how to select a random set of records in a FileMaker database. I was wondering how fast the currently recommended technique is and whether I can make it faster with the help of FM Bench. I tried to apply it to a set of 50,000 records of randomly generated data. Then I made it much faster

This is the script recommended by the knowledge base article:

  • Show All Records
  • Replace Field Contents [No Dialog, MyFile::RandomField, Random]
  • Sort Records by RandomField [No Dialog]
  • Go to Record/Request/Page [First]
  • Omit Multiple Records [No Dialog, 10]
  • Show Omitted Only

I created an example to try this script out. Actually, you can download the example below and try it out yourself. As you can see this script selects a random set of 10 records from the current table.

When I tried to apply the script to a table containing 50,000 records of generated test data, it took 23 seconds to select the random set. This was the first time I ran the script after opening the file. Every following run took nearly 8 seconds, most likely because of some index caching.

Random Records - FileMaker Knowledge Base version result

Actualy, I originally tried to apply the script to a much smaller set of data, about 5,000 record, and it was taking aroudn 1 second. That was pretty acceptable but I really wanted to optimize it, so I was looking for circumstances under which it would fail…

I discovered that the Replace Field Contents script step can be much faster than I expected. But however fast it is, it is the weakest point of this script. Note that regardless of how many random records we want to select, this script always goes through all available records and modifies them. Then it again has to go through all records to sort them, although it is done internally by FileMaker Pro by executing the Sort Records script step.

My first thought was: Do I really ever have to touch other records than those I want to have in the final found set?

Probably the fastest way to select one random record is using the Go To Record/Request/Page script step, calculating a random record number to jump to. So I based my optimized script on this technique. I simply use this script step as many times as I need to select the requested number of records:

  • Allow User Abort [ Off ]
  • Show All Records
  • Set Variable [ $targetfoundcount; Value:Get(FoundCount) - RandomRecords::SetSize ]
  • Loop
    • Exit Loop If [ Get(FoundCount) ≤ $targetfoundcount ]
    • Go to Record/Request/Page [ Int(Random * Get(FoundCount)) + 1 ] [ No dialog ]
    • Set Field [ RandomRecords::RandomField; Random ]
    • Omit Record
  • End Loop
  • Show Omitted Only
  • Sort Records [ Specified Sort Order: RandomRecords::RandomField; ascending ] [ Restore; No dialog ]
  • Go to Record/Request/Page [ First ]

This script takes less than 50 milliseconds to select 10 records out of 50,000. That’s at least 158 times faster!

Random Records - optimized version result

Looks like a magic, but it isn’t. When you try to select larger set of random records in my example, you’ll discover that somewhere between 1,500 and 2,500 my optimized script starts taking longer than the original script from the knowledge base. Why is that?

The original script’s execution time depends on the total number of records in the table. So it takes the same amount of time regardless of how many records you want to select. My optimized script directly depends on the requested number of records to select, so it is extremely fast in comparison with the original script especially when selecting small number of records from a large number of records. But beacause it goes through the individual records manually, while the original script loops through them internally in FileMaker Pro (inside the Replace Field Contents and Sort Records script steps), my script spends significantly more time per record.

So, not only is this yet another marvelous optimization example, but it also proves that a solution that’s slower in one situation, can be actually faster in another situation.

Try it yourself…

Download Sample fp7 File (zipped 23.3 MB)

…and let me know your thoughts and questions in the Comments section below.

{ 19 comments… read them below or add one }

Agnes Riley September 29, 2011 at 6:15 am

Great article, HOnza, thanks so much!

I was always wondering about random records. I had to select random people for giving away the free scanner at devcon. While it wasn’t a big deal, because of our found set, I’m sure we would’ve liked to have read this article first.

There are a lot of cases when you need to generate a random set of records. Optimization is a great thing, and I think greatly undervalued. A lot of times we, developers, do not have the time to go through a finished solution and optimize it unless there’s a complaint that we react to. FMBench will hopefully help us be more proactive.

Looking forward to trying it on a solution.

Reply

HOnza September 29, 2011 at 6:52 am

I am glad it’s so useful ;-)

BTW, when going to DevCon this year I got a “cool” idea to choose a winner of our giveaway using a FileMaker Go solution running on iPad. So I announced that on Twitter. Then, in the plane, I created a solution for that and it looked really well. Until I tried it on iPad and discovered that it would take about 6 thousand hours to select the winner using the script I had created.

So I had to optimize… :-)

Jack Rodgers September 29, 2011 at 6:20 am

Have you tested using Extend Found Set?

loop
Such as Find record Number Random adjusted for 1 to 50000
Extend found set
end loop

You can link two tables one with the numbered ID field and the other with a text field filled with numbers which can then be used to GTTR the first table. Try filling this text field with your random ids and calculate the GTTR speed.

Reply

HOnza September 29, 2011 at 6:46 am

I considered using GTRR but decided to not use it for two reasons:

1) It requires an indexed ID field
2) It’s harder to reliably choose a random record that has not been chosen yet – you either have to use GetNthRecord on a relationship that filters out the already chosen records, or find another way how to remove the already chosen records from the list of records available to choose from

Jack Rodgers September 29, 2011 at 6:25 am

One other point, on a network Replace Field Contents will not work on any records being modified and any loop relying on set field will not work since the records are locked.

So any solution relying on setting a value in a field on a network will not reliably include all records that are locked and so if the results may show less than required. The two methods above will include locked records but of course they aren’t editable.

Reply

HOnza September 29, 2011 at 6:49 am

That’s a good point. I used the Set Field only to also return the selected records in random order. You don’t have to do that if you don’t need random order. Also, as an alternative, you can use an unstored calculation returning Random to sort the resulting found set.

Matt Petrowsky September 29, 2011 at 10:55 am

Yeah HOnza, great article if purely for the purpose of showing how you could evaluate the logic of your intended process and try to figure out how it can be more efficient.

Reply

Greg Lane September 29, 2011 at 11:34 am

Using an unstored calculation field (returning Random) instead of a number field would significantly improve the script used in the knowledge base article. It avoids modifying all of the records and it’s much faster than using Replace Field Contents.

By the way, in addition to the multi-user issue Jack pointed out, Show Omitted Only is often a poor choice in a multi-user setting. Show Omitted Only will show records that have been created by other users since the time the current user omitted one or more records (or otherwise created a found set). In other words, the new records created by other users are added to the current user’s omitted set. This can lead to the final found set including more than the expected number of records in both of the scripts above. A simple solution for the knowledge base article script would be to change the Omit Multiple Records value to “Get(FoundCount) – 10″ and then remove the Show Omitted Only script step.

HOnza, how would you modify your script to address this Show Omitted Only issue in a multi-user environment? You could omit “Get(FoundCount) – RandomRecords::SetSize” records in your loop, but that would destroy the performance benefit for small values of RandomRecords::SetSize. Other options seem to add a lot of complexity to the script.

Reply

HOnza September 29, 2011 at 12:07 pm

Multi-user state instability is a great point. I would add that besides creating records, other users can also delete records.

If only one of these things happens, then the solution is relatively easy.

Deletion of records can be recalculating the target found count in every iteration of the loop instead of storing it into a variable.

Newly created records can be resolved by NOT recalculating the Get(FoundCount) in every loop iteration and counting the iterations instead, then checking if the target found count is higher than it should be, and omitting the redundant records from the end of an unstored found set (newly created records will be always at the end when the records are unsorted.

The real challenge is handling both situations at the same time – what if one user creates new records and another user deletes records?

Jack Rodgers September 30, 2011 at 4:59 am

A thought on Filemaker Inc’s examples:
The files are clean and have a nice look and scripts submitted like the example given are for new users in single user mode. They do not consider the problems to be encountered in multi-user mode nor offer consideration for problems such as locking, new records added, deleted and so on such as mentioned here. Unless the topic deals with those specific issues. Perhaps Filemaker should include those factors along with the examples or scripts. Such as:

Mode: Single User
View: __list __form __table __preview
Windows Open: 1
Record Mode: __committed __edit
…etc

Reply

Jack Rodgers September 30, 2011 at 12:51 pm

Excuse me if I double posted this:

http://wp.me/1AmGx

1000 random records selected in less than 1 second…

Thanks for starting this post and costing me an afternoon… :)

Reply

HOnza September 30, 2011 at 1:06 pm

Great! I love to inspire others ;-) Good job!

HOnza September 30, 2011 at 1:23 pm

Hey, I also like challenges…

I have just checked your solution and I see 2 issues with it:

1) Although it is usually unlikely, Random can return the same ID twice if you try large number of iterations. The closer your random set size is to the size of the whole database, the more likely this will happen.

2) Your method depends on the fact that the record set you’re selecting from is numbered 1 to 50,000. But how do you get the IDs in there? That requires extra processing which you did not include in your measurement.

Are you able to address these issues while still keeping your approach faster than any other? ;-)

Jason DeLooze October 7, 2011 at 8:11 pm

The “unstored calculation field” advice from Greg will greatly enhance the performance of the KB script. Additionally, your optimized script can be further enhanced by switching to “View as [Form]” (a posiblybefore the loop, then switching back to “View As [List]” upon script completion. Viewing as a List causes the record list to be redrawn (even with a Freeze Window) if you are going to, then omitting records. Try this simple change to see your script work much better on large selected sets.

I’m not certain how FM Bench would help a user optimize their scripts. Certainly it will show relative execution speeds, but knowing HOW to change a script to make it faster comes only with experience – in my opinion.

Reply

HOnza October 7, 2011 at 11:19 pm

Good tips, Jason thanks!
To your questiob – even with a lot of experience you still often have to compare multiple ways to do the same thing. In this case, FM Bench helped me to do this comparison quickly and easily.

Arnold Kegebein October 12, 2011 at 5:39 am

Thank you for your elaboration of optimizing FileMaker databases. Reading your article some ideas jumped into my mind (some already mentioned by other readers).
• Freeze Window will often create a better performance.
• Switching to Form View at the beginning and back to List View at the end of the script will make the script run faster. Switching to an empty layout without any objects will do that too.
• Replace Field Contents will change the timestamp of the record – not recommended in an environment where you keep track when the record was changed.

To overcome problems with multi-user environments, you have to use global fields. Unfortunately the FileMaker KB database does not use an unique id field (unforgivable!). With such uid field, a global field and a self relationship you can enhance your solution. Before omitting the selected records, add their uid to a global field. At the end use GTRR instead of Show Omitted Only to display. Of course, this will cost more time for the additional writing into the global field.

To tackle the time problem with large sample sizes for your solution you might think about the meaning of “random”. How random do you want the sample to be? Perhaps it is random enough to pull more than one record in every script loop. For example, with sample size larger than 1000, omit ten records at a time. This will require only 1/10th of loops.

Reply

HOnza October 12, 2011 at 8:20 pm

These are very good points, Arnold. Thanks.

I like the idea of still omitting records but also saving their idsand then using GTRR as a way to make it more multi-user friendly.

One thing that comes to my mind for the large data samples is that once the sample is going to be larger than half of the total records count, we can hide random records instead of finding random records.

rob JLT Lewis February 1, 2012 at 11:45 am

GOOD DAY ALL

i found your conversations very interesting touching on the True Nature of Randomization. Haven’t grasp the depth of it all yet . Although over several years i have been evaluating, “What Is The Basic Fundamental Principles of True Randomization” so over these years i been developing “The I.A.M PROGRAM which uses True Randomization Principles, which i feel any random system requires the proper use of having two scopes of numbers — two fields should always be used which incorporates my, “IAM Random Formula”, that represents Logical Order For Random Spaces.

But i am still stuck with trying to understand a simplicity:
i been using Loop statements in scripts for reordering a number field. In my found sets, the Loop simply sets the record number along with a id letter ex: 1A; 2A; 3A; 4A; 5A; 6A, and so forth.

But now i am wondering hey is there a better way to do this without using Loop but with using Find Replace ???????

thanks,

from
r o b
HeartLand USA
Wawaka Indiana

Mac 10.6
FileMaker Advance 11

Reply

HOnza February 3, 2012 at 4:46 pm

To be honest I don’t understand what you’re trying to achieve and why. Are you trying to replace value of one field in all found records with a calculated new value? If you are then the key question is why? Do you really need to modify all records?

Leave a Comment