FileMaker Script Execution Time Cut From 5 Hours To 6 Seconds

March 17, 2011

Sometimes, not often, I accidentally achieve results that look too marvelous to be real. But they are. Just like in this case.

Last week I assigned Petr, our internal system developer, the task of optimizing one server-side script. The script had recently started taking over 8 hours to run, exceeding the time limit we had set for it. When I tried his optimized script on live data, I was honestly surprised. It took less than 2 minutes to run (over the network).

My first thought was that something must be wrong. But the script was returning correct results. After further examination I could confirm that it really did what it was supposed to do.

Let me show you what and how Petr actually optimized…

 

A Little Background

This looked like an easy request a few months ago. We were supposed to implement a report of stock inventory at a specified time. We had hundreds of items in stock and thousands of items historically in our system. We have a pretty generic inventory data model: a table of goods and a table of transactions.

To speed the report up we used auto-enter calculations to keep the updated quantity on stock in each transaction record. But to be able to pull this information for any timestamp in history, we had to create a LAST flag: an unstored calc that used a relationship to the previous transaction and compared the transaction timestamp to the report timestamp. Its result was 1 if the record was the last transaction for the specific item with a timestamp older than or equal to the report timestamp.
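
The exact definition isn't reproduced here, but conceptually the flag was something like the following sketch, where LastQualifyingTransaction stands for an assumed self-join that matches transactions of the same item with a timestamp at or before the report timestamp, sorted so that its first related record is the most recent one:

    // LAST flag – unstored calculation, number result (a sketch; names are assumptions)
    If ( Transactions::TransactionID = LastQualifyingTransaction::TransactionID ; 1 ; 0 )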

Generating the stock report this way originally took around 40 minutes, 35 minutes of which were spent performing a find on the unstored calc field.

 

First Optimization

We never know in advance what time we will be generating the report for, so we thought there was no way to pre-calculate the results overnight. But after brainstorming this issue in our team, we came up with a solution. We created a new field, DatesOnStock, in the transactions table. It's a regular text field because a date field cannot hold more than one date. On the first day of every month, we ran an overnight script which found the last transaction for every item with a positive quantity currently on stock. Then in every found record it added the current date to the DatesOnStock field.
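
A rough sketch of that monthly script in script-step form might look like this – not our actual script, and the layout and find criteria are assumptions:

    # Runs overnight on the 1st of each month
    Go to Layout [ "Transactions" ]
    # Find the last transaction per item that leaves a positive quantity on stock
    Perform Find [ Restore ]
    Go to Record/Request/Page [ First ]
    Loop
        Set Field [ Transactions::DatesOnStock ;
            List ( Transactions::DatesOnStock ; Get ( CurrentDate ) ) ]
        Go to Record/Request/Page [ Next ; Exit after last ]
    End Loop

The List function appends the current date on a new line, which is how the field accumulates values like those in the example below.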

For example, if a specific item was received on January 5, 2010 and there was no further transaction until June 20, 2010, the DatesOnStock field in the receiving transaction from January 5 ends up containing all these dates:

2/1/2010
3/1/2010
4/1/2010
5/1/2010
6/1/2010

Now when we wanted to report items on stock on March 3, 2010 at 12:00 we performed this optimized find:

  1. Find all transactions with the date 3/1/2010 and all transactions whose timestamp is between 3/1/2010 0:00 and 3/3/2010 12:00
  2. Constrain found set to transactions with 1 in the LAST flag
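
In script-step form, this optimized find might look roughly like the following sketch – the global field for the report timestamp and the exact requests are assumptions:

    Set Field [ Globals::gReportTimestamp ;
        Timestamp ( Date ( 3 ; 3 ; 2010 ) ; Time ( 12 ; 0 ; 0 ) ) ]
    Enter Find Mode [ Pause: Off ]
    Set Field [ Transactions::DatesOnStock ; "3/1/2010" ]
    New Record/Request
    Set Field [ Transactions::Timestamp ; "3/1/2010 0:00...3/3/2010 12:00" ]
    Perform Find [ ]
    # keep only the last transaction per item as of the report timestamp
    Enter Find Mode [ Pause: Off ]
    Set Field [ Transactions::LAST ; 1 ]
    Constrain Found Set [ ]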

The report, which had originally been taking around 40 minutes and getting slower as we added data, now took around 40 seconds, and its speed did not depend on the number of obsolete transactions stored in the system. 60 times faster reports, with most of the workload done only once a month overnight. Not bad, right?

Then we did a physical inventory counting and generated a lot of transactions in a single day, and a new episode of the story began…

 

The Killer Optimization

The physical inventory revealed something that had already been happening but we had not noticed yet. Both the overnight script and the report script were quickly slowing down as we added goods and transactions. On March 1 the server-side script did not finish because it exceeded the time limit we had set for it (you can set a time limit for a server-side script, and FileMaker Server will halt the script when it exceeds that limit).

To find out why the script did not complete server-side, I pulled it off the server and ran it locally. To measure the execution time I stored the script's start and end timestamps in global variables.

This confirmed my fear: 5 hours is really bad for a locally executed script. So we had to optimize the script further, but didn't have a clue how. Until we discovered another issue…

With around 50 thousand records in the transactions table, the find performed on the LAST flag turned out to be not only slow but also unreliable (at least when done over the network). We discovered that it did not find all the records having 1 in the LAST field. Even weirder, the resulting found set also contained a few records with 0 in the field! And worst of all, it did not return the same found set when we performed the find multiple times. We are now trying to describe the issue and prepare a test solution demonstrating it so that we can report it as a bug to FileMaker.

In the meantime, of course, we had to make our system work. So I told Petr: “Rewrite the scripts so they don't use the unstored calc, even if you have to walk through all 50 thousand records in a loop, and let's hope the solution you find won't be slower than the current broken one.”

The result was unbelievable.

Petr cut the execution time of the script from 5 hours 11 minutes and 33 seconds to just 6 seconds.

That’s a 311 thousand percent speed increase!
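
For the record, the arithmetic behind that number: 5 hours 11 minutes 33 seconds is 18,693 seconds, and 18,693 ÷ 6 ≈ 3,116, which indeed works out to an increase of roughly 311 thousand percent.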

And guess what, the new script really does go through all the 58,525 transaction records in a loop.

See what the two scripts look like:
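
The old script was essentially the find on the unstored LAST flag described above; the new one is a plain record loop. Here is a rough sketch of the single-pass idea in script-step form – not Petr's actual script, and all layout, field and variable names are assumptions:

    Set Variable [ $reportTimestamp ; Globals::gReportTimestamp ]
    Go to Layout [ "Transactions" ]
    Show All Records
    # assumed sort: PN, Stock ID, then Timestamp ascending
    Sort Records [ Restore ; No dialog ]
    Freeze Window
    Go to Record/Request/Page [ First ]
    Loop
        Set Variable [ $key ; Transactions::PN & "|" & Transactions::StockID ]
        If [ $key ≠ $previousKey and not IsEmpty ( $candidateID ) ]
            # previous item finished – remember its last qualifying transaction
            Set Variable [ $resultIDs ; List ( $resultIDs ; $candidateID ) ]
            Set Variable [ $candidateID ; "" ]
        End If
        If [ Transactions::Timestamp ≤ $reportTimestamp ]
            Set Variable [ $candidateID ; Transactions::TransactionID ]
        End If
        Set Variable [ $previousKey ; $key ]
        Go to Record/Request/Page [ Next ; Exit after last ]
    End Loop
    If [ not IsEmpty ( $candidateID ) ]
        Set Variable [ $resultIDs ; List ( $resultIDs ; $candidateID ) ]
    End If
    # $resultIDs now holds one transaction ID per item that was on stock
    # at $reportTimestamp (filtering out zero quantities is omitted here)

Every record is touched exactly once, so the time grows linearly with the number of transactions – no per-record relationship has to be resolved.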

Doesn’t it look like a miracle?

 

The Marvelous Optimization Formula

OK, let's add a bit of self-reflection. I have discovered something I call the “Marvelous Optimization Formula”. It works like this:

  • The more marvelous it appears the less you believe it is the right way.
  • The less you believe it is the right way the more probably you’re going to do it the stupid way.
  • The more stupid way you do it the more marvelous the right way appears.

Got it? The main reason this optimization looks so marvelous is that the original approach was so much more stupid.

Just think about these questions for a moment:

  • Considering how we defined the LAST flag calculation, what does it take to evaluate it for a single transaction?
  • Now what does it take to evaluate it for two transactions with the same PN and Stock ID?
  • How about 3 transactions of the same PN and Stock ID?

Now consider this: we have 58,525 transactions and only 4,709 PNs. That's about 12 transactions per PN on average, but in reality most of the PNs have zero or one transaction, so for the busy ones it's many more. You have probably already concluded that the execution time grows much faster than linearly with the number of transactions per PN…

The reason we used this stupid approach was even more stupid. I tend to think that scripts are in general slower than calculations. So when I first discovered I needed to find every transaction that is “last in the specified time period for the specified PN and Stock ID”, my next thought was: “how do I find out if this specific transaction is the last one for this PN and Stock ID in the specified period?” And I turned that question into a calculation.

Then we created the report script using this calc and it took about 40 minutes to run. Long enough to make the right way look too marvelous to even think about it…

 

The lesson I have learned

The Marvelous Optimization Formula works. People will always keep doing things the stupid way. Therefore there will always be a chance for a marvelous optimization when something starts becoming useless. And the fix often is as easy as replacing the stupid solution with the normal one. Not necessarily genius, just normal.

You know, we still have to optimize our solution a little bit, for example by taking the previous values in the DatesOnStock field into account and not walking through all transactions every month when walking through just the new ones would be enough.

And by the way, I have to remember that scripts don't necessarily have to be slow when it is possible to loop through 50 thousand records in 6 seconds.

So if you have a FileMaker solution that’s extremely slow, try to ask yourself: am I not just doing something the stupid way?

 

How about you?

Was this helpful? Do you like knowing that marvelous optimizations are possible? Do you have a similar experience? Please leave me a comment below. I would love to find out about other examples.

I am also preparing a stripped down version of our database to share with my mailing list subscribers. So if you’re interested in examining it yourself, just enter your name and e-mail in the blue box on the right side of this page, and I’ll let you know when I have it ready for you.

Enjoy your FileMaker development and surprise your customers with marvelous optimizations… ;-)

 


Court Bowman March 17, 2011 at 4:15 pm

Great writeup, it really shows the breadth of performance you can get, from bad to fantastic. Well done.


Paul Spafford March 17, 2011 at 4:28 pm

The reason we used this stupid approach was even more stupid. I tend to think that scripts are in general slower than calculations.


russkohn March 17, 2011 at 4:29 pm

Nice job Honza! Even more than the technique, I appreciate your detailing the thought process.


Paul Spafford March 17, 2011 at 4:31 pm

Oops, my comment got cut off. That's a stupid assumption that I also have to challenge on a regular basis. Thanks for the reminder!


HOnza Koudelka March 17, 2011 at 4:59 pm

Thanks for the quick and nice comments. I am glad you like it. BTW, I am going to do more with speed optimizations, so if you have any tips, ideas, examples, anything, I will greatly appreciate it.


Matt Petrowsky March 17, 2011 at 6:27 pm

While a script may often be faster than using calcs via relationships (ESPECIALLY WHEN UNSTORED!!! – avoid unstored like the plague), there are many times when doing straightforward data operations within variables will be EVEN FASTER. For example, pulling indexed values via a List function (even 50-500k), then pushing into a $variable and running a data loop to determine applicable key values or whatever, can be faster than running a scripted record loop. Following up with a simple GTRR (Go To Related Record) can pull out the results you need. This may not apply to this situation, but it should always be considered.

Here's my typical thought process with any operation (Basis: Slower -> Faster):

Unstored calcs -> Indexed calcs/fields
Same table -> Related table
Record loops -> Data loops

Bonus: always remember that Freeze Window step for record loops! You'll save many seconds from their ultimate demise in "wasted time land". ;)

Matt Petrowsky
For more information about my work with FileMaker, visit my dedicated web site for FileMaker Tutorials
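
To make the record loop vs. data loop distinction concrete, a data loop over values gathered into a variable might look roughly like this sketch (the relationship and field names are assumptions):

    Set Variable [ $ids ; List ( MatchingTransactions::TransactionID ) ]
    Set Variable [ $i ; 1 ]
    Loop
        Exit Loop If [ $i > ValueCount ( $ids ) ]
        Set Variable [ $id ; GetValue ( $ids ; $i ) ]
        # ...work with $id here – no record navigation needed...
        Set Variable [ $i ; $i + 1 ]
    End Loop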


Bruce Robertson March 17, 2011 at 6:38 pm

Very much agree on Matt’s comment about reading the data into variables and then operating on it. There are many variations on this, including reading into one big list as he suggested, or looping through records and reading into $var[N] where N is the record number. In addition to freeze window, go to form view before walking the records. I am also looking at repeats and especially global repeats for this. They are fast, they are addressable, and a repeat can contain anything – such as values with embedded returns. I’m also curious about using container fields (blobs really) since you can set and retrieve strings into containers and containers are not part of the index-passing traffic.


HOnza Koudelka March 17, 2011 at 6:54 pm

Matt: Great extension of the thought process!

Bruce: An original angle – using containers for this. Do you think that would be even more efficient than simply disabling indexing for specific fields?


Bruce Robertson March 17, 2011 at 7:32 pm

Don’t know one way or the other yet about efficiency. There are other useful properties, like the whole block of text is UTF8, it is unstyled, you can drop it as a block into another text field, etc. I am curious how Filemaker handles blobs and indexing because it DOES know that if you have put the same graphic in two records it only retains one copy of the graphic. I think repeats and containers are somewhat unexplored territory.


matthewgreger March 17, 2011 at 8:14 pm

Nice write-up, and I have similar optimizations I’ve been working with. Will try to interject more on that later.

I think your Find issues are a bug in FileMaker Server. There seem to be two known issues with memory leaks, especially when running multiple server-side scripts. I don’t know the specifics, but one does cause anomalies when performing finds. The only solution seems to be frequent restarts of the FMS machine (like once a week). Did you have the same problems with the file opened locally, or only on FMS?


HOnza Koudelka March 17, 2011 at 8:29 pm

Bruce: Interesting! What if you paste the graphic through the clipboard? And what if you then use a plug-in to modify one copy?

Matthew: I could not reliably reproduce it. The only cases I know about all occurred when the find was performed from FMP on a database hosted on FMS. In all cases FMS was version 11; FMP was 10 in one case and 11 in another.


Daniel Wood March 17, 2011 at 9:16 pm

Great article HOnza, very interesting indeed! It’s incredible that 50k+ records can be looped in that amount of time, and on top of that, from what I can see you are also doing an If check AND passing values thru a unique-value custom function too!

I can understand why doing a find on a complex unstored calc is a very time consuming process – it probably requires every single transaction record to be downloaded from server to client to begin with, and then processing begins, but there would be huge amounts of back & forth of data between client and server, ugh!

It does make me wonder though why FileMaker allows finds on unstored calcs, yet relationships – which are basically another form of a find – do not allow it. There are times when it would be quite nice to have a relationship on an unstored calc on the destination side, especially if you know the related record count is low and there will be little performance hit. It seems if the Perform Find script step can do it, and find mode can do it, then why shouldn’t a relationship – at your own risk of course! Perhaps it should be a file preference “Allow relationships based on unindexed fields” :)

The calculation to determine the last transaction sounds very similar to one I use often, which instead is a flag to determine the first (or last) record in a table that has a specific value in a field. Basically it is used to flag the records that have the first occurrence of a unique value in a given field, thus letting me do a find on the flag field to find all unique records in a set. As an example, if there are 10,000 records in the table but only 5 unique values in a given field, I may wish to have a portal that just shows 5 rows, 1 per unique value – then in each row I can put a calc showing the total for that value or something like that… Anyways, the calc in question has to evaluate for every record & compares its ID to the first ID thru a relationship matching on the field in question. It’s a real killer because every record has to evaluate this relationship. I’d love to find a real fast way to achieve this uniqueness flag, preferably without scripting.

Another favorite of mine is using the Get(FoundCount) function thru a relationship to count the records via that relationship instead of using the Count function. This is another example where performance can be increased many thousands of % depending on the # of records. The Count function has to download all the related records & then validate each one (Count only counts valid records) and add it to the total. Get(FoundCount) purely returns the # of related records and is instant – no record downloading required.

Container fields used for graphics in a solution perform many times faster if those fields are global than if they are non-global – especially when they are placed in a portal, either directly or as the result of a calculation on the given table. Globals give smoother portal scrolling; non-globals cause large amounts of network traffic back & forth between server and client (not sure what’s going on there), but the end result is slower scrolling. It’s almost as if the container is sent back & forth between server & client for each record, even though you would expect it to have been cached.

Looping in lists that are set in a variable, as Matt mentioned – it’s much faster to grab values from that list for each iteration of the loop using GetValue and a counter than to obtain just its first value and then reduce the size of the list via RightValues – talking 2-3x faster.


HOnza Koudelka March 17, 2011 at 9:42 pm

Wow, what a long comment, Daniel :-) Some very interesting points. I would definitely suggest formally submitting 2 or 3 feature requests to FMI…

But Get(FoundCount) used through a relationship – that’s awesome! I had never thought it could make sense to use this function in a related record. Thanks!


HOnza Koudelka March 18, 2011 at 9:14 am

Important comment from @FabriceN: Be careful with that, it works only for a given context (Get(FoundCount)). Thanks!


Daniel Wood March 18, 2011 at 9:27 am

Not sure I understand that comment. The Get(FoundCount) method gives the same results as using the Count function – context doesn’t enter into it. Simply put an unstored calc on a table you want to count related records in, and set it to Get(FoundCount). You can then reference that field via any relationship to return the # of related records through the relationship, from any context, it doesn’t matter. It also doesn’t matter what context you specify the calculation itself to evaluate from. Maybe I’m missing something, but the comment may need to be clarified?


Vincenzo Menanno March 18, 2011 at 4:14 pm

Hi HOnza, this is great, thanks for posting. At Pause on Error, in the session I presented, “FileMaker User Experience for Developers”, I talked about the need we have for ways of spotting these issues before they become bigger problems. One thought I had was: what if there was another column in the Script Debugger that showed you how long each step took to complete? So as you step through your script you can see exactly how long each step took to execute. Let’s say you are iterating through a loop and you notice that a step took .8 seconds to complete… You might not even notice that it is slow, because .8 seconds is about as long as it takes you to click on something. But for a computer that is a long time (again, depending on the task of course)… my point is that we need things like this, so that we don’t have to roll them into each one of our scripts.

The other thing I suggested was a way for whole scripts, or only certain steps, to get logged to the console. Then you can simply go about using your solution and then analyze which scripts or steps take the longest.


Tim Owen March 18, 2011 at 5:48 pm

As an alternative to using utility DatesOnStock and LAST flag fields in the transactions table, did you consider using a filtered relationship from the context of the goods table to the transactions table, using a DateOnStock global date field in the goods table – i.e. the relationship being defined as goods::id = transactions::PN AND goods::DateOnStock ≥ transactions::created (assuming the created field in transactions in your example is an indexed date field representing the transaction date)? You could then loop through the records in the goods table (or a sub-set: just the ones considered active, or whatever) and generate the required inventory values for the selected DateOnStock for each goods table record.

In my own systems (in this case a point-of-sale ticketing system for film festivals) I also use an indexed current inventory (number) field in the inventory (goods) table, which is set by script/script trigger (i.e. incremented or decremented by the transactions/LineItems table quantity value) each time a transactions/LineItems table record is created, edited or deleted.


HOnza Koudelka March 18, 2011 at 10:16 pm

Daniel may be right. I cannot find an example where Get(FoundCount) would fail as a replacement for Count. Can you find any? Fabrice is trying to find one. Follow our twitter conversation if you’re interested in this specific trick: http://search.twitter.com/search?q=Get%28FoundCount%29

Vince: Great suggestions! I can imagine if there was an efficient way to benchmark FM solutions everyone would be using it.

Tim: Yes, we considered that. But we need to report the quantity on stock for each PN per stock/store. So we could either do it in the transactions table, or create a new table for each PN/StockID combination. BTW we use a field named “timestamp” instead of “created” as we need to be able to create a transaction and set its effective timestamp to anything in the past.


HOnza Koudelka March 20, 2011 at 11:37 am

Confirmed by @FabriceN on twitter – it seems safe to use unstored Get(FoundCount) in the child record as a faster replacement for Count in the parent record if the purpose is to count the total number of related records.
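
For clarity, a minimal sketch of the trick (table and field names are assumptions): define an unstored calculation in the child table and reference it from the parent through the relationship instead of calling Count.

    // Transactions::RelatedCount – unstored calculation, number result:
    Get ( FoundCount )

    // In Goods, instead of Count ( Transactions::TransactionID ),
    // reference the calc through the relationship:
    Transactions::RelatedCount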


Sarjono April 6, 2011 at 6:17 am

Even if we have minimised the use of unstored calculations in our relational db, running the same FileMaker solution on FM Pro Advanced or FM Server Advanced will not yield any speed difference. The server uses a Xeon multicore processor (a total of 8 cores). Upon closer inspection of core usage, while the FM solution is heavily processing away, I can clearly see that… FileMaker Server only uses one core for its calculations! No wonder it runs as slow as the FMPA version on the laptop!

Maybe we should write a plugin that forces FM Server to use all cores available? At present it is aware that more than one core is available, but it only uses one. Is that considered stupid, or a key product feature?


HOnza Koudelka April 6, 2011 at 7:58 am

Sarjono, as far as I know FMS 11 can use multiple cores but only uses one core per thread. I think that every server-side script runs in a single thread, and any single calculation definitely runs in a single thread.

But when multiple users perform finds at the same time, these searches should run in separate threads and should be able to utilize the additional CPU cores.


Jack Rodgers May 28, 2011 at 5:26 pm

Long ago I decided to use concrete numbers in my dbs. Concrete being an actual number and not a calculation. I feel it is faster to do the computations in the time between the click on the button and the blink of an eye than to use 200 calculated fields that must be updated before the lady sings.

Think of how much faster Filemaker would sum 50,000 fields if they contained ‘real’ numbers instead of calculations relying on other calculations relying on…


HOnza May 29, 2011 at 11:57 pm

Jack, isn’t this what stored calculations are for?

Stored calc gets evaluated as soon as the record’s fields it depends on change, and the result gets stored in the db just like a regular number field.

HOnza June 7, 2011 at 1:01 pm

Jack Rodgers has sent me some followup points by e-mail. I believe they are worth sharing/discussing, so I am sharing Jack’s comments here below (with his permission, of course) and I look forward to your opinions…

A few points to ponder:

Modern databases gain speed by using virtual ram and ram to store data that they then access rather than the db. This can be a plus until it takes longer to access the virtual db than the real db.

Huge databases are going to take up gigabytes of RAM for this. At some point you may run out of free memory and need to clean house.

Years ago I was using Word 4 on a Mac with minimum RAM. One day my computer ground to a halt and the word processor functioned one character at a time, slowly. I eventually traced it to Word’s maintaining multiple copies, etc., so it could revert to edit x. My copy was trying to work in a 20 byte (or so) work space! After I saved the document and turned off that feature, it ran smoothly.

The old fashioned loop with edits avoids the massive virtual memory requirements and the flushes, etc. It also allows the creation of a REAL TRANSACTION method that replace field does not. With the loop you can timestamp a modified record. If you get a locked record alert, you can find the records that were not timestamped and deal with them later.

Relying on Filemaker stuff for an update (portals, gttr, replace field, etc) is a guaranteed slow down.

For instance, if you have 20 fields to update, some of those might get skipped due to record locking (which is another issue), and FileMaker has to find, process and save every record in the set 20 times. Whereas if you use a loop – go to record, set field 20 times – it only has to find the record once and save it once. That is part of the big savings in time. Not to mention all of the TOs that might have to be loaded…

One other point to be considered is what gets loaded when you load a record. I try to avoid using calculated fields and use scripts to do the work, out of an old habit from dBase and 4D. This makes the result a normal field and finishes the job at the instant of the calculation. It also requires an update script if something changes elsewhere – not that much of a problem. Anyway, calculated fields may cause long chains of updates to other calculated fields, which take time… A result which may have taken only the blink of an eye to calculate when the user entered the data may take considerably longer when large numbers of records are being processed for a report or EOM statement. A loop with set fields and no calculated fields only loads and evaluates the records in the script, not every record in every TO linked to calculated fields.

A Filemaker engineer should provide more reliable insights to this question and asking Filemaker for a comment might prove interesting.


Daniel Wood June 7, 2011 at 2:36 pm

I find the idea of running scripts to achieve calculations that are then set into “normal concrete” fields a bit odd, to be honest. You are going to run into record locking issues in this situation just like any other situation where another user is modifying a record.

The whole idea behind a stored calculation field, as HOnza pointed out, is that the change to a stored calculation’s value is instant, and record locking will not interfere in this situation. FM gives us calculations, so why not use them? If it is a question of speed then I find it hard to believe that a script setting a field is faster than a stored calculation. It is only unstored calculations that need to go off and potentially fetch other records for the purpose of evaluating – that is why they are unstored: you can’t store it if it references related records, among other things. A stored calc is no different to a normal field in terms of speed, because a) it is not calculated in real time, that has already taken place… it’s stored! and b) FM sends the client the entire record contents for the record being viewed regardless of whether you simply want a number field or a stored calc field, so the speed there is no different.

Maybe I’m missing the point.


Perren Smith June 7, 2011 at 10:33 pm

The idea of stored derived data is just one “speed boost” workaround for the age-old “no stored calculations referencing related fields” constraint. I never really see a way around that at a low level in the FM engine, so in some instances – for ease of use to the user (native find mode from perspectives not normally achievable) or as compensation for an inadequate data model – this is the way developers / clients choose.

Personally, not my favorite either…I can get plenty fast with normalization techniques and some other UI tricks to net the same result.

–Perren

Josh Ormond June 9, 2011 at 12:30 pm

I see valid points on both sides of this one. However, there are ways to prevent the memory hogging that can happen with FileMaker’s temp cache file, as well as ways to use stored calcs that don’t need to aggregate calcs on enormous data sets. Although stored calcs don’t help in something like a sales/inventory system, because the inventory amount is in a related table.

There are definitely times when you would use one approach over another. But that is why developers get paid so much, right? We need to know what FileMaker is doing, because not only of the knowledge we have accumulated about good design, but also because we have tested it for the current solution. And we have hard data to say, “For this solution, technique A is more appropriate than technique B”.

I like the idea of the transaction model, it creates safe data and allows you some additional level of control over what happens when an error occurs. And at the same time, so does using calculations (stored or not). There are benefits of the transactional model, but I’m not totally sold on it.

As for asking FileMaker, and an engineer about it, they have already spoken. The under-the-hood sessions at DevCon, as well as workshops like Skeleton Key’s “Developing for Maximum WAN Performance”, give you insight from the FileMaker team about exactly what FileMaker is doing. And in the end, as long as you know how to minimize the data transfer and maximize the speed of an operation… the technique is often negligible. (Noting the fact that this doesn’t account for designing to allow for scalability and normalization… but again that’s why we get paid the big bucks).


Roland September 15, 2011 at 3:48 pm

I like the message of the page here. :)

I probably missed some details, but could it be that you’re coming at it from the wrong side? Seems all the criteria are on the child side. Could an initial find be done in the child context and then just go to the related parent(s) by foreign key to primary key?

Back to the ‘stupid’ way – also, why the sort on the relationship? If the records are created in chronological order, could the Last() function be used to see the last one? That sort will kill you. The Last() function on a related field is quick.


HOnza September 15, 2011 at 11:34 pm

Without the sort you cannot add/edit past records when you find a mistake you need to correct.

I am not sure what you mean by your parent/child mention. We need to do the search in transactions (child) because there is no sense of time in goods (parent), and we have always been doing that.

Peter August 21, 2015 at 9:13 pm

Bet the same search in MySQL would take approx .0005 seconds. If this ‘tutorial’ teaches anything – it’s to avoid FileMaker.


HOnza October 22, 2016 at 2:18 am

This is misleading. First, I seriously doubt that with the same data, getting the same result would take less than a second even in MySQL, and second, MySQL is a different tool for a different purpose. There are certainly tasks for which MySQL is more appropriate than FileMaker, but there are also tasks for which FileMaker is more appropriate than MySQL. Use the right tool for the right task.
