Improve the speed at which you access and update data
Making your data updates and retrievals fast is important. However, identifying optimization opportunities takes patience, so the work often gets put off, skipped entirely, or simply assumed to be as good as it can get. Bringing in an outside resource is a good idea: they have no preconceived notions about the data, and they are not bogged down by their normal day-to-day tasks. Many companies rely solely on their programmers to handle optimization, but a database expert dedicated to the task often brings a fresh perspective and sharper focus.
If your dataset is large, or if your online database is handling many critical transactions to support its primary function, it may not be possible to improve reporting performance in place. You may need to move the data to an offline database (a data warehouse) that is not burdened by all of those transactions. This reporting database would run on a separate physical server, be optimized for reporting, and house other related data. An often overlooked benefit of an offline reporting database is stable information that isn't constantly changing the way it does in the online database: updates from the live database are applied on a schedule, so people can analyze the data and make tweaks without everything shifting the next time they refresh.
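As a rough illustration, the scheduled refresh can be as simple as a job that copies the relevant tables from the live database into the reporting database at a quiet hour. The sketch below is a minimal Python example, assuming hypothetical database files and a hypothetical `orders` table; the real schedule, driver, tables, and connection details would depend on your environment.

```python
import sqlite3  # stand-in for your real database driver (pyodbc, psycopg2, etc.)

# Hypothetical connection targets; in practice these would be the live
# transactional database and the separate reporting/warehouse server.
LIVE_DB = "live.db"
REPORTING_DB = "reporting.db"

def refresh_reporting_copy():
    """Replace the reporting copy of the orders table with a fresh snapshot.

    Run on a schedule (e.g. nightly via cron or a SQL Agent job) so analysts
    work against stable data between refreshes.
    """
    live = sqlite3.connect(LIVE_DB)
    reporting = sqlite3.connect(REPORTING_DB)
    try:
        # Pull the current snapshot from the live system.
        rows = live.execute(
            "SELECT order_id, customer_id, order_total, order_date FROM orders"
        ).fetchall()

        # Rebuild the reporting copy in one pass.
        reporting.execute("DELETE FROM orders_reporting")
        reporting.executemany(
            "INSERT INTO orders_reporting (order_id, customer_id, order_total, order_date) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
        reporting.commit()
    finally:
        live.close()
        reporting.close()

if __name__ == "__main__":
    refresh_reporting_copy()
```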
In some cases it's easy to determine that an index or two needs to be added; other times it's not so obvious. Opportunities such as removing an unused index, or suspending index maintenance while a large batch of inserts runs, are harder to spot. In other instances you need to look at where the slowdown actually occurs: should the optimization be done in the code or within the database engine?
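As one concrete version of that second case, the sketch below (using hypothetical table and index names) drops an index before a large batch insert and rebuilds it afterward, so the engine doesn't maintain the index row by row during the load. Some engines offer a native way to disable an index instead of dropping it; whether either approach actually helps depends on your engine and data volumes, so treat this as a pattern to test rather than a rule.

```python
import sqlite3

def bulk_load_with_index_rebuild(conn: sqlite3.Connection, rows):
    """Load a large batch of rows, rebuilding the index once afterward instead
    of maintaining it on every insert. Table and index names are placeholders."""
    cur = conn.cursor()

    # Drop the index so the engine doesn't update it for every inserted row.
    cur.execute("DROP INDEX IF EXISTS ix_measurements_item_date")

    # Bulk insert the new rows.
    cur.executemany(
        "INSERT INTO measurements (item_id, measured_on, value) VALUES (?, ?, ?)",
        rows,
    )

    # Recreate the index in a single operation after the load completes.
    cur.execute(
        "CREATE INDEX ix_measurements_item_date ON measurements (item_id, measured_on)"
    )
    conn.commit()
```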
Where should you optimize?
I recently had a project where a large number of formulas were stored in a database table. Initially, I wrote code that stepped through each formula, pulled data from other tables to populate it with values, performed the calculation, and then updated another table with the result. This step-by-step approach in the code was very useful for debugging, because many of the calculations depended on previous calculations and the order was critical. However, the execution took over an hour: I was issuing far too many individual database update statements. After confirming my logic was correct, I worked on the performance. I wrote code to dynamically build a few very large SQL statements instead of many small ones. Database engines are highly optimized, and with the new SQL and the right indexes, the execution time for the entire process dropped from around 70 minutes to only a few minutes to generate the same few hundred thousand records.
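The original code and schema aren't shown here, but the general pattern is sketched below: instead of issuing one statement per computed value, the results are batched and written with dynamically built multi-row statements. The `calc_results` table, its columns, and the batch size are all hypothetical, and a real implementation would also need to respect the calculation ordering described above.

```python
import sqlite3

def write_results_row_by_row(conn: sqlite3.Connection, results):
    """Slow pattern: one statement per computed value."""
    for item_id, value in results:
        conn.execute(
            "UPDATE calc_results SET value = ? WHERE item_id = ?", (value, item_id)
        )
    conn.commit()

def write_results_batched(conn: sqlite3.Connection, results, batch_size=400):
    """Faster pattern: dynamically build one large multi-row statement per batch
    and let the engine apply it as a set-based operation.

    Assumes item_id is the primary key (or has a unique constraint) on
    calc_results. The batch size is kept modest to stay under typical
    bound-parameter limits.
    """
    for start in range(0, len(results), batch_size):
        batch = results[start:start + batch_size]
        placeholders = ", ".join("(?, ?)" for _ in batch)
        sql = (
            "INSERT INTO calc_results (item_id, value) VALUES "
            + placeholders
            + " ON CONFLICT(item_id) DO UPDATE SET value = excluded.value"
        )
        params = [p for row in batch for p in row]
        conn.execute(sql, params)
    conn.commit()
```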
The key takeaway here is not to lock yourself into a particular solution; instead, think about the process as a whole. It can also be cost-effective to bring in an outside consultant on a short-term contract to help with your optimization.