Add some benchmarks for KFileItemModel, improve worst-case performance of KFileItemModel::insertItems() and KFileItemModel::removeItems()
Review Request #108540 - Created Jan. 22, 2013 and submitted
I think it might be a good idea to make sure that Dolphin behaves better when handling many files, so I started looking at places in the code which have sub-optimal performance. There are probably quite a few of them - I'll start with KFileItemModel. Of course, everyone is invited to join the fun and look at other classes :-) My first target is inserting and removing items.

The patch does two things:

1. Add a kfileitemmodelbenchmark that tests two things: inserting many files in two equally sized batches, and removing one half from a set of many files. The half that is added/removed can be the first or second half in a sorted list of files, or all odd files.

I've made the benchmark a regular executable, rather than a unit test. The reasoning is that unit tests are often run automatically, and their results are only looked at if something goes wrong, so running benchmarks with every "make test" run might waste CPU cycles. Nonetheless, I do test the model state for correctness - I think that can never hurt.

When passing the -xml option to the benchmark, the XML results can be transformed into nice graphs using qtestlib-tools: http://blog.qt.digia.com/blog/2008/12/05/qtestlib-now-with-nice-graphs-pointing-upwards/

I'll attach screenshots of "before" and "after" benchmark results. The values on the y-axis are milliseconds.

2. The problem with KFileItemModel::insertItems() and KFileItemModel::removeItems() is that they insert/remove the items one by one in the QList m_itemData. This was easy to implement, but inserting/removing is O(N) for QList, and if we do that N times, we can end up with O(N^2) complexity, which is not good. My test results show that this indeed happens in some cases; the only cases that work well are "add many items at the back" and "remove many items from the back".

To resolve this, I propose to first only calculate the item ranges that are added/removed, and to do the actual inserting/removing only when this is finished, such that the final position of each item is known and every item has to be moved only once (a sketch of the idea follows at the end of this description). This makes the code a bit more complicated, but it ensures that adding/removing N items is O(N). One might be able to do it with less code by replacing the int-based indices in the functions with iterators and using std::copy/std::copy_backward to do some of the work, but I didn't want to change too many things at the same time.

Any thoughts? If there is agreement that this makes sense, I'll start to look at other places in those functions and other parts of KFileItemModel. I have some ideas already - for example, I think that we can make sure that removing items is not more expensive than inserting them, which is currently the case (see benchmark results).
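To illustrate the removal case from point 2, here is a minimal, simplified sketch of the idea (the types and names are made up for illustration; this is not the actual patch code). Given the already-computed, sorted ranges of indexes to remove, every surviving item is moved to its final position exactly once, so removing N items is O(N):

    #include <QList>
    #include <QString>

    struct Range { int index; int count; };   // stand-in for a KItemRange-like helper

    // Remove the given sorted, non-overlapping ranges from 'items' in a single
    // pass: surviving items are copied to their final position once, and the
    // unused tail is chopped off with one erase() call at the end.
    void removeRanges(QList<QString>& items, const QList<Range>& sortedRanges)
    {
        if (sortedRanges.isEmpty()) {
            return;
        }
        int dest = sortedRanges.first().index;
        for (int i = 0; i < sortedRanges.count(); ++i) {
            // First index after the current removed range.
            const int src = sortedRanges.at(i).index + sortedRanges.at(i).count;
            // Copy everything up to the next removed range (or the end of the list).
            const int end = (i + 1 < sortedRanges.count()) ? sortedRanges.at(i + 1).index
                                                           : items.count();
            for (int j = src; j < end; ++j) {
                items[dest++] = items.at(j);
            }
        }
        items.erase(items.begin() + dest, items.end());
    }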
Yes, unit tests still work fine. Benchmark results attached (obtained with a release build).
Hi Frank,

I guess I know who you mean by "everyone is invited to join the fun and look at other classes" ;) I try to limit myself to kdelibs in this case, where Dolphin would (or could) benefit automatically. I do have a few suggestions for you:

1. You talk about std::copy, but why do you want to copy anyway? Why don't you just use std::move? (C++11)

2. Why do you insert based on id? Why don't you just append the new list followed by a resort of the entire list?

3. If the new item list is bigger than the current item list, then perhaps it might be worth considering a list swap (QList::swap):
- store the current list in some temp value
- swap the new list to the current list
- append the old items to the new list
A rough sketch of what I mean is below.

I hope these suggestions make sense :) Also, very nice results in the graphs, good job! I'm glad I inspired you to improve Dolphin for massive folders ^_-

Cheers, Mark
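To make suggestions 2 and 3 a bit more concrete, roughly something like this (made-up names, untested sketch):

    #include <QList>
    #include <QString>
    #include <algorithm>

    // Untested sketch of suggestions 2/3: append the smaller list to the
    // bigger one (swapping first if the new batch is larger, since
    // QList::swap is O(1)) and then resort the whole thing.
    void appendAndResort(QList<QString>& current, QList<QString>& incoming)
    {
        if (incoming.count() > current.count()) {
            current.swap(incoming);
        }
        current += incoming;
        std::sort(current.begin(), current.end());
    }

In the real model the comparison would of course be the model's own lessThan rather than operator<.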
Review request changed
OK, I decided to focus on removeItems() first. It just gets too confusing when changing multiple things at once and evaluating several options for all of them ;-)

So this now improves removeItems() by:

a) Not removing items one by one, but doing it in a way that minimizes the number of moves, to prevent O(N^2) worst-case complexity.

b) Not sorting the removed items using the potentially extremely slow KFileItemModel::lessThan. We can get the indexes of the removed items very easily from the hash m_items, and sorting ints is a lot faster.

c) Preventing repeated rehashing of m_items when removing the deleted URLs, by replacing remove() with erase().

Moreover, I tried to give the entire function a bit more structure and factored out a function that transforms a sorted list of indexes into a KItemRangeList - that might be useful in other contexts as well (see the sketch below).

I've added a new data set 'all' to the plots, which is the time required to add all items to the model. This is included in all 'remove half the items' measurements as well, because I have to first add all items in the benchmark loop, which might be executed multiple times.

Any comments? If no one sees a problem or obvious further improvement options in that function, I'll push this in a few days.
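For reference, the factored-out helper is essentially the following (a simplified sketch with a stand-in struct; the actual patch uses KItemRangeList):

    #include <QList>

    struct Range { int index; int count; };   // stand-in for KItemRange

    // Turn a sorted list of indexes into a list of contiguous ranges, e.g.
    // {0, 1, 2, 5, 6, 9} -> {index 0, count 3}, {index 5, count 2}, {index 9, count 1}.
    QList<Range> rangesFromSortedIndexes(const QList<int>& sortedIndexes)
    {
        QList<Range> ranges;
        int i = 0;
        while (i < sortedIndexes.count()) {
            const int start = sortedIndexes.at(i);
            int count = 1;
            while (i + count < sortedIndexes.count()
                   && sortedIndexes.at(i + count) == start + count) {
                ++count;
            }
            ranges.append(Range{start, count});
            i += count;
        }
        return ranges;
    }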
Revision 2 (+297 -51)