Only initialize the hash m_items in KFileItemModel if it is needed
Review Request #115432 - Created Feb. 2, 2014 and submitted
KFileItemModel has a member QHash<KUrl, int> m_items which makes it possible to find the index of an item if the URL is known. m_items is updated every time an item is added to/removed from the current directory, or if the sort order changes. This is convenient in many situations, but sometimes, this hash also causes problems: 1. In some corner cases, the model can get into a state where the indexes in m_items are not correct. There were, e.g., https://bugs.kde.org/show_bug.cgi?id=324371 https://bugs.kde.org/show_bug.cgi?id=325359 which I could reproduce by doing a search in Details View and expanding folders, such that the same URL appeared twice in the view (but only once in m_items). But I think that there are other ways to make the relationship between m_items and the model contents inconsistent. A crash that looks like it is most likely caused by such problems is https://bugs.kde.org/show_bug.cgi?id=329494 2. Initializing m_items costs CPU cycles and memory, which is wasteful if m_items is not used at all. Therefore, I propose to make the following changes: (a) Do not require that all URLs are in m_items any more. Instead, the new requirement is: if there are N URLs in m_items, it is guaranteed that they are the first N URLs in the model. (b) Replace most calls of m_items.value(url) by index(url) in KFileItemModel. Note that some local variables 'index' had to be renamed for this to work. (c) In KFileItemModel::index(const KUrl&), check if the URL is in m_items already. If that is not the case, add 1000 URLs to the hash, and check again. Repeat until either the URL is found, or all URLs are in the hash (-1 is returned in the latter case). Note that we always know where to continue adding URLs: if m_items.count() is N, then "m_itemData.at(i)->item.url()" is the first URL that is not in m_items. BTW, my first version of index(const KUrl&) added URLs to the hash one by one, and checked every time if the next URL is the right one by comparing the URLs with ==. However, this was slow and required a lot more memory because comparing URLs with the == operator triggers a parsing of the URL, which needs memory and CPU cycles. I don't know if this is still the case in Qt 5, but in any case, adding new URLs in larger batches and then checking if the hash contains the searched URL should have a lower amortized cost in any case. (d) In insertItems() and removeItems(), clear the hash m_items. It will be re-populated the next time index() is called. Note that Dolphin < 4.11 also cleared the entire hash in these cases (it re-created the entire hash immediately though). In contrast to that, 4.11 and 4.12 only update the URLs/indexes starting from the first added/removed item. This can be problematic though - in bug 329494, we get a crash when trying to remove one of the removed URLs from m_items.
I could not find any regressions. Even if I could never reproduce the crash in bug 329494, I'm confident that the crash is fixed because the line where it tries to access the URL of a removed KFileItem and then crashes is removed (and there is no other access to the KFileItem in that function). Memory usage when loading a directory with 100,000 files is reduced by about 5 MiB in all view modes (from about 150 to 145) if previews are disabled. (If previews are enabled, KFileItemModelRolesUpdater calls the index(KUrl&) method, which fills m_items, and memory usage is as before). The time for loading a directory with 100,000 files is reduced by about 200 ms on my system (to about 4.3 seconds in Icons View, 3.7 in Compact and 2.8 in Details).
Hi Frank, You're adding quite a bit of complexity. Is it an option to just remove m_items and change the storage of m_itemData? That is already one big list with all data in it. If you change that to a QHash<KUrl, KFileItem> and drop m_items you reduce quite a bit it seems. It will certainly be a bit bigger in memory, but you get rid of m_data completely so there might be no tradeoff memory wise (right?). In terms of speed.. I don't know which one will be faster. I'm guessing a QList wins there.