Open Bug 1671932 Opened 5 years ago Updated 15 days ago

Asynchronous non-blocking temporary storage initialization tolerating broken origins

Categories

(Core :: Storage: Quota Manager, task, P1)

Tracking

(ASSIGNED bug which should be worked on in the current release/iteration)

People

(Reporter: janv, Assigned: janv)

References

(Depends on 4 open bugs, Blocks 13 open bugs)

Details

Attachments

(1 obsolete file)

We are currently facing two major problems:

  1. Temporary storage initialization errors
  2. Slow temporary storage initialization in some cases

Currently, temporary storage initialization must finish successfully before a quota client can access files on disk (this doesn't apply to the persistent repository).

If we could make the initialization asynchronous, we would substantially mitigate both problems.

I'm still not 100% sure if it is feasible, but so far it looks good.

We are now fully working on this; it's our top priority.

Priority: P2 → P1
Depends on: 1680031
Depends on: 1689293

I think the description and summary should make one thing explicit that is currently only implicit: At the moment, storage initialization succeeds or fails atomically for all quota clients and origins. This bug intends to change that so the success/failure is partial (per origin?). This is an important point, because this could be implemented (addressing problem number 1) independently from making initialization asynchronous (which addresses problem number 2). However, this bug specifically suggests making both changes in one go.

That's true (partially), but it can't be easily implemented independently. I'll try to provide more details about that since I'm focusing on this more right now.

The main point here is that we currently allow only 100% accurate quota management. We want to change that so we allow "inaccurate" usage tracking while the temporary storage is being initialized, so quota clients will be able to use storage even if the initialization is not finished yet. When the initialization finishes and we see that more data has been written than would have been allowed with synchronous initialization, we will evict some origins. After that, we will have 100% accurate quota management/usage tracking again.
Once we make the necessary changes for initializing origins asynchronously, broken origins naturally won't be able to break the entire temporary storage initialization. The quota management will stay in "not 100% accurate" mode since we couldn't get exact usage for broken origins. The broken origins will stay uninitialized with some files on disk and they won't be included in overall usage calculations. The fact that we leave some extra files on disk which are not included in the usage calculations shouldn't be a big problem. We already had to make an exception for LSNG, which tracks only the logical size of the database, so the total physical size of all files doesn't have to match the usage we internally use for quota checks. We only allow 50% of free disk space to be used anyway, so there should be a lot of space in reserve.
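
To make the idea above concrete, here is a minimal Python sketch of tolerant initialization followed by eviction. It is only an illustration, not the Quota Manager's C++ code: the `Origin` class, `scan_origin`, `initialize_incrementally`, the byte values, and the "evict least recently used first" policy are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Origin:
    name: str
    last_access: int      # larger value = more recently used (hypothetical field)
    usage: Optional[int]  # bytes on disk; None models a broken origin


def scan_origin(origin: Origin) -> int:
    """Stand-in for the per-origin directory scan; fails for broken origins."""
    if origin.usage is None:
        raise OSError(f"cannot scan {origin.name}")
    return origin.usage


def initialize_incrementally(origins, limit):
    """Scan origins one by one, tolerate failures, then evict if over the limit.

    Returns (tracked usage per origin, broken origin names, evicted origin names).
    """
    tracked = {}
    broken = []
    for origin in origins:
        try:
            tracked[origin.name] = scan_origin(origin)
        except OSError:
            # A broken origin stays uninitialized and is left out of the total,
            # instead of failing initialization for every origin.
            broken.append(origin.name)

    evicted = []
    # Assumption for this sketch: evict least recently used origins first.
    candidates = sorted(
        (o for o in origins if o.name in tracked), key=lambda o: o.last_access
    )
    for origin in candidates:
        if sum(tracked.values()) <= limit:
            break
        del tracked[origin.name]
        evicted.append(origin.name)
    return tracked, broken, evicted


origins = [
    Origin("https://a.example", last_access=1, usage=400),
    Origin("https://b.example", last_access=2, usage=None),  # broken origin
    Origin("https://c.example", last_access=3, usage=500),
    Origin("https://d.example", last_access=4, usage=300),
]
tracked, broken, evicted = initialize_incrementally(origins, limit=1000)
```

In this example the broken origin is skipped rather than aborting the whole scan, and the oldest healthy origin is evicted because the tracked total (1200 bytes) exceeds the hypothetical limit of 1000 bytes.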

Summary: Asynchronous non-blocking temporary storage initialization → Asynchronous non-blocking temporary storage initialization tolerating broken origins
Depends on: 1690025

As part of this, the comment in QuotaManager::CollectOriginsForEviction should be addressed, see https://phabricator.services.mozilla.com/D101182#inline-580859.

Depends on: 1686031
Depends on: 1749504
See Also: → 1741865
Blocks: 1778472
Depends on: 1781204
Severity: S3 → S2
Priority: P1 → P2
Duplicate of this bug: 1741865

As I wrote on bug 1741865, if you need someone to profile or test a patched Firefox build, let me know; I have had this issue, only on my workstation, for 2 years...

Blocks: 1804823
Blocks: 1792286
Blocks: 1817580
Depends on: 1839417
Depends on: 1840545
Depends on: 1840770
Blocks: 1848692
Duplicate of this bug: 1848692

One of the goals of bug 1671932 is to call EnsureTemporaryStorageIsInitialized
only from InitTemporaryStorageOp. Calling from other places including quota
clients will be disallowed by changing the method to a private method. The
private nature of the method should be emphasized by adding the Internal
suffix.

Changes done in this patch:

  • IsTemporaryStorageInitialized renamed to
    IsTemporaryStorageInitializedInternal
  • EnsureTemporaryStorageIsInitialized renamed to
    EnsureTemporaryStorageIsInitializedInternal

Depends on D186781

Depends on: 1855142
Depends on: 1808294

Comment on attachment 9353390 [details]
Bug 1671932 - Rename EnsureTemporaryStorageIsInitialized to EnsureTemporaryStorageIsInitializedInternal; r=#dom-storage

Revision D188332 was moved to bug 1808294. Setting attachment 9353390 [details] to obsolete.

Attachment #9353390 - Attachment is obsolete: true

Tracked by Reddit: https://old.reddit.com/r/firefox/comments/17u92cj/firefox_first_startup_high_disk_usage_io_wait/

This affects user experience, and in 2023 user experience is key.

We are working really hard on this bug.

Depends on: 1866217
Depends on: 1866402
Priority: P2 → P1
Depends on: 1867997
Depends on: 1875995
Depends on: 1883353
Blocks: 1265504
Blocks: 1899015
Blocks: 1749007
Depends on: 1903186
No longer depends on: 1883353
Blocks: 1903530
No longer blocks: 1903530
Blocks: 1903530
Depends on: 1904562
Depends on: 1904674
No longer blocks: 1741865
No longer duplicate of this bug: 1741865
No longer blocks: 1848692
No longer duplicate of this bug: 1848692
See Also: 1741865
Depends on: 1904941
Depends on: 1913561
Depends on: 1913679
Depends on: 1914609
Depends on: 1919788
Depends on: 1920456
Depends on: 1920487
Depends on: 1923056
Depends on: 1924658

I noticed that there is a new loading tab icon. Does it have any special meaning?
https://i.imgur.com/IprcCyp.png
The left one is the old one. An animated dot.
The right one is the new one. Looks like an hourglass.

(In reply to eight04 from comment #14)

I noticed that there is a new loading tab icon. Does it have any special meaning?

The tab loading animation now stops after 45s to avoid wasting CPU time / energy for pages that never finish loading. This was implemented in bug 1812019. So no special meaning, other than "this page has been in a loading state for more than 45s".

Depends on: 1925782
Depends on: 1925813
Depends on: 1925832
Depends on: 1911030
Depends on: 1928028
Depends on: 1927410
Depends on: 1928939
Depends on: 1927723
Depends on: 1928092
Depends on: 1929190
Depends on: 1925418
Depends on: 1927259
Depends on: 1929840
Depends on: 1931513
Depends on: 1931996
Depends on: 1932101
Depends on: 1932102
No longer depends on: 1927259
Depends on: 1933799
Depends on: 1942003
Depends on: 1942781
Depends on: 1950304
Depends on: 1950564
No longer depends on: 1933799
See Also: → 1933799
Blocks: 1933799
See Also: 1933799
No longer blocks: 1903530

I'm pretty sure this bug is the cause of long-standing issues on startup (like... many years... at least 4 years, going back to at least v94), where the browser window opens, and even the first page loads, but you can't click anything for a long time. Like... 40+ seconds?

This affects every Firefox on every platform that I run it on, including several versions of Linux, both 32- and 64-bit, and on Windows. The common denominator is that all these systems have magnetic hard drives. I'm guessing that if you have an SSD, the problem is somewhat hidden from you, because I/O is faster. And what Firefox dev doesn't have an SSD?

But Firefox, on startup, wants to do a LOT of I/O, and what it's doing blocks all other activities, including page loads. You can open tabs, click menus, etc., but you can't interact with web pages.

I've actually become accustomed to walking away after starting Firefox, and coming back later. I've been doing this for years.

Oh... and also, in some earlier versions, doing something in a Private Window seems to bypass the problem. Only the non-private window is blocked by the startup IO. But in v136, which is the latest version that I'm running, this Private Window trick no longer works.

My current mitigation strategy, in v136, is to set Firefox to clear "Temporary cached files and pages" on exit. This seems to mostly fix the problem. However, whenever Firefox crashes, this clearing does not happen, and then the bad behavior returns on the next startup.

Currently, in v136, this heavy startup disk activity is all due to "QuotaManager IO"

Long ago, I logged all the files that were being touched at startup, and it seemed to be.... every single file in the cache?

I'm just chiming in here to applaud the efforts to work hard on this bug. It's a big one, and well worth fixing.

Thanks a lot for this constructive and detailed comment, it really helps to hear from long-time users who have been observing these patterns across versions and platforms. It’s true that this issue has been around for a long time, partly because it’s been difficult to allocate enough resources to tackle it properly. Recently, though, we’ve started addressing it more directly through focus projects like the L2 Quota Info cache, see bug 1953860.

The L2 cache is designed as a middle ground between the very fast L1 Quota Info cache and full directory scans, and it should benefit Firefox Desktop as well, especially in cases where build IDs change frequently (such as on the Nightly channel) or after unexpected crashes, where we currently lose the fastest L1 cache and fall back to much slower full scans.
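
As a rough illustration of that fallback order, here is a small Python sketch. It only models the tiering described above (try L1, then L2, then a full scan); the function name `load_quota_info`, the callables, and the sample data are hypothetical, and the sketch says nothing about how the real caches are stored or invalidated.

```python
from typing import Callable, Dict, Optional, Tuple

Usage = Dict[str, int]  # origin name -> usage in bytes


def load_quota_info(
    load_l1: Callable[[], Optional[Usage]],
    load_l2: Callable[[], Optional[Usage]],
    full_scan: Callable[[], Usage],
) -> Tuple[str, Usage]:
    """Return (source, usage), preferring the fastest source that is available.

    A loader returning None models a cache that is missing or no longer valid.
    """
    for tier, load in (("L1", load_l1), ("L2", load_l2)):
        usage = load()
        if usage is not None:
            return tier, usage
    # Slowest path: walk every origin directory on disk.
    return "full scan", full_scan()


scan_calls = []


def slow_full_scan() -> Usage:
    scan_calls.append(1)
    return {"https://a.example": 400}


# L1 is unavailable (for example after a crash), but L2 still has data.
source, usage = load_quota_info(
    lambda: None,
    lambda: {"https://a.example": 400},
    slow_full_scan,
)
```

Here the full scan is never invoked because the middle tier answers first, which is the case the L2 cache is meant to cover.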

Here’s a quick status update for this bug: the work on asynchronous storage initialization, also known as incremental origin initialization, is mostly done. If you're adventurous enough, you can actually try it out by flipping the preference dom.quotaManager.temporaryStorage.incrementalOriginInitialization, but only experimentally and with a backed-up profile. More details can be found in comment 23 on bug 1867997.

Once the L2 Quota Info cache work is finished, we’ll get back to incremental origin initialization and continue moving it forward.

I appreciate the response! Good luck on your quest.

Blocks: 2028506