Updating the sqlite DB file corrupted its data structure.

For reference, sqlite's own stance on running over NFS is as follows:

Locking mechanism might not work correctly if the database file is kept on an NFS filesystem. This is because fcntl() file locking is broken on many NFS implementations. You should avoid putting SQLite database files on NFS if multiple processes might try to access the file at the same time.

⇒ In short: fcntl() file locking is broken in many NFS implementations, so a database file that multiple processes may access at the same time should not be placed on NFS.

しくじり先生 「NFS+sqliteで苦労した話から学ぶ、問題解決の考え方」 (Shikujiri Sensei: "A problem-solving mindset, learned from struggling with NFS + sqlite")
0
lrwx------ 1 jun jun 64 Oct 4 11:34 0 -> /dev/pts/25
lrwx------ 1 jun jun 64 Oct 4 11:34 1 -> /dev/pts/25
lrwx------ 1 jun jun 64 Oct 4 11:34 2 -> /dev/pts/25
lrwx------ 1 jun jun 64 Oct 4 11:35 3 -> /home/jun/works/sqlite/test.db
fd: 3
--PID:17833 sqlite3--
total 0
lrwx------ 1 jun jun 64 Oct 4 14:00 0 -> /dev/pts/19
lrwx------ 1 jun jun 64 Oct 4 14:00 1 -> /dev/pts/19
lrwx------ 1 jun jun 64 Oct 4 14:00 2 -> /dev/pts/19
lrwx------ 1 jun jun 64 Oct 4 14:00 3 -> /home/jun/works/sqlite/test.db
fd: 3
to use the NLM sideband protocol to lock files on the server. If neither option is specified (or if lock is specified), NLM locking is used for this mount point. When using the nolock option, applications can lock files, but such locks provide exclusion only against other applications running on the same client. Remote applications are not affected by these locks.

In other words: "with nolock, locking is still possible, but it only affects applications running on the same NFS client."

This reveals the direct cause of the problem. sqlite can lock a file on NFS with fcntl(), but the lock only takes effect within the same NFS client (the same server); a process on another server can acquire the lock at the same time. Under those conditions, file corruption is inevitable whenever write requests collide.
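The exclusion that fcntl() locking is supposed to provide can be sketched in Python as below. This is a minimal illustration, not sqlite's own code: the helper names `try_exclusive_lock` and `second_process_can_lock` are made up for this example, and it only exercises two processes on one machine. The cross-client case described above cannot be reproduced without two NFS clients.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd):
    """Try to take a non-blocking exclusive fcntl() lock on fd."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False


def second_process_can_lock(path):
    """Hold the lock in this process, then let a child process try."""
    with open(path, "r+b") as holder:
        assert try_exclusive_lock(holder.fileno())
        pid = os.fork()
        if pid == 0:
            # Child: a separate process, so the parent's lock applies to it.
            code = 2
            try:
                with open(path, "r+b") as contender:
                    code = 0 if try_exclusive_lock(contender.fileno()) else 1
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print("second process got the lock:", second_process_can_lock(path))
    finally:
        os.remove(path)
```

On a local filesystem the second process is expected to be refused. The failure mode on a nolock mount is that the same attempt, made from a different NFS client, would succeed.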
cache file attributes. If neither option is specified (or if ac is specified), the client caches file attributes.

To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.

In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.

Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it extracts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section contains a detailed discussion of these trade-offs.

The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
NFS client caches attributes of a regular file before it requests fresh attribute information from a server. If this option is not specified, the NFS client uses a 3-second minimum. See the DATA AND METADATA COHERENCE section for a full discussion of attribute caching.

acregmax=n
    The maximum time (in seconds) that the NFS client caches attributes of a regular file before it requests fresh attribute information from a server. If this option is not specified, the NFS client uses a 60-second maximum. See the DATA AND METADATA COHERENCE section for a full discussion of attribute caching.

acdirmin=n
    The minimum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. If this option is not specified, the NFS client uses a 30-second minimum. See the DATA AND METADATA COHERENCE section for a full discussion of attribute caching.

acdirmax=n
    The maximum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. If this option is not specified, the NFS client uses a 60-second maximum. See the DATA AND METADATA COHERENCE section for a full discussion of attribute caching.

actimeo=n
    Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same value. If this option is not specified, the NFS client uses the defaults for each of these options listed above.
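How these options relate to each other can be summarized with a small Python model. It is only a reading aid built from the defaults and rules quoted from the man page; `effective_attr_timeouts` is a hypothetical helper, and the simple left-to-right handling of the option string is an assumption, not a description of how the kernel parses mount options.

```python
# Defaults quoted from the man page, in seconds.
DEFAULTS = {"acregmin": 3, "acregmax": 60, "acdirmin": 30, "acdirmax": 60}


def effective_attr_timeouts(options):
    """Resolve the four attribute-cache timers from a comma-separated
    mount option string (simplified model, processed left to right)."""
    timers = dict(DEFAULTS)
    for opt in options.split(","):
        opt = opt.strip()
        if opt == "noac":
            # noac is described as sync plus actimeo=0.
            timers = {key: 0 for key in timers}
        elif opt.startswith("actimeo="):
            # actimeo sets all four timers to the same value.
            value = int(opt.split("=", 1)[1])
            timers = {key: value for key in timers}
        elif "=" in opt:
            key, value = opt.split("=", 1)
            if key in timers:
                timers[key] = int(value)
    return timers


if __name__ == "__main__":
    for example in ("rw,hard", "rw,acregmin=1", "rw,actimeo=0", "rw,noac"):
        print(example, "->", effective_attr_timeouts(example))
```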
done asynchronously. (See also the sync option.)

sync
    All I/O to the filesystem should be done synchronously. In the case of media with a limited number of write cycles (e.g. some flash drives), sync may cause life-cycle shortening.

NFS-specific mount options and the generic options of mount itself are covered in different documents. sync and async are described in the mount documentation.
1813 - NFS Version 3 Protocol

If stable is FILE_SYNC, the server must commit the data written plus all file system metadata to stable storage before returning results.

RFC 1813, which specifies NFSv3, states that when the FILE_SYNC flag is set, the server must guarantee the data has been persisted to stable storage before it responds.

For a feature like NFS, which is implemented across many environments such as Linux, Solaris, and AIX, RFCs are a useful place to look up the specification.

(Note) An RFC is only the agreed specification, so there is no guarantee that each platform implements it exactly as written.
file systems provide perfect cache coherence among their clients. Perfect cache coherence among disparate NFS clients is expensive to achieve, especially on wide area networks. As such, NFS settles for weaker cache coherence that satisfies the requirements of most file sharing types.

Close-to-open cache consistency

Typically file sharing is completely sequential. First client A opens a file, writes something to it, then closes it. Then client B opens the same file, and reads the changes.

When an application opens a file stored on an NFS version 3 server, the NFS client checks that the file exists on the server and is permitted to the opener by sending a GETATTR or ACCESS request. The NFS client sends these requests regardless of the freshness of the file's cached attributes.

When the application closes the file, the NFS client writes back any pending changes to the file so that the next opener can view the changes. This also gives the NFS client an opportunity to report write errors to the application via the return code from close(2).

The behavior of checking at open time and flushing at close time is referred to as close-to-open cache consistency, or CTO. It can be disabled for an entire mount point using the nocto mount option.

Weak cache consistency

There are still opportunities for a client's data cache to contain stale data. The NFS version 3 protocol introduced "weak cache consistency" (also known as WCC) which provides a way of efficiently checking a file's attributes before and after a single request. This allows a client to help identify changes that could have been made by other clients.

When a client is using many concurrent operations that update the same file at the same time (for example, during asynchronous write behind), it is still difficult to tell whether it was that client's updates or some other client's updates that altered the file.

Attribute caching

Use the noac mount option to achieve attribute cache coherence among multiple clients. Almost every file system operation checks file attribute information. The client keeps this information cached for a period of time to reduce network and server load. When noac is in effect, a client's file attribute cache is disabled, so each operation that needs to check a file's attributes is forced to go back to the server. This permits a client to see changes to a file very quickly, at the cost of many extra network operations.

Be careful not to confuse the noac option with "no data caching." The noac mount option prevents the client from caching file metadata, but there are still races that may result in data cache incoherence between client and server.

The NFS protocol is not designed to support true cluster file system cache coherence without some type of application serialization. If absolute cache coherence among clients is required, applications should use file locking. Alternatively, applications can also open their files with the O_DIRECT flag to disable data caching entirely.
provide perfect cache coherence among their clients. Perfect cache coherence among disparate NFS clients is expensive to achieve, especially on wide area networks. As such, NFS settles for weaker cache coherence that satisfies the requirements of most file sharing types.

The documentation tells us that NFS adopts a weak form of cache coherence that fits most use cases.

What DATA AND METADATA COHERENCE refers to:
DATA ⇒ cache coherence of file contents
METADATA ⇒ cache coherence of file and directory attribute information
Linux implements close-to-open cache consistency by comparing the results of a GETATTR operation done just after the file is closed to the results of a GETATTR operation done when the file is next opened. If the results are the same, the client will assume its data cache is still valid; otherwise, the cache is purged.

The assumption is that most usage is sequential: client A opens a file, writes to it, and closes it, and then client B opens it and reads the changes.

The client compares the file attributes (timestamps and the like) obtained when the file was last closed with those obtained when it is next opened, and discards the cached file contents if they differ.

Conversely, unless the file is reopened, the attributes are not forcibly refetched, so the cache is not discarded.

The GETATTR that appears in the documentation is the procedure, defined in the RFC, that retrieves file and directory information from the NFS server.
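The close-to-open decision described above can be modeled in a few lines of Python. This is a toy model for illustration only, not the Linux NFS client: `attr_snapshot` stands in for a GETATTR reply using `os.stat`, and the choice of modification time plus size as the compared attributes is an assumption of this sketch.

```python
import os
import tempfile


def attr_snapshot(path):
    """Stand-in for a GETATTR reply: the attributes this model compares."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


class CachedFile:
    """Toy model of close-to-open cache consistency for a single file."""

    def __init__(self, path):
        self.path = path
        self.data = None            # cached file contents
        self.attrs_at_close = None  # attributes recorded at the last close
        self.refetches = 0          # how many times the cache was refilled

    def open_and_read(self):
        current = attr_snapshot(self.path)  # "GETATTR at open"
        if self.data is None or current != self.attrs_at_close:
            # Attributes changed since the last close: purge and re-read.
            with open(self.path, "rb") as f:
                self.data = f.read()
            self.refetches += 1
        self.attrs_at_close = attr_snapshot(self.path)  # "GETATTR at close"
        return self.data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "shared.txt")
        with open(target, "wb") as f:
            f.write(b"v1")
        cache = CachedFile(target)
        print(cache.open_and_read(), cache.refetches)
        with open(target, "wb") as f:
            f.write(b"version 2")  # another writer changes the file
        print(cache.open_and_read(), cache.refetches)
```

The model also makes the caveat visible: the comparison only happens inside `open_and_read`, so a reader that keeps one handle open never reaches the check.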
data cache to contain stale data. The NFS version 3 protocol introduced "weak cache consistency" (also known as WCC) which provides a way of efficiently checking a file's attributes before and after a single request. This allows a client to help identify changes that could have been made by other clients.

When a client is using many concurrent operations that update the same file at the same time (for example, during asynchronous write behind), it is still difficult to tell whether it was that client's updates or some other client's updates that altered the file.

Close-to-open alone cannot pick up data that was updated after the file was opened, so there also appears to be a feature called weak cache consistency.

It seems to work as follows: before each operation the client obtains the file attributes (GETATTR), and if it finds an attribute update it did not know about, it concludes that another client updated the file and discards its cache.
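'rb')
    try:
        return hashlib.sha256(fp.read()).hexdigest()
    finally:
        # The original called fp.close() after the return statement,
        # where it could never run; try/finally closes the file reliably.
        fp.close()

# Hash the file 1000 times, 0.1 seconds apart, reopening it on every call.
for i in range(1000):
    print('{:04}: {}'.format(i, sha256hash2()))
    time.sleep(0.1)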
advisory locks are broken by design. ANSI STD 1003.1 (1996)
** section 6.5.2.2 lines 483 through 490 specify that when a process
** sets or clears a lock, that operation overrides any prior locks set
...
**
** This means that we cannot use POSIX locks to synchronize file access
** among competing threads of the same process. POSIX locks will work fine
** to synchronize access for threads in separate processes, but not
** threads within the same process.
** ...
**
** But wait: there are yet more problems with POSIX advisory locks.
** ...

The sqlite source code contains nearly 90 lines of comments grumbling about the POSIX locking mechanism.

For example, if the same file is opened twice and one of the two descriptors is closed, the process loses its lock on that file. Because of this, sqlite needs somewhat special handling when opening and closing files.
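The "open twice, close one, lose the lock" behavior can be observed with a short Python experiment. This is a sketch assuming a Linux-like system with POSIX fcntl() locks on a local filesystem; `child_can_lock` and `demo_lock_lost_on_close` are names invented for this example and have nothing to do with sqlite's internals.

```python
import fcntl
import os
import tempfile


def child_can_lock(path):
    """Return True if a separate process can take an exclusive lock."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            with open(path, "r+b") as f:
                try:
                    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    code = 0
                except OSError:
                    code = 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo_lock_lost_on_close(path):
    """Lock via one descriptor, close another one for the same file,
    and report whether the lock was held before and after that close."""
    first = open(path, "r+b")
    second = open(path, "r+b")  # the same file, opened a second time
    fcntl.lockf(first, fcntl.LOCK_EX | fcntl.LOCK_NB)
    held_before = not child_can_lock(path)
    second.close()              # closing the *other* descriptor
    held_after = not child_can_lock(path)
    first.close()
    return held_before, held_after


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(demo_lock_lost_on_close(path))
    finally:
        os.remove(path)
```

Under POSIX semantics the lock is expected to be held before `second.close()` and gone afterwards, even though `first` is still open. That is why a library managing several handles to one database file has to track them carefully.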