Thursday 12 May 2011 — This is close to 14 years old. Be careful.
One project I’m working on these days is a large test suite for a Django project, in particular, trying to make it go faster. In one subset of 16 tests, total execution time is about 250 seconds! The tests clearly spend a lot of time in the database: 50% of the time is creating the test db, and another 30% is installing test fixtures.
A classic first step is to run the tests from an in-memory SQLite database. In this case, it’s a Postgres database using GIS features, so changing engines isn’t a simple settings change. Postgres doesn’t have an in-memory feature, but I thought, why not force the database into memory without Postgres knowing, simply by creating the database in a ramdisk?
On Linux, this is a simple matter of creating the database in /dev/shm instead of a real disk-backed filesystem. And good thing it was simple, too, because it didn’t help at all: the test times were completely indistinguishable between the two filesystems.
Which in retrospect makes perfect sense. Disk filesystems employ aggressive caching to keep data in memory in the first place, so it was unlikely that my simple trick was going to change things significantly. The test database is small enough to fit into RAM anyway, and for the tests, Postgres was already set to disable fsync.
The remaining mystery: why is SQLite in-memory faster than other databases on disk? If it is about the memory, then it should be possible to make other databases faster by forcing them into memory. And if isn’t about the memory, then it’s just that SQLite is faster no matter what (because it’s in-process, for example) and using a disk-based SQLite should be faster than your real database also.
Comments
During the trying of speed-up some operations we tried to use in-memory database instead of db stored on the disk. The speed improvement was evident (~20-30%) only when executing lot of separated queries out of the transaction.
When INSERTs/UPDATEs/... are executed in the transaction, speed results are very similar (+/- 5%) to in-memory DB.
So the reason for the very good SQLite performance is probably due to a very good optimization within core ;-)
Just in case anyone is wondering if it is a linux-only feature, you can do that in any of the BSD OS too, for example, in FreeBSD you can create memory-based filesystems easily using mdconfig or mdmfs:
http://www.freebsd.org/doc/handbook/disks-virtual.html#DISKS-MD-FREEBSD5
@Evgeny: Your points are about how SQLite's in-process embedded model allows it to be faster. True, but doesn't explain why SQLite in memory is faster than SQLite on disk. Unless SQLite has *worse* disk optimization than other engines. And it's hard to see how skipping SQL optimization can make things faster...
@Sam: the next step is to investigate faster ways to get content into Postgres, yes.
http://www.sqlite.org/faq.html#q19
One other exception to this is Windows XP which uses a 10MB file system cache by default no matter how much RAM you have. A registry edit can fix this.
@Floris: You can disable the fsync calls in SQLite but only if you don't care about durability. See the journal_mode pragma at http://www.sqlite.org/pragma.html#pragma_journal_mode
Add a comment: