In my continued (series of failures of an) attempt to make SQLite a local-first peerable database, I'm peeking at the sessions extension and wanted to record changes, at least in a structured form. I got some sample code I thought would work over at https://git.jacky.wtf/me/sqlite-sync/src/commit/798540e1bb609d1b91f360c1f85f5e0cf61d7c67/src/lib.rs#L106-L154. I just had this working two days ago when I was more disconnected from life.

The hope is that if I can record database actions and fuse in something like https://crates.io/crates/rusqlite_migration, I wonder if then having a table of migration data being peered could help as well. Immediately, I'm like "what, auto-updates? nah" so I'll have to think on that more.

For now, I need to just make a list of changes to this database.

So, with a bit of hacking and pokey points, I got SQLite using a custom filesystem in Rust. It “works” in the sense that, from no files, it generates a database that can be used by the SQLite CLI. It has some slightly rough parts still (it seems like later updates aren't persisted and the memory implementation segfaults because of some silly range coding) but this is promising!

The next foray into my journey of SQLite is going to be learning more about the WAL format. I have a cursory understanding of it as I venture with it in Rust, but I'd like to learn more. Might even have to glance at the magic of Litestream for this adventure.

Implementing Property Searching for Micropub

One of the proposed extensions to Micropub that I found fascinating is an extension to querying for a post list. It'd allow one to find a list of posts in their Micropub installation with any sort of querying. Koype currently supports looking up the MF2 of channels, categories, and entries. However, something I really wanted is the fields mentioned by Grant, the ability to filter over the properties of entries. This was something that would allow me to check if I've already interacted with something in my proposed social reader. Implementing this was not easy, though. I had to make a virtual table in SQLite, scan the source MF2 of all the entries on disk that are queried for (which was all, most of the time, in local tests!) and add two custom functions to SQLite to properly look for the wanted values in the right places.

Implementing the Virtual Table for Querying

I've read that SQLite's documentation is good. I did not find it to be the case. It's mainly because this begins to get into the plumbing of SQLite, and I'm still learning my way around it. Likewise, I also tend to peek into the source code, but reading C is not something I have done in a long time—I most definitely need a refresher. Instead, I leaned on the example code provided by the Rust library I'm using to interact with SQLite. Reading that and leveraging the documentation for virtual tables helped me get to a working table implementation. The goal was to provide a mapping between the properties stored on disk for an entry and its corresponding ID. This enabled queries like the following:

  SELECT
  
entries.*
FROM entries
JOIN epv on epv.uid = entries.uid
WHERE
mf2_json_has_value_in_property(epv.properties, 'like-of', 'https://lobste.rs') IS TRUE
ORDER BY entries.published_at DESC
LIMIT 5

This lets me look up any post that has a link to Lobsters as a like. What's not described by the name is that this function checks if the value of the property is either equal to or begins with the provided string. This kind of query could be translated to something like the following:

  Host mp.jacky
  
POST /micropub
Accept: application/json
Content-Type: application/x-www-form-urlencoded

q=source&
limit=5&
order=desc&
property-like-of[]=https://lobste.rs

200 OK
Content-Type: application/json
{
"items": {...}
}

This kind of query I'd have to be careful with. If I'm looking up a URL, I'd probably want it to match against the authority value, but that'd require parsing every string to see if it's a valid URL and then doing that match. However, I now have a way to look up every link from Twitter that I've liked, replied to or engaged with. I have a ticket to see if I can allow for some sort of hint to allow for full-text searching of properties—that would allow me to search the contents of things I've posted, so I can check for deep links.

The concept of “virtual tables” aren't unique to SQLite (see Postgres's wiki or MariaDB's knowledge base for more info). However, what I'm doing should definitely make your nose turn up a bit—I'm reading any number of JSON files from disk (about 2,100 for my site at the time of writing, with about 5 – 15 new posts added to my site each day) each time a request is made to my homepage; making it to a DoS just by hitting Refresh in your browser a few times! I've added logic to short circuit the request on the site and applied an eager connection timeout in SQLite (I set it to five minutes before—don't ask) so it'll just return empty lists (at best) if it doesn't resolve it in time. I'd love ideas on caching or optimizing how I've done the lookup as well as even storing the information for entry properties. A bit of the conventional knowledge I have around these approachs are a bit moot since

See it live!

My website uses Koype as its CMS. As of v0.1.4, this has been available. It powers the feed of items at the bottom of my homepage that shows things I've been doing on GitHub from my site, as well as the things I've interacted with on Twitter. I'm hoping to tinker with this more overtime to see what other kind of queries I can build. I'm hoping more people implement such a query because it provides a cheap affordance (confirmation of pre-existing data) that can be helpful in social readers.

This shows a list of links I've liked on Aaron's website.

The astute computer scientist in you probably noticed that this kind of solution wouldn't work in constant time. A poor implementation could actually lock up my site (databases tend to)! I have logic for timing out database calls, though. I'm also curious about adding some benchmarks to see if my naive approach works faster than something like JMESPath. Frankly, I'd love to have something like JMESPath ship as an extension to SQLite's existing JSON methods.

Some other changes are that the indieweb Rust library supports capturing these properties (as well as the exists and not-exists fields) so any client or server using that library can pluck them out!

Side-note: embarrassed to not realize how powerful SQLite virtual tables are. Weekend reading just got more interesting.
byhttps://jacky.wtf • posted archived copycurrent

Now that I understand them more, it seems like using this, some indexing and perhaps generated columns, I can use SQL to find if a value exists across a set of JSON documents. This would allow me to do something like check if I already liked a post on my site or if it's replying to a particular URL (or prefix of one). This is going to be janky, lol.

I'm about 45% of the way towards making the stuff that runs my site more “friendly” for non-developers. This means like having a service people can sign up and use or client applications that they can interop with. One big ass blocker, though, is having https://github.com/jalcine/sqlite-hypercore/ be viable of a project. I don't care too much about reinventing databases, I actually think that relational ones work pretty well and with a more peer-to-peer architecture, one could truly hold onto their data and handle consent on what gets to work with it.

Or rather, it's not able to show my main feed because of some SQL issue. "dbtax", anyone?
byhttps://jacky.wtf • posted archived copycurrent

Yeah so it looks like I ran into rusqlite/rusqlite!433 on GitHub. Very annoying and I need to find a way to either add a test for this or add an update trigger to fix up these values. I understand the former more, so I'll do that.

This came in handy. https://stackoverflow.com/questions/426495/how-do-you-rename-a-table-in-sqlite-3-0#426512

I'm always curious about ways of using SQLite in a distributed fashion but not like how we currently do distributed databases. I'm thinking more like "file sharing" distributed. I've also abstracted a lot of the I/O for my site — my long term hope is to be able to "pack" the data for it in a 'portable' (torrenting) way, so I can modify it from any device that's familiar with the filesystem layout and the database schema. This would effectively put the data in the hands of clients that are permitted access to it. This is how something like inkandswitch's Local First Software could be used for personal sites.

I'm glad that SQLite nudges away from concrete types and uses storage types. Because the need to test values is pushed up into the application layer, and I don't think I like doing a lot of strict typing in the database (though I've been bitten with and without it)

.