Welcome to Databass, Part 2, even more bass. In part 1, we defined a query GADT and runQuery
function. Here, we'll figure out how to materialize relations and tuples in memory. As before, full source is on github. First, though, I had a question about implementing Group
and Ungroup
, which we skipped in part 1. For the sake of completeness, we'll add them here.
Group and Ungroup
Recall the relations from Part 1.
╔═════════════════════════════════════════════════════════════════╗
║ S (suppliers) SP (suppliers-parts) ║
║ ┌────┬───────┬────────┬────────┐ ┌────┬────┬─────┐ ║
║ │ S# │ SNAME │ STATUS │ CITY │ │ S# │ P# │ QTY │ ║
║ ├════┼───────┼────────┼────────┤ ├════┼════┼─────┤ ║
║ │ S1 │ Smith │ 20 │ London │ │ S1 │ P1 │ 300 │ ║
║ │ S2 │ Jones │ 10 │ Paris │ │ S1 │ P2 │ 200 │ ║
║ │ S3 │ Blake │ 30 │ Paris │ │ S1 │ P3 │ 400 │ ║
║ │ S4 │ Clark │ 20 │ London │ │ S1 │ P4 │ 200 │ ║
║ │ S5 │ Adams │ 30 │ Athens │ │ S1 │ P5 │ 100 │ ║
║ └────┴───────┴────────┴────────┘ │ S1 │ P6 │ 100 │ ║
║ P (parts) │ S2 │ P1 │ 300 │ ║
║ ┌────┬───────┬───────┬────────┬────────┐ │ S2 │ P2 │ 400 │ ║
║ │ P# │ PNAME │ COLOR │ WEIGHT │ CITY │ │ S3 │ P2 │ 200 │ ║
║ ├════┼───────┼───────┼────────┼────────┤ │ S4 │ P2 │ 200 │ ║
║ │ P1 │ Nut │ Red │ 12.0 │ London │ │ S4 │ P4 │ 300 │ ║
║ │ P2 │ Bolt │ Green │ 17.0 │ Paris │ │ S4 │ P5 │ 400 │ ║
║ │ P3 │ Screw │ Blue │ 17.0 │ Oslo │ └────┴────┴─────┘ ║
║ │ P4 │ Screw │ Red │ 14.0 │ London │ ║
║ │ P5 │ Cam │ Blue │ 12.0 │ Paris │ ║
║ │ P6 │ Cog │ Red │ 19.0 │ London │ ║
║ └────┴───────┴───────┴────────┴────────┘ ║
╚═════════════════════════════════════════════════════════════════╝
SP GROUP ( { P#, QTY } AS PQ )
looks like
╔════════════════════════════╗
║ ┌────┬────────────────┐ ║
║ │ S# │ PQ │ ║
║ ├════┼────────────────┤ ║
║ │ │ ┌────┬──────┐ │ ║
║ │ S1 │ │ P# │ QTY │ │ ║
║ │ │ ├────┼──────┤ │ ║
║ │ │ │ P1 │ 300 │ │ ║
║ │ │ │ P2 │ 200 │ │ ║
║ │ │ │ P3 │ 400 │ │ ║
║ │ │ │ P4 │ 200 │ │ ║
║ │ │ │ P5 │ 100 │ │ ║
║ │ │ │ P6 │ 100 │ │ ║
║ │ │ └────┴──────┘ │ ║
║ │ │ │ ║
║ │ │ ┌────┬──────┐ │ ║
║ │ S2 │ │ P# │ QTY │ │ ║
║ │ │ ├────┼──────┤ │ ║
║ │ │ │ P1 │ 300 │ │ ║
║ │ │ │ P2 │ 400 │ │ ║
║ │ │ └────┴──────┘ │ ║
║ │ │ ┌────┬──────┐ │ ║
║ │ S3 │ │ P# │ QTY │ │ ║
║ │ │ ├────┼──────┤ │ ║
║ │ │ │ P2 │ 200 │ │ ║
║ │ │ └────┴──────┘ │ ║
║ │ │ │ ║
║ │ │ ┌────┬──────┐ │ ║
║ │ S4 │ │ P# │ QTY │ │ ║
║ │ │ ├────┼──────┤ │ ║
║ │ │ │ P2 │ 200 │ │ ║
║ │ │ │ P4 │ 300 │ │ ║
║ │ │ │ P5 │ 400 │ │ ║
║ │ │ └────┴──────┘ │ ║
║ └────┴────────────────┘ ║
╚════════════════════════════╝
The group operation "smooshes" (the technical term) the given attributes into a relation and then assigns that relation to a new attribute. The body of the returned relation (the outer one) has one entry for every distinct value of the attributes not named in the group operation.
SPQ UNGROUP PQ
, where SPQ
is the above relation, performs the inverse operation, returning the original value of the SP
relation.
We've been using [Tuple t]
as our return value for queries to mean "relation," so that's what we'll use as the attribute value.
Group ::
( Split grouped rest t
, Sortable (l ::: [Tuple grouped] ': rest)
, Ord (Tuple rest)
) =>
Var l ->
Proxy grouped ->
Proxy rest ->
Query t relations ->
Query (Sort (l ::: [Tuple grouped] ': rest)) relations
Ungroup ::
(Split '[l ::: [Tuple grouped]] rest t, Sortable (grouped :++ rest)) =>
Var l ->
Proxy grouped ->
Proxy rest ->
Query t relations ->
Query (Sort (grouped :++ rest)) relations
Note that Ord
is a little stronger than Eq
which would be semantically sufficient. However, having Ord
lets us implement Group
in \(O(n\log(n))\), as opposed to \(O(n^2)\) with just Eq
.
Group var (_ :: Proxy grouped) (_ :: Proxy rest) q ->
let splits = sortBy (comparing snd) $ map (split @grouped @rest) $ runQuery q db
groups = groupBy ((==) `on` snd) splits
in map
( \((grouped, rest) :| gs) ->
quicksort (Ext var (grouped : fmap fst gs) rest)
)
groups
Ungroup (_ :: Var l) (_ :: Proxy grouped) (_ :: Proxy rest) q ->
concatMap
( \tuple ->
let (Ext _ grouped Empty, rest) =
split @'[l ::: [Tuple grouped]] @rest tuple
in map (\group -> quicksort (append group rest)) grouped
)
(runQuery q db)
Populating the database
We're going to assume that on every run of our hypothetical app, we'll want to use exactly one schema that doesn't change throughout the lifetime of the program.
To create an empty database in memory to store relations
, we want something like
emptyDB :: Proxy relations -> Tuple (RelationsToDB relations)
emptyDB _ = case ???
However, in this form, we can't scrutinize the relations
type variable to determine if it's empty or a cons like we would a normal list. Instead, we need a typeclass.
class EmptyDB (relations :: [Mapping Symbol Type]) where
emptyDB :: Proxy relations -> Tuple (RelationsToDB relations)
The empty case is easy.
instance EmptyDB '[] where
emptyDB _ = Empty
For cons, we insert the induction into the context of the instance, before the =>
.
instance
(EmptyDB relations, Ord (Tuple key)) =>
EmptyDB (name ::: (Relation heading key val) ': relations)
where
emptyDB (_ :: Proxy (name ::: relation ': relations)) =
Ext (Var @name) mempty (emptyDB (Proxy @relations))
To insert tuples into the database, we split the tuple into (key, val)
and add them to the correct Map
. We should also fail in the case that the key is already in the database. Again, when implementing something with this many constraints, it's useful to start with () =>
and fill it in until the compiler is happy.
insert ::
forall relation relations name heading key val .
( relations :! name ~ relation
, Relation heading key val ~ relation
, Split key val heading
, Ord (Tuple key)
, IsMember name (RelationToMap relation) (RelationsToDB relations)
, Updatable name (RelationToMap relation) (RelationsToDB relations) (RelationsToDB relations)
) =>
Var name ->
Proxy relations ->
Tuple heading ->
Tuple (RelationsToDB relations) ->
Maybe (Tuple (RelationsToDB relations))
insert var _ tuple db =
let old_rel = lookp var db
(key, val) = split @key @val tuple
new_rel = M.insert key val old_rel
in if M.member key old_rel then Nothing else Just (update db var new_rel)
We'll also want to be able to update database entries. Let's first write a version that updates the relation at a given key and is only allowed to modify the portion of the heading outside of the key attributes. For keys not in the relation, we'll just do nothing.
updateByKey ::
( relations :! name ~ relation
, Relation heading key val ~ relation
, Ord (Tuple key)
, IsMember name (RelationToMap relation) (RelationsToDB relations)
, Updatable name (RelationToMap relation) (RelationsToDB relations) (RelationsToDB relations)
) =>
Var name ->
Proxy relations ->
Tuple key ->
(Tuple val -> Tuple val) ->
Tuple (RelationsToDB relations) ->
Tuple (RelationsToDB relations)
updateByKey var _ key fn db =
let old_rel = lookp var db
new_rel = M.adjust fn key old_rel
in update db var new_rel
We could also write an update function that takes an arbitrary WHERE
clause that filters on all attributes in the tuple, but that would be slower, and in practice, most UPDATE
calls I've seen in SQL are of the form UPDATE table WHERE id=some_id
.
Writing to disk
To save state between program runs, we can write the entire database to a file periodically and read it at program start. Map
already has a Binary
instance, so all we need to do is write one for Tuple
.
The empty case is trivial.
instance Binary (Tuple '[]) where
put Empty = pure ()
get = pure Empty
For the non-empty case, the easiest thing to do is length-prefix every attribute of the tuple when serializing.
instance (Binary x, Binary (Tuple ts)) => Binary (Tuple (l ::: x ': ts)) where
put (Ext _ x xs) = do
let bytes = runPut $ put x
put (BL.length bytes)
put x
put xs
get = do
size :: Int64 <- get
x <- isolate (fromIntegral size) get
Ext Var x <$> get
I'm not completely satisfied with this. It requires that we serialize twice, once to find the length and once to actually write the thing. There's also a note in the source of the Binary
instance for regular lists about not blowing the stack for large lists with a naive implementation, which would certainly happen to us. To get around this, they write get
tail recursively and reverse at the end. I haven't tried doing that for a heterogeneous list, but it seems like it would be a nightmare to implement. I'd be happy to be wrong about that, though, if anyone is keen to try.
That concludes part 2. In the next installment, we'll see how to use what we've built to serve an API.