IxSet: a set with multiple indexed keys

To use IxSet you will need to install the optional ixset package.

In the first acid-state example we stored a single value. But in real database we typically need to store a large collection of records. And we want to be able to efficiently search and update those records. For simple key/value pairs we can use `Data.Map`. However, in practice, we often want to have *multiple* keys. That is what `IxSet` set offers -- a set-like type which can be indexed by multiple keys. IxSet can be found here on hackage. In this example, we will use `IxSet` to create a mini-blog.
> {-# LANGUAGE DeriveDataTypeable, GeneralizedNewtypeDeriving, RecordWildCards, TemplateHaskell #-} > module Main where > import Control.Monad.Reader (ask) > import Control.Monad.State (get, put) > import Data.Acid (Update, Query, makeAcidic) > import Data.Acid.Advanced (update', query') > import Data.Data (Data, Typeable) > import Data.IxSet (Indexable(..), IxSet(..), (@=), Proxy(..), getOne, ixFun, ixSet) > import qualified Data.IxSet as IxSet > import Data.SafeCopy (SafeCopy, base, deriveSafeCopy) > import Data.Text (Text) > import qualified Data.Text as Text > import Data.Time (UTCTime(..))
The first thing we are going to need is a type to represent a blog post. It is convenient to assign a unique id to each blog post so that it can be easily referenced in urls and easily queried in the `IxSet`. In order to keep ourselves sane, we can create a `newtype` wrapper around an `Integer` instead of just using a nameless `Integer`.
> newtype PostId = PostId { unPostId :: Integer } > deriving (Eq, Ord, Data, Enum, Typeable, SafeCopy)
Note that in addition to deriving normal classes like `Eq` and `Ord`, we also derive an instance of `SafeCopy`. This is not required by `IxSet` itself, but since we want to store the our blog posts in `acid-state` we will need it there. Our blog post will be able to have two status 'draft' and 'published'. We could use a boolean value, but it is easier to understand what `Draft` vs `Published` mean instead of trying to remember what `True` and `False` mean. Additionally, we can easily extend the type with additional states later.
> data Status = > Draft > | Published > deriving (Eq, Ord, Data, Typeable) > > deriveSafeCopy 0 'base ''Status
A now we can create a simple record which represents a single blog post:
> data Post = Post > { postId :: PostId > , title :: Text > , author :: Text > , body :: Text > , date :: UTCTime > , status :: Status > , tags :: [Text] > } > deriving (Eq, Ord, Data, Typeable) > > deriveSafeCopy 0 'base ''Post
Each `IxSet` key needs to have a unique type. Looking at `Post` it seems like that could be trouble -- because we have multiple fields which all have the type `Text`. Fortunately, we can easily get around this by introducing some newtypes which are used for indexing.
> newtype Title = Title Text deriving (Eq, Ord, Data, Typeable, SafeCopy) > newtype Author = Author Text deriving (Eq, Ord, Data, Typeable, SafeCopy) > newtype Tag = Tag Text deriving (Eq, Ord, Data, Typeable, SafeCopy)
We are now ready to create an instance of the `Indexable` class. This is the class that defines the keys for a `Post` so that we can store it in an `IxSet`:
> instance Indexable Post where > empty = ixSet [ ixFun $ \bp -> [ postId bp ] > , ixFun $ \bp -> [ Title $ title bp ] > , ixFun $ \bp -> [ Author $ author bp ] > , ixFun $ \bp -> map Tag (tags bp) > , ixFun $ (:[]) . date -- point-free, just for variety > ] >
In the `Indexable Post` instance we create a list of `Ix Post` values by using the `ixFun` helper function:
#ifdef HsColour > ixFun :: (Ord b, Typeable b) => (a -> [b]) -> Ix a #endif
We pass to `ixFun` a key extraction function. For example, in this line:
#ifdef HsColour > ixFun $ \bp -> [ postId bp ] #endif
we extract the `PostId` from a `Post`. Note that we return a list of keys values not just a single key. That is because a single entry might have several keys for a specific type. For example, a `Post` has a list of tags. But, we want to be able to search for posts that match a specific tag. So, we index each tag separately:
#ifdef HsColour > ixFun $ \bp -> map Tag (tags bp) #endif
For the `Title` and `Author` keys we add the newtype wrapper. Now that we have a place to store our posts, we can stash them in a record that we will store in acid-state:
> data Blog = Blog > { nextPostId :: PostId > , posts :: IxSet Post > } > > initialBlogState :: Blog > initialBlogState = > Blog { nextPostId = PostId 1 > , posts = empty > }
`IxSet` does not (currently) provide any auto-increment functionality for indexes, so we have to keep track of that ourselves. That is why we have the `nextPostId` field. Note that in `initialBlogState` the `nextPostId` is initialized to 1 not 0. Sometimes we want to create a `Post` that is not yet in the database, and hence does not have a valid `PostId`. I like to reserve `PostId 0` to mean uninitialized. If I ever see a `PostId 0` stored in the database, I know there is a bug in my code. Next we will create some update and query functions for our acid-state database. > -- | create a new, empty post and add it to the database > newPost :: UTCTime -> Update Blog Post > newPost pubDate = > do b@Blog{..} <- get > let post = Post { postId = nextPostId > , title = Text.empty > , author = Text.empty > , body = Text.empty > , date = pubDate > , status = Draft > , tags = [] > } > put $ b { nextPostId = succ nextPostId > , posts = IxSet.insert post posts > } > return post Nothing in that function should be too surprising. We have to pass in `UTCTime`, because we can not do IO in the update function. Because `PostId` is an instance of `Enum` we can use `succ` to increment it. To add the new post to the `IxSet` we use `IxSet.insert`.
#ifdef HsColour > insert :: (Typeable a, Ord a, Indexable a) => a -> IxSet a -> IxSet a #endif
Next we have a function that replaces a `Post` in the database with a newer version > -- | update the post in the database (indexed by PostId) > updatePost :: Post -> Update Blog () > updatePost updatedPost = > do b@Blog{..} <- get > put $ b { posts = IxSet.updateIx (postId updatedPost) updatedPost posts > } Note that instead of `insert` we use `updateIx`:
#ifdef HsColour > updateIx :: (Indexable a, Ord a, Typeable a, Typeable key) => key -> a -> IxSet a -> IxSet a #endif
The first argument to `updateIx` is a key that maps to the post we want to updated in the database. The key must uniquely identify a single entry in the database. In this case we use our primary key, `PostId`. Next we have some query functions. > postById :: PostId -> Query Blog (Maybe Post) > postById pid = > do Blog{..} <- ask > return $ getOne $ posts @= pid `postById` is used to lookup a specific post by its `PostId`. This is our first example of querying an `IxSet`. He we use the equals query operator:
#ifdef HsColour > (@=) :: (Typeable key, Ord a, Typeable a, Indexable a) => IxSet a -> key -> IxSet a #endif
It takes an `IxSet` and filters it to produce a new `IxSet` which only contains values that match the specified key. In this case, we have specified a primary key, so we expect exactly zero or one values in the resulting `IxSet`. So, we can use `getOne` to turn the result into a simple `Maybe` value:
#ifdef HsColour > getOne :: Ord a => IxSet a -> Maybe a #endif
Here is a query function that gets all the published posts and sorts them in reverse chronological order (aka, newest first) > getPublishedPosts :: Query Blog [Post] > getPublishedPosts = > do Blog{..} <- ask > return $ IxSet.toDescList (Proxy :: Proxy UTCTime) $ posts @= Published We use the `@=` operator again to select just the posts which have the `Published` status. Since the publication date is a key (`UTCTime`) we can use `toDescList` to return a sorted list:
#ifdef HsColour > toDescList :: (Typeable k, Typeable a, Indexable a) => Proxy k -> IxSet a -> [a] #endif
`toDescList` takes a funny argument `(Proxy :: Proxy UTCTime)`. While the `Post` type itself has an `Ord` instance -- we generally want to order by a specific key, which may have a different ordering. Since our keys are type directed, we need a way to pass a type to 'toDescList' so that it knows which key we want to order by. The `Proxy` type exists for that sole reason:
#ifdef HsColour > data Proxy a = Proxy #endif
It just gives us a place to stick a type signature that `toDescList` and other functions can use.