IxSet: a set with multiple indexed keys

To use IxSet you will need to install the optional ixset package.

In the first acid-state example we stored a single value. But in real database we typically need to store a large collection of records. And we want to be able to efficiently search and update those records. For simple key/value pairs we can use `Data.Map`. However, in practice, we often want to have *multiple* keys. That is what `IxSet` set offers -- a set-like type which can be indexed by multiple keys. IxSet can be found here on hackage. In this example, we will use `IxSet` to create a mini-blog.
> {-# LANGUAGE DeriveDataTypeable, GeneralizedNewtypeDeriving, RecordWildCards, TemplateHaskell, TypeFamilies, OverloadedStrings #-} > module Main where > import Control.Applicative ((<$>), optional) > import Control.Exception (bracket) > import Control.Monad (msum, mzero) > import Control.Monad.Reader (ask) > import Control.Monad.State (get, put) > import Control.Monad.Trans (liftIO) > import Data.Acid (AcidState, Update, Query, makeAcidic, openLocalState) > import Data.Acid.Advanced (update', query') > import Data.Acid.Local (createCheckpointAndClose) > import Data.Data (Data, Typeable) > import Data.IxSet (Indexable(..), IxSet(..), (@=), Proxy(..), getOne, ixFun, ixSet) > import qualified Data.IxSet as IxSet > import Data.SafeCopy (SafeCopy, base, deriveSafeCopy) > import Data.Text (Text) > import Data.Text.Lazy (toStrict) > import qualified Data.Text as Text > import Data.Time (UTCTime(..), getCurrentTime) > import Happstack.Server (ServerPart, Method(POST, HEAD, GET), Response, decodeBody, defaultBodyPolicy, dir, lookRead, lookText, method, notFound, nullConf, nullDir, ok, seeOther, simpleHTTP, toResponse) > import Text.Blaze ((!), Html) > import qualified Text.Blaze.Html4.Strict as H > import qualified Text.Blaze.Html4.Strict.Attributes as A
The first thing we are going to need is a type to represent a blog post. It is convenient to assign a unique id to each blog post so that it can be easily referenced in urls and easily queried in the `IxSet`. In order to keep ourselves sane, we can create a `newtype` wrapper around an `Integer` instead of just using a nameless `Integer`.
> newtype PostId = PostId { unPostId :: Integer } > deriving (Eq, Ord, Data, Enum, Typeable, SafeCopy)
Note that in addition to deriving normal classes like `Eq` and `Ord`, we also derive an instance of `SafeCopy`. This is not required by `IxSet` itself, but since we want to store the our blog posts in `acid-state` we will need it there. Our blog post will be able to have two status 'draft' and 'published'. We could use a boolean value, but it is easier to understand what `Draft` vs `Published` mean instead of trying to remember what `True` and `False` mean. Additionally, we can easily extend the type with additional states later.
> data Status = > Draft > | Published > deriving (Eq, Ord, Data, Typeable) > > $(deriveSafeCopy 0 'base ''Status)
A now we can create a simple record which represents a single blog post:
> data Post = Post > { postId :: PostId > , title :: Text > , author :: Text > , body :: Text > , date :: UTCTime > , status :: Status > , tags :: [Text] > } > deriving (Eq, Ord, Data, Typeable) > > $(deriveSafeCopy 0 'base ''Post)
Each `IxSet` key needs to have a unique type. Looking at `Post` it seems like that could be trouble -- because we have multiple fields which all have the type `Text`. Fortunately, we can easily get around this by introducing some newtypes which are used for indexing.
> newtype Title = Title Text deriving (Eq, Ord, Data, Typeable, SafeCopy) > newtype Author = Author Text deriving (Eq, Ord, Data, Typeable, SafeCopy) > newtype Tag = Tag Text deriving (Eq, Ord, Data, Typeable, SafeCopy) > newtype WordCount = WordCount Int deriving (Eq, Ord, Data, Typeable, SafeCopy)
We are now ready to create an instance of the `Indexable` class. This is the class that defines the keys for a `Post` so that we can store it in an `IxSet`:
> instance Indexable Post where > empty = ixSet [ ixFun $ \bp -> [ postId bp ] > , ixFun $ \bp -> [ Title $ title bp ] > , ixFun $ \bp -> [ Author $ author bp ] > , ixFun $ \bp -> [ status bp ] > , ixFun $ \bp -> map Tag (tags bp) > , ixFun $ (:[]) . date -- point-free, just for variety > , ixFun $ \bp -> [ WordCount (length $ Text.words $ body bp) ] > ] >
In the `Indexable Post` instance we create a list of `Ix Post` values by using the `ixFun` helper function:
#ifdef HsColour > ixFun :: (Ord b, Typeable b) => (a -> [b]) -> Ix a #endif
We pass to `ixFun` a key extraction function. For example, in this line:
#ifdef HsColour > ixFun $ \bp -> [ postId bp ] #endif
we extract the `PostId` from a `Post`. Note that we return a list of keys values not just a single key. That is because a single entry might have several keys for a specific type. For example, a `Post` has a list of tags. But, we want to be able to search for posts that match a specific tag. So, we index each tag separately:
#ifdef HsColour > ixFun $ \bp -> map Tag (tags bp) #endif
Note that the keys do not have to directly correspond to a field in the record. We can perform calculations to create arbitrary keys. For example, the `WordCount` key calculates the number of words in a post:
#ifdef HsColour > ixFun $ \bp -> [ WordCount (length $ Text.words $ body bp) ] #endif
For the `Title` and `Author` keys we add the newtype wrapper. Now that we have a place to store our posts, we can stash them in a record that we will store in acid-state:
> data Blog = Blog > { nextPostId :: PostId > , posts :: IxSet Post > } > deriving (Data, Typeable) > > $(deriveSafeCopy 0 'base ''Blog) > > initialBlogState :: Blog > initialBlogState = > Blog { nextPostId = PostId 1 > , posts = empty > }
`IxSet` does not (currently) provide any auto-increment functionality for indexes, so we have to keep track of that ourselves. That is why we have the `nextPostId` field. Note that in `initialBlogState` the `nextPostId` is initialized to 1 not 0. Sometimes we want to create a `Post` that is not yet in the database, and hence does not have a valid `PostId`. I like to reserve `PostId 0` to mean uninitialized. If I ever see a `PostId 0` stored in the database, I know there is a bug in my code. Next we will create some update and query functions for our acid-state database. > -- | create a new, empty post and add it to the database > newPost :: UTCTime -> Update Blog Post > newPost pubDate = > do b@Blog{..} <- get > let post = Post { postId = nextPostId > , title = Text.empty > , author = Text.empty > , body = Text.empty > , date = pubDate > , status = Draft > , tags = [] > } > put $ b { nextPostId = succ nextPostId > , posts = IxSet.insert post posts > } > return post Nothing in that function should be too surprising. We have to pass in `UTCTime`, because we can not do IO in the update function. Because `PostId` is an instance of `Enum` we can use `succ` to increment it. To add the new post to the `IxSet` we use `IxSet.insert`.
#ifdef HsColour > insert :: (Typeable a, Ord a, Indexable a) => a -> IxSet a -> IxSet a #endif
Next we have a function that replaces a `Post` in the database with a newer version > -- | update the post in the database (indexed by PostId) > updatePost :: Post -> Update Blog () > updatePost updatedPost = > do b@Blog{..} <- get > put $ b { posts = IxSet.updateIx (postId updatedPost) updatedPost posts > } Note that instead of `insert` we use `updateIx`:
#ifdef HsColour > updateIx :: (Indexable a, Ord a, Typeable a, Typeable key) => key -> a -> IxSet a -> IxSet a #endif
The first argument to `updateIx` is a key that maps to the post we want to updated in the database. The key must uniquely identify a single entry in the database. In this case we use our primary key, `PostId`. Next we have some query functions. > postById :: PostId -> Query Blog (Maybe Post) > postById pid = > do Blog{..} <- ask > return $ getOne $ posts @= pid `postById` is used to lookup a specific post by its `PostId`. This is our first example of querying an `IxSet`. He we use the equals query operator:
#ifdef HsColour > (@=) :: (Typeable key, Ord a, Typeable a, Indexable a) => IxSet a -> key -> IxSet a #endif
It takes an `IxSet` and filters it to produce a new `IxSet` which only contains values that match the specified key. In this case, we have specified a primary key, so we expect exactly zero or one values in the resulting `IxSet`. So, we can use `getOne` to turn the result into a simple `Maybe` value:
#ifdef HsColour > getOne :: Ord a => IxSet a -> Maybe a #endif
Here is a query function that gets all the post with a specific status (`Published` vs `Draft`) and sorts them in reverse chronological order (aka, newest first): > postsByStatus :: Status -> Query Blog [Post] > postsByStatus status = > do Blog{..} <- ask > return $ IxSet.toDescList (Proxy :: Proxy UTCTime) $ posts @= status We use the `@=` operator again to select just the posts which have the matching status. Since the publication date is a key (`UTCTime`) we can use `toDescList` to return a sorted list:
#ifdef HsColour > toDescList :: (Typeable k, Typeable a, Indexable a) => Proxy k -> IxSet a -> [a] #endif
`toDescList` takes a funny argument `(Proxy :: Proxy UTCTime)`. While the `Post` type itself has an `Ord` instance -- we generally want to order by a specific key, which may have a different ordering. Since our keys are specified by type, we need a way to pass a type to 'toDescList' so that it knows which key we want to order by. The `Proxy` type exists for that sole reason:
#ifdef HsColour > data Proxy a = Proxy #endif
It just gives us a place to stick a type signature that `toDescList` and other functions can use. Next we turn our acid-state functions into events: > $(makeAcidic ''Blog > [ 'newPost > , 'updatePost > , 'postById > , 'postsByStatus > ])
> template :: Text -> [Html] -> Html -> Response > template title headers body = > toResponse $ > H.html $ do > H.head $ do > css > H.title (H.toHtml title) > H.meta ! A.httpEquiv "Content-Type" ! A.content "text/html;charset=utf-8" > sequence_ headers > H.body $ do > body >
> css :: Html > css = > let s = Text.concat [ "body { color: #555; }" > , ".author { color: #aaa; }" > , ".date { color: #aaa; }" > , ".tags { color: #aaa; }" > , ".post { border-bottom: 1px dotted #aaa; }" > , ".bdy { color: #555; }" > , "label { display: inline-block; width: 3em; }" > ] > in H.style ! A.type_ "text/css" $ H.toHtml s >
> edit :: AcidState Blog -> ServerPart Response > edit acid = > do pid <- PostId <$> lookRead "id" > mMsg <- optional $ lookText "msg" > mPost <- query' acid (PostById pid) > case mPost of > Nothing -> > notFound $ template "no such post" [] $ do "Could not find a post with id " > H.toHtml (unPostId pid) > (Just p@(Post{..})) -> > msum [ do method GET > ok $ template "foo" [] $ do > case mMsg of > (Just msg) | msg == "saved" -> "Changes saved!" > _ -> "" > H.form ! A.enctype "multipart/form-data" ! A.method "POST" ! A.action (H.toValue $ "/edit?id=" ++ (show $ unPostId pid)) $ do > H.label "title" ! A.for "title" >> H.input ! A.type_ "text" ! A.name "title" ! A.id "title" ! A.size "80" ! A.value (H.toValue title) > H.br > H.label "author" ! A.for "author" >> H.input ! A.type_ "text" ! A.name "author" ! A.id "author" ! A.size "40" ! A.value (H.toValue author) > H.br > H.label "tags" ! A.for "tags" >> H.input ! A.type_ "text" ! A.name "tags" ! A.id "tags" ! A.size "40" ! A.value (H.toValue $ Text.intercalate ", " tags) > H.br > H.label "body" ! A.for "body" > H.br > H.textarea ! A.cols "80" ! A.rows "20" ! A.name "body" $ H.toHtml body > H.br > H.button ! A.name "status" ! A.value "publish" $ "publish" > H.button ! A.name "status" ! A.value "save" $ "save as draft" > , do method POST > ttl <- lookText' "title" > athr <- lookText' "author" > tgs <- lookText' "tags" > > bdy <- lookText' "body" > now <- liftIO $ getCurrentTime > stts <- do s <- lookText' "status" > case s of > "save" -> return Draft > "publish" -> return Published > _ -> mzero > let updatedPost = p { title = ttl > , author = athr > , body = bdy > , date = now > , status = stts > , tags = map Text.strip $ Text.splitOn "," tgs > } > update' acid (UpdatePost updatedPost) > case status of > Published -> seeOther ("/view?id=" ++ (show $ unPostId pid)) (toResponse ()) > Draft -> seeOther ("/edit?msg=saved&id=" ++ (show $ unPostId pid)) (toResponse ()) > ] > > where lookText' = fmap toStrict . lookText
> new :: AcidState Blog -> ServerPart Response > new acid = > do method POST > now <- liftIO $ getCurrentTime > post <- update' acid (NewPost now) > seeOther ("/edit?id=" ++ show (unPostId $ postId post)) (toResponse ())
> postHtml :: Post -> Html > postHtml (Post{..}) = > H.div ! A.class_ "post" $ do > H.h1 $ H.toHtml title > H.div ! A.class_ "author" $ do "author: " >> H.toHtml author > H.div ! A.class_ "date" $ do "published: " >> H.toHtml (show date) > H.div ! A.class_ "tags" $ do "tags: " >> H.toHtml (Text.intercalate ", " tags) > H.div ! A.class_ "bdy" $ H.toHtml body > H.div ! A.class_ "post-footer" $ H.a ! A.href (H.toValue $ "/view?id=" ++ show (unPostId postId)) $ "permalink"
> view :: AcidState Blog -> ServerPart Response > view acid = > do pid <- PostId <$> lookRead "id" > mPost <- query' acid (PostById pid) > case mPost of > Nothing -> > notFound $ template "no such post" [] $ do "Could not find a post with id " > H.toHtml (unPostId pid) > (Just p) -> > ok $ template (title p) [] $ do > H.a ! A.href (H.toValue $ "/edit?id=" ++ show (unPostId (postId p))) $ "edit this post" > (postHtml p)
> home :: AcidState Blog -> ServerPart Response > home acid = > do published <- query' acid (PostsByStatus Published) > ok $ template "home" [] $ do > H.form ! A.enctype "multipart/form-data" ! A.method "POST" ! A.action "/new" $ do > H.button $ "new post" > mapM_ postHtml published
> route :: AcidState Blog -> ServerPart Response > route acid = > do decodeBody (defaultBodyPolicy "/tmp/" 0 1000000 1000000) > msum [ dir "favicon.ico" $ notFound (toResponse ()) > , dir "edit" $ edit acid > , dir "new" $ new acid > , dir "view" $ view acid > , nullDir >> home acid > ]
> main :: IO () > main = > do bracket (openLocalState initialBlogState) > (createCheckpointAndClose) > (\acid -> > simpleHTTP nullConf (route acid))