About me

My name is Josh Rotenberg. This is my personal site, built with mdbook.

The picture above was just a quick headshot I took with a laptop for work. A coworker added a body of water behind me, and I tilted it and added a filter. It just kind of stuck, so I use it now whenever I need a picture of me somewhere, which isn't all that often.

Projects

This is a brief list and explanation of personal projects I'm either currently working on or have worked on in the past.

Personal projects

~Active

adrs

adrs is a Rust rewrite of the original adr-tools by Nat Pryce. I was introduced to the Architectural Decision Records methodology a few years ago. While ADR tooling isn't strictly needed (beyond git and a text editor), it does help with consistency and bookkeeping, and this has been a fun project to work on.

Not so active

lingua_ex

lingua is a (human) language detection crate by Peter M. Stahl. I wrapped it with rustler to make the language detection API available in Elixir.

Writing

Recent

This is where I promise (no, for real this time) to start writing stuff.

Writing

These are some older posts from my previous hakyll based site that I haven't updated in a decade. I took the time to convert them instead of just throwing them in the trash.

Data.List Functions in Clojure

2013-12-31

Haskell's Data.List has some interesting functions. I've been playing around with a few recently and decided to see if there were Clojure equivalents, and if not, what it would take to implement them.

The first function is tails which returns the final segments of the argument, longest first:

λ> tails [1,2,3,4,5]
[[1,2,3,4,5],[2,3,4,5],[3,4,5],[4,5],[5],[]]

Before looking at the actual implementation, I assumed (incorrectly) that it would use tail and recurse over and accumulate the list until it was empty. Haskell's tail is similar to rest or next in Clojure:

λ> tail [1,2,3,4,5]
[2,3,4,5]

user> (next [1,2,3,4,5])
(2 3 4 5)
user> (rest [1,2,3,4,5])
(2 3 4 5)

I messed around with loop/recur for a bit but then I remembered iterate:

user> (doc iterate)
-------------------------
clojure.core/iterate
([f x])
  Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects
user> (def s '(1 2 3 4 5))
#'user/s
user> (take 2 (iterate rest s))
((1 2 3 4 5) (2 3 4 5))
user> (take 3 (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5))

Great. iterate will just keep going because it returns a lazy sequence, so using take we can limit the result to just what we need:

user> (take (count s) (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5))
user> ;; ooops, Haskell's tails goes one more and gives us the empty list, []
user> (take (inc (count s)) (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) ())
user> ;; note that we want rest here instead of next
user> (take (inc (count s)) (iterate next s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) nil) 
user> ;; ok, let's wrap it up into a function and test it out on something else ...
user> (defn tails [xs] (take (inc (count xs)) (iterate rest xs)))
#'user/tails
user> (tails ["what" "is" "for" "lunch"])
(["what" "is" "for" "lunch"] ("is" "for" "lunch") ("for" "lunch") ("lunch") ())
user>

Hrm, it still works but now we have a mixture of types in the result. This may or may not bother you depending on what you plan to do with the result. We could normalize it using map and sequence:

user> (defn tails [xs] (map sequence (take (inc (count xs)) (iterate rest xs))))
#'user/tails
user>  (= (tails '(1 2 3 4 5)) (tails [1 2 3 4 5]))
true
user>

Or, with a little more work we can make the resulting collections the same type as the one passed in:

(defn tails [xs]
  (let [f (cond
           (vector? xs) vec
           (set? xs) set
           :else sequence)]
    (map f (take (inc (count xs)) (iterate rest xs)))))

user> (tails [1 2 3 4 5])
([1 2 3 4 5] [2 3 4 5] [3 4 5] [4 5] [5] [])
user> (tails '(1 2 3 4 5))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) ())
user> (tails #{1 2 3 4 5})
(#{1 2 3 4 5} #{2 3 4 5} #{3 4 5} #{4 5} #{5} #{})
user>
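For comparison, the iterate-based approach translates back to Haskell almost word for word. This is only a sketch: tailsViaIterate is a name made up for this example, not how Data.List actually implements tails, and it uses drop 1 where the Clojure version uses rest because drop 1 is safe on an empty list.

```haskell
import Data.List (tails)

-- Hypothetical helper mirroring the Clojure version:
-- keep dropping the head, and stop after (length + 1) results
-- so the trailing empty list is included.
tailsViaIterate :: [a] -> [[a]]
tailsViaIterate xs = take (length xs + 1) (iterate (drop 1) xs)

main :: IO ()
main = do
  print (tailsViaIterate [1, 2, 3, 4, 5 :: Int])
  -- Compare against the library function on a finite list.
  print (tailsViaIterate "lunch" == tails "lunch")
```

Because it calls length, this version only works on finite lists, whereas the real tails is lazy enough to handle infinite ones.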

Next up is a similar function, inits, which is sort of the inverse of tails:

λ> inits [1, 2, 3, 4, 5]
[[],[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4,5]]

inits starts with an empty list, and then slowly builds upon the argument until the final item is the full list given. Here was my first attempt:

user> (def xs [1 2 3 4 5])
#'user/xs
user>  (reverse (take (inc (count xs)) (iterate drop-last xs)))
(() (1) (1 2) (1 2 3) (1 2 3 4) [1 2 3 4 5])
user>

Aside from the reverse, we basically switch out rest for drop-last. This works, but the reverse kind of feels like cheating to me. Here is a less hacky option: a list comprehension. We'll package it up into a function as well to hide the details:

(defn inits [xs]
  (for [n (range (inc (count xs)))]
    (take n xs)))

user> (inits xs)
(() (1) (1 2) (1 2 3) (1 2 3 4) (1 2 3 4 5))
user>

And as above, retaining the collection type still applies.
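The list comprehension idea carries back to Haskell as well. A minimal sketch, where initsViaTake is a hypothetical name and not the library's implementation; the range runs from 0 through length xs inclusive so that the last element is the full list:

```haskell
import Data.List (inits)

-- Hypothetical helper: take 0, 1, ..., (length xs) elements from the front.
initsViaTake :: [a] -> [[a]]
initsViaTake xs = [take n xs | n <- [0 .. length xs]]

main :: IO ()
main = do
  print (initsViaTake [1, 2, 3, 4, 5 :: Int])
  -- Compare against the library function on a finite list.
  print (initsViaTake "abc" == inits "abc")
```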

There are a bunch of other good ones in Data.List. I might try to talk about more later.

Functions in Data.Maybe

2013-12-12

Most introductions to monads in Haskell talk about Maybe. Here are some of the handy functions in Data.Maybe for operating on the type. maybe (the first function below) should be available in ghci right away from the Prelude, but the others require that you load the Data.Maybe module in ghci:

 λ> :m +Data.Maybe

maybe either applies a function to the value inside a Maybe or returns a default value. Its type signature looks like this:

 maybe :: b -> (a -> b) -> Maybe a -> b

 λ> let x = Just 40
 λ> let y = Nothing
 λ> maybe 30 (*2) x -- x is Just 40, so it gets applied to (*2)
 80
 λ> maybe 30 (*2) y -- y is Nothing, we end up with the default value
 30

isJust is straightforward. Given a Maybe argument, it returns True if it's a Just:

 isJust :: Maybe a -> Bool

 λ> isJust x
 True
 λ> isJust y
 False

isNothing: not hard to figure what this does:

 isNothing :: Maybe a -> Bool

λ> isNothing x
False
λ> isNothing y
True

fromJust is a relatively unsafe way to get at the value of a Maybe:

fromJust :: Maybe a -> a

λ> fromJust x
40
λ> fromJust y
*** Exception: Maybe.fromJust: Nothing -- woops. use a case expression in real code
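The case expression mentioned in that last comment could look something like this. It is a small sketch; describe is a made-up helper for illustration, and the point is only that both constructors are handled so nothing can blow up at runtime:

```haskell
-- Hypothetical helper: handle both constructors explicitly
-- instead of calling fromJust and hoping for the best.
describe :: Maybe Int -> String
describe m = case m of
  Just n  -> "got " ++ show n
  Nothing -> "got nothing"

main :: IO ()
main = do
  putStrLn (describe (Just 40))
  putStrLn (describe Nothing)
```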

fromMaybe is similar to maybe, but instead of a function argument being applied to the value, you simply get the value in your Maybe or the default you pass in:

fromMaybe :: a -> Maybe a -> a

λ> fromMaybe 30 x
40
λ> fromMaybe 30 y
30

maybeToList will plop your Maybe's value into a list and return it:

maybeToList :: Maybe a -> [a]

λ> maybeToList x
[40]
λ> maybeToList y
[] -- always get a value, it just happens to be an empty list if we pass in Nothing

listToMaybe goes the other way:

listToMaybe :: [a] -> Maybe a

λ> listToMaybe [2]
Just 2
λ> listToMaybe $ maybeToList x
Just 40
λ> listToMaybe $ maybeToList y
Nothing
λ> listToMaybe [20,40,60] -- listToMaybe will drop all but the first element
Just 20
λ> listToMaybe "foo"
Just 'f'

catMaybes takes a list of Maybes and returns a list of the values for any Just value, but drops out your Nothings:

catMaybes :: [Maybe a] -> [a]

λ> catMaybes [x, y, (Just 21), Nothing, Just(22)]
[40,21,22]

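mapMaybe applies a function that returns a Maybe to each element of a list, keeping the values from the Just results and skipping the Nothings. In the example below the list elements happen to be Maybes themselves, so f both accepts and returns a Maybe: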

mapMaybe :: (a -> Maybe b) -> [a] -> [b]

λ> let f x = (+2) <$> x -- or fmap (+2) x
λ> f (Just 30)
Just 32
λ> mapMaybe f [(Just 20), Nothing, x]
[22,42]
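A more typical use of mapMaybe is with a function from a plain value to a Maybe, such as a parser that can fail. Here is a short sketch using readMaybe from Text.Read; parseInts is a hypothetical helper name chosen for this example:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical helper: keep only the strings that parse as Ints.
-- readMaybe returns Nothing for anything it cannot parse,
-- and mapMaybe silently drops those entries.
parseInts :: [String] -> [Int]
parseInts = mapMaybe readMaybe

main :: IO ()
main = print (parseInts ["20", "oops", "40", ""])
```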

Erlang Function Options

2013-11-25

While working on a small Erlang project I found myself writing a few functions that take one or more required parameters and then possibly a few optional ones. A common idiom in Erlang is to take a list of tuples and/or atoms for optional parameters. For example, gen_tcp:connect has a signature like this: gen_tcp:connect(Host, Port, Options), where Options can be all kinds of stuff, such as:

{ok, Socket} = gen_tcp:connect("localhost", 12345, [binary, {packet, 0}, {active, false}]).

In this case, the host and port are required, and the third argument is a list of various kinds of metadata, in various forms. binary is a single atom; packet takes an integer, and according to the docs (which are actually part of inet:setopts) the value can be either 0 or raw, or 1, 2 or 4. The value of active can be true, false or once.

I'm using a function to build up a record that will then be passed on to another function for further processing. The user should only have to pass in one required parameter, and then a list of zero or more optional ones, so I want the call to look something like this:

{ok, Result} = my_func(Arg1, Arg2, Options).

where Options looks like [foo, {bar, 20}, {baz, "hello"}].

In this case no options are required, so I also added in a 1 parameter version of the function:

Pattern matching and recursion make handling this kind of thing pretty clean. If I need to add an option or change how one is handled, it's clear where to do it and it's easy to test those changes.

Follow up: http-streams and aeson

2013-06-05

In my previous post there were (at least) a few things that could be done better. I got some great feedback from the smart people on Hacker News so I've made a couple of changes to the final source.

The biggest issue is that I was completely losing the benefit of streaming by reading the entire response body in and then parsing it. I knew this, but as I mentioned, I hadn't yet figured out how to handle it correctly. Fortunately someone else had, and with their suggestions (and a bit more Googling) I figured it out. It could probably still be cleaned up a little, but for these purposes it does the trick.

On line 90 there is a new function called parseJSONFromStream. Calling this on the stream from receiveResponse (and adding the type hint for a Result wrapped around a Feed) gives us a similar situation to the one we had with Maybe Feed, so the main function still does essentially the same thing. With a little more time we could probably make this cleaner in fetchQuakes by giving the handler more to do rather than calling it in a lambda.

And speaking of fetchQuakes, using http-streams' withConnection cleans it up a little bit by automatically handling closing of the connection. It saves a couple lines of code and makes the function less cluttered.

Thanks again for all of the feedback!

Haskell: http-streams and aeson

2013-05-28

I've been learning Haskell off and on for the past few months. It's pretty awesome, but definitely a challenging language to learn. I've had exposure to and experience with functional programming for a couple of years now (first with Scala, then/currently with Clojure, and some Erlang), and while those concepts are fairly well cemented in my brain, Haskell has a bunch of other stuff that makes it a challenge. That's not to say this is a bad thing; in my limited experience, I've been finding that some of what initially seems complex turns more towards elegance once I'm comfortable with it, and Haskell rewards your efforts in many areas by being a really nice way to get things done. I figured I'd write up something here based on a recent challenge I faced and how, with a bit of Googling and playing around, I managed to figure it out and move on.

I'll usually try to bite off a small project in a new language once I feel reasonably comfortable with the basics. In the past I've found that there is one type of small project that makes a decent first try: RESTful API clients. There are a few reasons why:

Usually only requires a couple of external dependencies, if any (HTTP and JSON/XML parsing)
Easy to incrementally build and test (i.e. only implement a single API call initially, then go back and fill in)
Lots of APIs and variation to choose from
Possibly useful to you and/or someone else out there

For a simple walkthrough, I've chosen the USGS Geojson feeds that contain recent earthquakes broken down by recency and size. It's more of a "feed" than an API, but it works well because it requires no signup or authentication, and just lets you start pulling JSON off the web to see what happens. The list of feeds can be found here: http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php

Dealing with HTTP

There are a few options for HTTP client libraries, but I've chosen http-streams. It may or may not be the best/right choice for this application ... but I wanted to see how it worked. Getting something running with the basics is super easy. http-streams has some convenient wrappers for basic get requests:

We are all pretty used to simplified HTTP stuff these days, and Haskell is no exception here. We hand the get function a full URL and a handler function that will be called with the Response and an InputStream. To make things even easier when we want to see what the body looks like, the built-in debugHandler will dump out the response header and body to stdout for us to peruse. If we just wanted the body alone on standard out, we can replace the above call to debugHandler with (\p i -> Streams.connect i stdout). This just connects the body stream to standard out and ignores the response headers.

If you hate trying to read unformatted JSON, you can install aeson-pretty which has a command line tool to pretty-print JSON:

HTTPStreamSimple | aeson-pretty

Or with python:

HTTPStreamSimple | python -m json.tool

If we want to start thinking about building a library around this, though, we need to change a couple of things. First, we probably want a little more control over the request so we can abstract it away from just a full URL every time, and second, we need to get the body into some intermediate state so we can parse it and return something more useful. This next snippet does just that, with the same result as our code above.

Now we are opening a connection to the host on port 80, building a request that contains the request type and the path, and saying what we expect in the response with setAccept "application/json". If we were sending a POST request or needed to send some other arbitrary header information, now would be the right time for that as well. Next, we send the request with an empty body (since it's a GET). We can now receive the response, and things start to get familiar again as the receiveResponse call's second argument is the same type as we used above with get. The difference, however, is that we are using the built-in concatHandler, which will give us access to the body rather than just dump it out to the terminal. For now all we are doing with it is S.putStr x. Finally we close the connection.

Later on we might handle the response and keep that code clear of the actual protocol layer. In a real project we might have a few layers of abstraction, but for this post we'll keep it fairly simple and just take advantage of having an easy point of entry with the handler. Now we need to hook up some parsing action with Aeson. This is actually the point at which I had some trouble, and thus the inspiration for this post, but before I explain that, here is a quick look at using Aeson so we can see what we are dealing with API-wise.

Parsing JSON

Looking at the Aeson project's examples, you'll see that you really have four options for getting JSON data to and from Haskell: the standard approach, a Template Haskell option, and two generic options. I'm too green to suggest the "right" one, but so far I'm a fan of starting with the one that requires the least code up front and then slowly migrating to the one that provides the clearest implementation in the end. So we'll make it easy on ourselves and start with one of the generic methods, then ditch it for the standard approach, which requires more work on our part but gives us more control over our types. Let's take a look at something similar to those examples but modified for our earthquakes. For now I'll just put in a few fields of our main types to keep it simple. If Aeson sees stuff that isn't in our type definition it'll just skip it (though if a field is in our type and missing from the JSON we'll get a parse error; more on that later). We'll also leave out the ToJSON stuff since we don't need to convert back to JSON in this case:

This is pretty basic. We define four main types: MetaData, Properties, Feature and Feed. Working from the bottom up, Feed is our top level container. Notice that it consists of a chunk of metadata and then a list of (zero or more) Features with [Feature]. A Feature itself contains a properties slot and (for now) just the Feature ID. Similarly we've left off most of the Properties items and just added the detail String and the mag (magnitude) Double. Finally we have a few slots in our MetaData type.

If we keep heading up to the top of the file, we see that we've imported GHC.Generics, which (coupled with the DeriveGeneric extension at the top of the file) lets us get away with not having to tell Aeson how to parse our JSON. We're also importing Data.ByteString.Lazy.Char8 because Aeson actually works on ByteStrings, but we get away with writing ordinary string literals by also including the OverloadedStrings extension.

To complete our tour of this file, jump back down to main. We call decode on some inline JSON and tell it we want a Maybe Feed. This is handy: if parsing fails for any reason, we'll get back Nothing; otherwise we should have a Just wrapping our Feed. Using a case expression we can pull out our result and print it. We derived Show in all of our types so we get a nice representation of the value on the terminal for inspection.
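To make the shape of that concrete, here is a cut-down sketch of the generic approach. It is not the original listing: it only models the Properties type with the two fields described above, the sample JSON is made up, and it assumes the aeson package is installed.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decode)
import qualified Data.ByteString.Lazy.Char8 as BL
import GHC.Generics (Generic)

-- Trimmed-down stand-in for the Properties type described in the text.
data Properties = Properties
  { detail :: String
  , mag    :: Double
  } deriving (Show, Generic)

-- An empty instance body is enough: the parser is derived via Generic.
instance FromJSON Properties

-- Made-up sample input. The "place" key is not in our type, so it is skipped.
sample :: BL.ByteString
sample = "{\"detail\": \"http://example.com/quake\", \"mag\": 4.5, \"place\": \"somewhere\"}"

main :: IO ()
main =
  case decode sample :: Maybe Properties of
    Just props -> print props
    Nothing    -> putStrLn "could not parse"
```

If the "mag" key were removed from the sample, decode would return Nothing, which is the parse failure for missing fields mentioned earlier.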

This is now the point at which I got a bit stuck. In theory we have all the parts we should need to fetch the JSON and parse it into our data structure(s), but there was a problem: Aeson expects a lazy ByteString to decode, but what we are getting from http-streams (or from the underlying io-streams, I guess) is strict. Here is what I did:

In jsonHandler, I'm using Streams.toList, which should give me the whole body as a list of chunks. This ensures that we get all the parts of a large body so we can correctly parse the JSON. fromChunks lets us take a list of strict ByteStrings and convert it to a single lazy ByteString, which we can hand directly to Aeson's decode to parse our JSON. Rad.
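The strict-to-lazy step can be tried on its own, without any HTTP involved. In this sketch the chunks list is a hand-written stand-in for what Streams.toList would hand back; only the bytestring package is needed.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Stand-in for the list of strict chunks read from the response stream.
chunks :: [S.ByteString]
chunks = map S.pack ["{\"mag\":", " 4.5", "}"]

-- Assemble the strict chunks into a single lazy ByteString.
body :: L.ByteString
body = L.fromChunks chunks

main :: IO ()
main = do
  putStrLn (L.unpack body)
  -- The chunk boundaries survive; they are just hidden behind the lazy type.
  print (length (L.toChunks body))
```

The value named body here has the type Aeson's decode expects, so it could be passed along directly.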

I'm hoping this is the right way to go. After searching a bit, I found out that I may not be the only one with this issue, and in fact, there is an issue logged against Aeson with a workaround. This would still require that the body be converted to a single strict ByteString, however, so I'm not sure if it gains much in the way of convenience or performance.

For the final revision, I've done a few things. First, as mentioned earlier, there are a few options with Aeson. I started out with the simple Generics method that let us write the least amount of code to start parsing some JSON into our types. From what I can tell, though, this backs us into a corner with our types' field names; if they don't match the JSON keys, we don't get them. The other problem is that we don't have the ability to transform types; for instance, we might want to parse a date into a specific date type rather than just using a String. Third, if you happen to be grabbing JSON with optional fields, going to the standard approach will let you handle these correctly. Basically you have the typical tradeoffs: more code to write, debug and maintain in exchange for more flexibility and tighter types.
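A hand-written instance in the standard style might look roughly like the following. This is a sketch rather than the final source: the Quake type, its field names, and the optional "tsunami" key are assumptions picked for illustration, and it again assumes aeson is installed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), decode, withObject, (.:), (.:?))

-- Hypothetical type: the field names deliberately differ from the JSON keys,
-- and quakeTsunami is treated as optional.
data Quake = Quake
  { quakeMagnitude :: Double
  , quakeDetail    :: String
  , quakeTsunami   :: Maybe Int
  } deriving (Show, Eq)

instance FromJSON Quake where
  parseJSON = withObject "Quake" $ \o ->
    Quake
      <$> o .: "mag"       -- required key, renamed on the Haskell side
      <*> o .: "detail"    -- required key
      <*> o .:? "tsunami"  -- optional key: a missing key becomes Nothing

main :: IO ()
main = do
  -- Parses even though "tsunami" is absent.
  print (decode "{\"mag\": 4.5, \"detail\": \"d\"}" :: Maybe Quake)
  -- Fails (Nothing) because the required "mag" key is missing.
  print (decode "{\"detail\": \"d\"}" :: Maybe Quake)
```

The extra code buys exactly the flexibility described above: keys can be renamed, and required and optional fields are spelled out one by one.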

The next things to notice: I pulled the HTTP call out of main and wrapped it nicely in its own function, and added some parameters to build the URL. Now we have a more DSL feel, passing in our required magnitude and timeframe. The result is still a list of feeds, and after making sure we have some we do some contrived stats on the magnitudes.

Haskell is pretty nice. Feedback welcome from newbies and seasoned Haskellers alike.

(Edit: after posting this I found a similar example here by Fujimura Daisuke)

Contact

Email: joshrotenberg@gmail.com
LinkedIn: https://www.linkedin.com/in/joshrotenberg
GitHub: https://github.com/joshrotenberg
Personal site: https://joshrotenberg.com (you are here)

josh rotenberg