About me
My name is Josh Rotenberg. This is my personal site, built with mdbook.
The picture above was just a quick headshot I took with a laptop for work. A coworker added a body of water behind me, and I tilted it and added a filter. It just kind of stuck, so I use it now whenever I need a picture of me somewhere, which isn't all that often.
Projects
This is a brief list and explanation of personal projects I'm either currently working on or have worked on in the past.
Personal projects
~Active
- adrs is a Rust rewrite of the original adr-tools by Nat Pryce. I was introduced to the Architectural Decision Records methodology a few years ago. While ADR tooling isn't strictly needed (beyond git and a text editor), it does help with consistency and bookkeeping, and this has been a fun project to work on.
Not so active
- lingua is a (human) language detection crate by Peter M. Stahl. I wrapped it with rustler to make the language detection API available in Elixir.
Writing
Recent
This is where I promise (no, for real this time) to start writing stuff.
Writing
These are some older posts from my previous Hakyll-based site that I haven't updated in a decade. I took the time to convert them instead of just throwing them in the trash.
Data.List Functions in Clojure
2013-12-31
Haskell's Data.List has some interesting functions. I've been playing around with a few recently and decided to see if there were Clojure equivalents, and if not, what it would take to implement them.
The first function is tails, which returns the final segments of the argument, longest first:
λ> tails [1,2,3,4,5]
[[1,2,3,4,5],[2,3,4,5],[3,4,5],[4,5],[5],[]]
Before looking at the actual implementation, I assumed (incorrectly) that it would use tail and recurse over the list, accumulating the segments until it was empty. Haskell's tail is similar to rest or next in Clojure:
λ> tail [1,2,3,4,5]
[2,3,4,5]
user> (next [1,2,3,4,5])
(2 3 4 5)
user> (rest [1,2,3,4,5])
(2 3 4 5)
I messed around with loop/recur for a bit, but then I remembered iterate:
user> (doc iterate)
-------------------------
clojure.core/iterate
([f x])
Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects
user> (def s '(1 2 3 4 5))
#'user/s
user> (take 2 (iterate rest s))
((1 2 3 4 5) (2 3 4 5))
user> (take 3 (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5))
Great. iterate will just keep going because it returns a lazy sequence, so using take we can limit the result to just what we need:
user> (take (count s) (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5))
user> ;; ooops, Haskell's tails goes one more and gives us the empty list, []
user> (take (inc (count s)) (iterate rest s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) ())
user> ;; note that we want rest here instead of next
user> (take (inc (count s)) (iterate next s))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) nil)
user> ;; ok, let's wrap it up into a function and test it out on something else ...
user> (defn tails [xs] (take (inc (count xs)) (iterate rest xs)))
#'user/tails
user> (tails ["what" "is" "for" "lunch"])
(["what" "is" "for" "lunch"] ("is" "for" "lunch") ("for" "lunch") ("lunch") ())
user>
Hrm, it still works but now we have a mixture of types in the result. This may or may not bother you depending on what you plan to do with the result. We could normalize it using map and sequence:
user> (defn tails [xs] (map sequence (take (inc (count xs)) (iterate rest xs))))
#'user/tails
user> (= (tails '(1 2 3 4 5)) (tails [1 2 3 4 5]))
true
user>
Or, with a little more work we can make the resulting collections the same type as the one passed in:
(defn tails [xs]
  (let [f (cond
            (vector? xs) vec
            (set? xs) set
            :else sequence)]
    (map f (take (inc (count xs)) (iterate rest xs)))))
user> (tails [1 2 3 4 5])
([1 2 3 4 5] [2 3 4 5] [3 4 5] [4 5] [5] [])
user> (tails '(1 2 3 4 5))
((1 2 3 4 5) (2 3 4 5) (3 4 5) (4 5) (5) ())
user> (tails #{1 2 3 4 5})
(#{1 2 3 4 5} #{2 3 4 5} #{3 4 5} #{4 5} #{5} #{})
user>
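For comparison, the same iterate-and-take idea can be written back in Haskell. This is only a sketch: tailsIter is a hypothetical name, it is not how Data.List defines tails, and because it calls length it will not work on infinite lists the way the real tails does.

```haskell
import Data.List (tails)

-- Hypothetical helper mirroring the Clojure version: keep dropping the
-- head, and take one more segment than the list has elements so the
-- trailing empty list is included.
tailsIter :: [a] -> [[a]]
tailsIter xs = take (length xs + 1) (iterate (drop 1) xs)

main :: IO ()
main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
  print (tailsIter xs)             -- [[1,2,3,4,5],[2,3,4,5],[3,4,5],[4,5],[5],[]]
  print (tailsIter xs == tails xs) -- True
```

Using drop 1 rather than tail keeps the helper total: drop 1 on an empty list is just the empty list.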
Next up is a similar function, inits, which is sort of the inverse of tails:
λ> inits [1, 2, 3, 4, 5]
[[],[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4,5]]
inits starts with an empty list and then builds up the argument one element at a time until the final item is the full list given. Here was my first attempt:
user> (def xs [1 2 3 4 5])
#'user/xs
user> (reverse (take (inc (count xs)) (iterate drop-last xs)))
(() (1) (1 2) (1 2 3) (1 2 3 4) [1 2 3 4 5])
user>
Aside from the reverse, we basically switch out rest for drop-last. This works, but the reverse kind of feels like cheating to me. Here is a less hacky option: a list comprehension. We'll package it up into a function as well to hide the details:
(defn inits [xs]
  (for [n (range (inc (count xs)))]
    (take n xs)))
user> (inits xs)
(() (1) (1 2) (1 2 3) (1 2 3 4) (1 2 3 4 5))
user>
And as above, retaining the collection type still applies.
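The list comprehension translates almost directly back to Haskell. In this rough sketch, initsComp is a hypothetical name rather than the Data.List definition; the thing to watch is that the range of prefix lengths has to run from 0 up to and including the length of the list, otherwise the full list goes missing from the end.

```haskell
import Data.List (inits)

-- Hypothetical helper: one prefix per length, from 0 up to and
-- including the full length of the list.
initsComp :: [a] -> [[a]]
initsComp xs = [take n xs | n <- [0 .. length xs]]

main :: IO ()
main = do
  let xs = [1, 2, 3, 4, 5] :: [Int]
  print (initsComp xs)             -- [[],[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4,5]]
  print (initsComp xs == inits xs) -- True
```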
There are a bunch of other good ones in Data.List. I might try to talk about more later.
Functions in Data.Maybe
2013-12-12
Most introductions to monads in Haskell talk about Maybe. Here are some of the handy functions in the Data.Maybe module for operating on the type. maybe (the first function below) should be available in ghci right away from the Prelude, but the others require that you load the Data.Maybe module in ghci:
λ> :m +Data.Maybe
maybe either applies a function to the value inside a Just or returns a default value for Nothing. Its type signature looks like this:
maybe :: b -> (a -> b) -> Maybe a -> b
λ> let x = Just 40
λ> let y = Nothing
λ> maybe 30 (*2) x -- x is Just 40, so it gets applied to (*2)
80
λ> maybe 30 (*2) y -- y is Nothing, we end up with the default value
30
isJust is straightforward. Given a Maybe argument, it returns True if it's a Just:
isJust :: Maybe a -> Bool
λ> isJust x
True
λ> isJust y
False
isNothing: not hard to figure out what this does:
isNothing :: Maybe a -> Bool
λ> isNothing x
False
λ> isNothing y
True
fromJust is a relatively unsafe way to get at the value of a Maybe:
fromJust :: Maybe a -> a
λ> fromJust x
40
λ> fromJust y
*** Exception: Maybe.fromJust: Nothing -- woops. use a case expression in real code
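As a small sketch of the case-expression alternative mentioned in that last comment, handling both constructors explicitly means there is no exception to hit. The describe function and its messages are made up for illustration.

```haskell
-- Hypothetical example: handle Just and Nothing explicitly instead of
-- calling fromJust and hoping for the best.
describe :: Maybe Int -> String
describe m = case m of
  Just n  -> "got " ++ show n
  Nothing -> "nothing to see here"

main :: IO ()
main = do
  putStrLn (describe (Just 40)) -- got 40
  putStrLn (describe Nothing)   -- nothing to see here
```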
fromMaybe is similar to maybe, but instead of a function argument being applied to the value, you simply get the value in your Maybe or the default you pass in:
fromMaybe :: a -> Maybe a -> a
λ> fromMaybe 30 x
40
λ> fromMaybe 30 y
30
maybeToList will plop your Maybe's value into a list and return it:
maybeToList :: Maybe a -> [a]
λ> maybeToList x
[40]
λ> maybeToList y
[] -- always get a value, it just happens to be an empty list if we pass in Nothing
listToMaybe goes the other way:
listToMaybe :: [a] -> Maybe a
λ> listToMaybe [2]
Just 2
λ> listToMaybe $ maybeToList x
Just 40
λ> listToMaybe $ maybeToList y
Nothing
λ> listToMaybe [20,40,60] -- listToMaybe will drop all but the first element
Just 20
λ> listToMaybe "foo"
Just 'f'
catMaybes takes a list of Maybes and returns a list of the values for any Just value, but drops out your Nothings:
catMaybes :: [Maybe a] -> [a]
λ> catMaybes [x, y, (Just 21), Nothing, Just(22)]
[40,21,22]
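mapMaybe applies a function that returns a Maybe to each element of a list and returns a list of the values from the Just results, skipping the Nothings. In the example below the list elements happen to be Maybes themselves, so the function both accepts and returns a Maybe: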
mapMaybe :: (a -> Maybe b) -> [a] -> [b]
λ> let f x = (+2) <$> x -- or fmap (+2) x
λ> f (Just 30)
Just 32
λ> mapMaybe f [(Just 20), Nothing, x]
[22,42]
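The signature only asks for a function from a plain value to a Maybe, so the list elements don't have to be Maybes at all. A minimal sketch, using readMaybe from Text.Read as the Maybe-returning function; parseInts is a hypothetical helper name.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Hypothetical helper: keep only the strings that parse as Ints.
-- readMaybe returns Nothing for anything that isn't a valid Int.
parseInts :: [String] -> [Int]
parseInts = mapMaybe readMaybe

main :: IO ()
main = print (parseInts ["1", "two", "3", ""]) -- [1,3]
```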
Erlang Function Options
2013-11-25
In working on a small Erlang project I found myself writing a few functions that take one or more required parameters and then possibly a few optional ones. A common idiom in Erlang is to take a list of tuples and/or atoms for optional parameters. For example, gen_tcp:connect has a signature like this: gen_tcp:connect(Host, Port, Options), where Options can be all kinds of stuff, such as:
{ok, Socket} = gen_tcp:connect("localhost", 12345, [binary, {packet, 0}, {active, false}]).
In this case, the host and port are required, and the third is a list of various kinds of metadata, in various forms. binary is a single atom; packet takes an integer, and according to the docs (which are actually part of inet:setopts) the value can be either 0 or raw, or 1, 2 or 4. The value of active can be true, false or once.
I'm using a function to build up a record that will then be passed on to another function for further processing. The user should only have to pass in one required parameter, and then a list of zero or more optional ones, so I want the call to look something like this:
{ok, Result} = my_func(Arg1, Options).
where Options looks like [foo, {bar, 20}, {baz, "hello"}].
In this case no options are required, so I also added in a 1 parameter version of the function:
Pattern matching and recursion make handling this kind of thing pretty clean. If I need to add an option or change how one is handled, it's clear where to do it and it's easy to test those changes.
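As a rough sketch of the same idea in Haskell (not the Erlang code itself; Option, Config, applyOptions, myFunc and myFunc' are all hypothetical names), each option gets its own pattern-matching clause and the function recurses on the rest of the list:

```haskell
-- Hypothetical option type standing in for an Erlang option list
-- such as [foo, {bar, 20}, {baz, "hello"}].
data Option = Foo | Bar Int | Baz String

-- Hypothetical record that the options are folded into.
data Config = Config
  { cfgArg :: String
  , cfgFoo :: Bool
  , cfgBar :: Int
  , cfgBaz :: String
  } deriving (Show, Eq)

-- One clause per option; each clause updates the record and recurses
-- on the remaining options until the list is empty.
applyOptions :: Config -> [Option] -> Config
applyOptions cfg []             = cfg
applyOptions cfg (Foo : rest)   = applyOptions (cfg { cfgFoo = True }) rest
applyOptions cfg (Bar n : rest) = applyOptions (cfg { cfgBar = n }) rest
applyOptions cfg (Baz s : rest) = applyOptions (cfg { cfgBaz = s }) rest

-- Required argument plus options, starting from assumed defaults.
myFunc :: String -> [Option] -> Config
myFunc arg = applyOptions (Config arg False 0 "")

-- The version with only the required argument: no options at all.
myFunc' :: String -> Config
myFunc' arg = myFunc arg []

main :: IO ()
main = do
  print (myFunc "input" [Foo, Bar 20, Baz "hello"])
  print (myFunc' "input")
```

Adding a new option means adding one constructor and one clause, which is the property the Erlang idiom is after as well.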
Follow up: http-streams and aeson
2013-06-05
In my previous post there were (at least) a few things that could be done better. I got some great feedback from the smart people on Hacker News so I've made a couple changes to the final source.
The biggest issue is that I was completely losing the benefit of streaming by reading the entire response body in and then parsing it. I knew this but as I mentioned I hadn't yet figured out how to handle it correctly. Fortunately someone else does, and with their suggestions (and a bit more Googling) I figured it out. It could probably still be cleaned up a little but for these purposes it does the trick.
On line 90 there is a new function called parseJSONFromStream. Calling this on the stream from receiveResponse (and adding the type hint for a Result wrapped around a Feed) gives us a similar situation as we had with Maybe Feed, so the main function does essentially the same thing still. With a little more time we could probably make this cleaner in fetchQuakes by giving the handler more to do rather than calling it in a lambda.
And speaking of fetchQuakes, using http-streams' withConnection cleans it up a little bit by automatically handling closing of the connection. It saves a couple lines of code and makes the function less cluttered.
Thanks again for all of the feedback!
Haskell: http-streams and aeson
2013-05-28
I've been learning Haskell off and on for the past few months. It's pretty awesome, but definitely a challenging language to learn. I've had exposure to and experience with functional programming for a couple years now (first with Scala, then/currently with Clojure, and some Erlang), and while those concepts are fairly well cemented in my brain, Haskell has a bunch of other stuff that makes it a challenge. That's not to say this is a bad thing; in my limited experience, I've been finding that some of what initially seems complex turns more towards elegance once I'm comfortable with it, and Haskell rewards your efforts in many areas by being a really nice way to get things done. I figured I'd write up something here based on a recent challenge I faced and how with a bit of Googling and playing around I managed to figure it out and move on.
I'll usually try to bite off a small project in a new language once I feel reasonably comfortable with the basics. In the past I've found that there is one type of small project that makes a decent first try: RESTful API clients. There are a few reasons why:
- Usually only requires a couple of external dependencies, if any (HTTP and JSON/XML parsing)
- Easy to incrementally build and test (i.e. only implement a single API call initially, then go back and fill in)
- Lots of APIs and variation to choose from
- Possibly useful to you and/or someone else out there
For a simple walkthrough, I've chosen the USGS GeoJSON feeds that contain recent earthquakes broken down by recency and size. It's more of a "feed" than an API, but it works well because it requires no signup or authentication, and just lets you start pulling JSON off the web to see what happens. The list of feeds can be found here: http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php
Dealing with HTTP
There are a few options for HTTP client libraries, but I've chosen http-streams. It may or may not be the best/right choice for this application ... but I wanted to see how it worked. Getting something running with the basics is super easy. http-streams has some convenient wrappers for basic get requests:
We are all pretty used to simplified HTTP stuff these days, and Haskell is no exception here. We hand get a full URL and some handler function that will be called with the Response and an InputStream. To make things even easier when we want to see what the body looks like, the built-in debugHandler will dump out the response header and body to stdout for us to peruse. If we just wanted the body alone on standard out, we can replace the above call to debugHandler with (\p i -> Streams.connect i stdout). This just connects the body stream to standard out and ignores the response headers.
If you hate trying to read unformatted JSON, you can install aeson-pretty which has a command line tool to pretty-print JSON:
HTTPStreamSimple | aeson-pretty
Or with python:
HTTPStreamSimple | python -m json.tool
If we want to start thinking about building a library around this, though, we need to change a couple of things. First, we probably want a little more control over the request so we can abstract it away from just a full URL every time, and second, we need to get the body into some intermediate state so we can parse it and return something more useful. This next snippet does just that, with the same result as our code above.
Now we are opening a connection to the host on port 80, building a request that contains the request type, the path, and saying what we expect in the response with setAccept "application/json". If we were sending a POST request or needed to send some other arbitrary header information, now would be the right time for that as well. Next, we send the request with an empty body (since it's a GET). We can now receive the response, and things start to get familiar again as the receiveResponse call's second argument is the same type as we used above with get. The difference, however, is that we are using the built-in concatHandler which will give us access to the body rather than just dump it out to the terminal. For now that is all we are doing with S.putStr x. Finally we close the connection.
Later on we might handle the response and keep that code clear of the actual protocol layer. In a real project we might have a few layers of abstraction, but for this post we'll keep it fairly simple and just take advantage of having an easy point of entry with the handler. Now we need to hook up some parsing action with Aeson. This is actually the point at which I had some trouble, and thus the inspiration for this post, but before I explain that, here is a quick look at using Aeson so we can see what we are dealing with API-wise.
Parsing JSON
Looking at the Aeson project's examples, you'll see that you really have four options for dealing with JSON data to/from Haskell: the standard approach, a Template Haskell option, and two generic options. I'm too green to suggest the "right" one, but so far I'm a fan of using the one that requires the least amount of code up front, and then slowly migrating to the one that provides the clearest implementation in the end, so we'll start off making it easy on ourselves and use one of the generic methods and then ditch it for the standard approach which will require more work on our part but will also allow us to have more control of our types. Let's take a look at something similar to those examples but modified for our earthquakes. For now I'll just put in a few fields of our main types to keep it simple. If Aeson sees stuff that isn't in our type definition it'll just skip it (though if we have it and it's missing we'll get a parse error, more on that later). We'll also leave out the ToJSON stuff since we don't have a need to convert back to JSON in this case:
This is pretty basic. We define four main types: MetaData, Properties, Feature and Feed. Working from the bottom up, Feed is our top level container. Notice that it consists of a chunk of metadata and then a list of (zero or more) Features with [Feature]. A Feature itself contains a properties slot and (for now) just the Feature ID. Similarly we've left off most of the Properties items and just added the detail String and the mag (magnitude) Double. Finally we have a few slots in our MetaData type.
If we keep heading up to the top of the file, we see that we've imported GHC.Generics, which lets us get away with not having to tell Aeson how to parse our JSON (coupled with the DeriveGeneric extension at the top of the file), and we're also importing Data.ByteString.Lazy.Char8 because Aeson actually works on ByteStrings, but we get away with writing plain string literals by also including the OverloadedStrings extension.
To complete our tour of this file, jump back down to main. We call decode on some inline JSON and tell it we want a Maybe Feed. This is handy: if parsing fails for any reason, we'll get back Nothing, otherwise we should have Just Feed. Using a case statement we can pull out our result and print it out. We derived Show in all of our types so we get a nice representation of the type on the terminal for inspection.
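A trimmed-down sketch of the generic approach, assuming the aeson package is installed. The types here are cut down to three (no MetaData and no Feature ID), and the inline JSON is made up for illustration; it carries extra keys to show that fields missing from the types are skipped.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decode)
import qualified Data.ByteString.Lazy.Char8 as BL
import GHC.Generics (Generic)

data Properties = Properties
  { detail :: String
  , mag    :: Double
  } deriving (Show, Eq, Generic)

data Feature = Feature
  { properties :: Properties
  } deriving (Show, Eq, Generic)

data Feed = Feed
  { features :: [Feature]
  } deriving (Show, Eq, Generic)

-- Empty instance bodies: the Generic machinery supplies the parsers,
-- matching JSON keys to record field names.
instance FromJSON Properties
instance FromJSON Feature
instance FromJSON Feed

-- Made-up sample document; "id" and "place" are not in our types.
sample :: BL.ByteString
sample = "{\"features\":[{\"id\":\"x1\",\"properties\":{\"detail\":\"http://example.com/x1\",\"mag\":4.5,\"place\":\"somewhere\"}}]}"

main :: IO ()
main =
  case decode sample :: Maybe Feed of
    Just feed -> print feed
    Nothing   -> putStrLn "parse failed"
```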
This is now the point at which I got a bit stuck. In theory we have all the parts we should need to fetch the JSON and parse it into our data structure(s), but there was a problem: Aeson expects a lazy ByteString to decode, but what we are getting from http-streams (or from the underlying io-streams, I guess) is strict. Here is what I did:
In jsonHandler, I'm using Streams.toList, which should give me the whole body as a list of chunks. This ensures that we get all the parts from a large body so we can correctly parse the JSON. fromChunks lets us take a list of strict ByteStrings and convert it to a single lazy ByteString, which we can hand directly to Aeson's decode to parse our JSON. Rad.
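The strict-to-lazy conversion on its own, using only the bytestring package and no network, looks something like this. The two chunks are made up to stand in for what Streams.toList might return for a body that arrived in pieces.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as BL

main :: IO ()
main = do
  -- Two strict chunks, as if the body had arrived in two reads.
  let chunks = [S.pack "{\"mag\":", S.pack " 4.5}"]
      -- fromChunks stitches them into one lazy ByteString without
      -- copying the chunks into a single strict buffer first.
      body = BL.fromChunks chunks
  BL.putStrLn body -- {"mag": 4.5}
```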
I'm hoping this is the right way to go. After searching a bit, I found out that I may not be the only one with this issue, and in fact, there is an issue logged against Aeson with a workaround. This would still require that the body be converted to a single strict ByteString, however, so I'm not sure if it gains much in the way of convenience or performance.
For the final revision, I've done a few things. First, as mentioned earlier, there are a few options with Aeson. I started out with the simple Generics method that let us write the least amount of code to start parsing some JSON into our types. From what I can tell, though, this backs us into a corner with our type's field names; if they don't match the JSON keys, we don't get them. The other problem is that we don't have the ability to transform types, for instance, we might want to parse a date into a specific date type rather than just using a String. Third, if you happen to be grabbing JSON with optional fields, going to the standard approach will let you handle these correctly. Basically you have the typical tradeoffs: more code to write, debug and maintain in exchange for more flexibility and tighter types.
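To make that tradeoff concrete, here is a minimal sketch of a hand-written instance, assuming the aeson package. The Quake type and its field names are hypothetical, and the place key is an assumption modeled on the feed; the point is that the Haskell field names no longer have to match the JSON keys, and that an optional key can be parsed with .:? into a Maybe.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson

-- Hypothetical type whose field names differ from the JSON keys.
data Quake = Quake
  { quakePlace     :: String
  , quakeMagnitude :: Maybe Double
  } deriving (Show, Eq)

instance FromJSON Quake where
  parseJSON = withObject "Quake" $ \o ->
    Quake <$> o .:  "place" -- required key: parsing fails if it is absent
          <*> o .:? "mag"   -- optional key: Nothing if it is absent

main :: IO ()
main = do
  print (decode "{\"place\":\"somewhere\",\"mag\":4.5}" :: Maybe Quake)
  print (decode "{\"place\":\"elsewhere\"}" :: Maybe Quake)
  print (decode "{\"mag\":4.5}" :: Maybe Quake)
```

The first two calls parse successfully (the second with Nothing for the magnitude), while the third fails and returns Nothing because the required key is missing.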
The next things to notice: I pulled the HTTP call out of main and wrapped it nicely in its own function, and added some parameters to build the URL. Now we have a more DSL feel, passing in our required magnitude and timeframe. The result is still a list of feeds, and after making sure we have some we do some contrived stats on the magnitudes.
Haskell is pretty nice. Feedback welcome from newbies and seasoned Haskellers alike.
(Edit: after posting this I found a similar example here by Fujimura Daisuke)
Contact
- Email: joshrotenberg@gmail.com
- LinkedIn: https://www.linkedin.com/in/joshrotenberg
- GitHub: https://github.com/joshrotenberg
- Personal site: https://joshrotenberg.com (you are here)