My first assignment in the LearnElixir curriculum is to create a Giphy scraper with the following requirements:
- use Giphy’s search endpoint to return 25 results
- the user must be able to load my project in iex and call GiphyScraper.search(query) to obtain the results
- the results must be in the following format:
[
  %GiphyScraper.GiphyImage{
    id: "some_id",
    url: "url_to_gif",
    username: "username of creator",
    title: "SomeGif"
  },
  %GiphyScraper.GiphyImage{
    id: "some_other_id",
    url: "url_to_gif_2",
    username: "username of creator",
    title: "MyGif"
  }
]
Here’s how I’m thinking about breaking the problem down
To start, I’m going to use the call to GiphyScraper.search(query) as my API: this is the entrypoint for a user wanting to obtain data. Given that even this task involves a fair amount of data transformation, I’ll create a primary module that I can delegate to. This module is where the bulk of my functions will live, including requests to the Giphy endpoint.
(This approach will make it easier to add additional ways to interact with the project down the line. When I add a CLI interface, all I have to do is pass the input query from the CLI to the API.)
Additionally, I’ll want a GiphyImage struct that I can parse each Giphy result into in order to return the required list of structs.
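A minimal sketch of such a struct, assuming only the four fields from the required output format (the definition in the actual project may differ):

```elixir
defmodule GiphyScraper.GiphyImage do
  # Fields mirror the required output format; each defaults to nil when omitted.
  defstruct [:id, :url, :username, :title]
end

# Build a struct at runtime; struct!/2 raises if given a key the struct does not define.
image = struct!(GiphyScraper.GiphyImage, id: "some_id", title: "SomeGif")
```

Fields that are not supplied, such as url and username here, are left as nil.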
Lastly, I’ll want to install and use a couple of libraries to do my request and JSON handling. LearnElixir recommends using the finch library; I decided to go with HTTPoison instead since I’m already familiar with it, but this article outlines some of the benefits of finch, and I’d like to explore it as an alternative once I get everything working.
Let’s get started!
The core logic
The bulk of the functions will live in a module that, in theory, won’t be necessary for an end user to interact with. When it’s working as intended, I’ll delegate the GiphyScraper.search call to this module’s initial function. I like to get stuff like this out of the way at the start of the project just to make sure that everything is wired up correctly:
defmodule GiphyScraper do
  alias GiphyScraper.Fetcher

  defdelegate search(query), to: Fetcher, as: :get_gifs_for_query
end
And my core logic will live in a separate module:
defmodule GiphyScraper.Fetcher do
  def get_gifs_for_query(query) do
    IO.puts("you passed in the following query: #{query}")
  end
end
Sure enough, running this in iex results in the expected output:
iex(3)> GiphyScraper.search "hello"
you passed in the following query: hello
:ok
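For reference, defdelegate with the to: and as: options is roughly shorthand for writing a plain wrapper function by hand. The sketch below uses hypothetical module names (DelegateSketch and DelegateSketch.Fetcher) so it can run on its own, and returns a string instead of printing:

```elixir
defmodule DelegateSketch.Fetcher do
  # Stand-in for the real fetcher; returns a string so the result is easy to compare.
  def get_gifs_for_query(query), do: "query: #{query}"
end

defmodule DelegateSketch do
  alias DelegateSketch.Fetcher

  # Delegated version: search/1 forwards to Fetcher.get_gifs_for_query/1.
  defdelegate search(query), to: Fetcher, as: :get_gifs_for_query

  # Hand-written equivalent of the delegation above.
  def search_by_hand(query), do: Fetcher.get_gifs_for_query(query)
end
```

Calling DelegateSketch.search("hello") and DelegateSketch.search_by_hand("hello") returns the same value.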
Fast-forwarding a bit, I have my working core module, GiphyScraper.Fetcher, with a few relatively short functions that look as follows:
defmodule GiphyScraper.Fetcher do
  alias GiphyScraper.GiphyImage

  def get_gifs_for_query(query, limit \\ 25) do
    query
    |> get_giphy_request_url(limit)
    |> make_request_and_return_response_data()
    |> Enum.map(&parse_response_data_into_image_data/1)
  end

  def get_giphy_request_url(query, limit) do
    # URI.encode_query/1 escapes spaces and special characters in the query
    params = URI.encode_query(api_key: get_api_key(), q: query, limit: limit)
    "https://api.giphy.com/v1/gifs/search?#{params}"
  end

  def make_request_and_return_response_data(url) do
    HTTPoison.start()
    # Crashes with a MatchError if the request fails
    {:ok, response} = HTTPoison.get(url)
    body = response.body |> JSON.decode!()
    body["data"]
  end

  def parse_response_data_into_image_data(data) do
    title = get_in(data, ["title"])
    url = get_in(data, ["url"])
    username = get_in(data, ["user", "username"])
    id = get_in(data, ["id"])

    %GiphyImage{
      title: title,
      url: url,
      username: username,
      id: id
    }
  end

  defp get_api_key, do: System.get_env("GIPHY_API_KEY")
end
And indeed, running GiphyScraper.search("cheeseburger") from within iex produces the following results (truncated for brevity’s sake):
[
  %GiphyScraper.GiphyImage{
    id: "3ohs4h1Dt995D5iGA0",
    url: "https://giphy.com/gifs/scoobydoo-cartoon-scooby-doo-3ohs4h1Dt995D5iGA0",
    username: "scoobydoo",
    title: "Hungry Cartoon GIF by Scooby-Doo"
  },
  %GiphyScraper.GiphyImage{
    id: "xTiTnwj1LUAw0RAfiU",
    url: "https://giphy.com/gifs/matthewjocelyn-dancing-dance-burger-xTiTnwj1LUAw0RAfiU",
    username: "matthewjocelyn",
    title: "Dance Dancing GIF by matthewjocelyn"
  },
  ...
For the top-level function (get_gifs_for_query), I chose to use pipes to make clear how the data is transformed as it is received and passed on. I decided to make a standalone function for building the formatted URL to send to the Giphy endpoint. This was in part due to the need to retrieve an api_key, as well as the optional limit parameter. I set the default to 25, which is already what the Giphy endpoint defaults to, but I thought it’d be useful to include in case the end user wants to modify it.
When parsing the response data into structs, I decided to use Kernel.get_in from the start. To me, it just looks cleaner than a chain of bracket lookups, and it helps to set the expectation in the code of nested maps when decoding JSON.
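To make the comparison concrete, here is a small sketch on made-up data shaped like one decoded result. It assumes, for illustration only, that a result may arrive without a "user" key:

```elixir
# Sample maps with string keys, as produced by decoding JSON (values are made up).
with_user = %{"id" => "abc", "user" => %{"username" => "someone"}}
without_user = %{"id" => "def"}

# get_in/2 walks the list of keys and returns nil as soon as one is missing.
from_get_in = get_in(with_user, ["user", "username"])
missing_get_in = get_in(without_user, ["user", "username"])

# Chained bracket access gives the same results, because nil["key"] is also nil.
from_brackets = with_user["user"]["username"]
missing_brackets = without_user["user"]["username"]
```

Both styles return "someone" for the first map and nil for the second, so the choice is mostly about readability.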
Some closing thoughts
This exercise was a great way to get used to parsing data from an endpoint. I ran into a few errors (ProtocolError, ArgumentError, etc.) when trying to parse the response received by both the HTTPoison and the finch clients, as well as with Jason and JSON; it took some trial and error before I remembered to match on the {:ok, _} pattern, and to realize that I was dealing with maps and not strings. I’m sure I’ll get used to it.
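One way to make that matching explicit is to handle the result tuple with function clauses instead of a single {:ok, response} match. This is only a sketch: the module name ResponseSketch, the function extract_data, and the error tags are hypothetical, and the decoder is passed in as a function so the example does not depend on a particular JSON library. It relies only on the response being a map (or struct) with status_code and body fields, as HTTPoison responses are.

```elixir
defmodule ResponseSketch do
  # Successful request with a 200 status: decode the body and pull out "data".
  def extract_data({:ok, %{status_code: 200, body: body}}, decode) do
    case decode.(body) do
      {:ok, %{"data" => data}} when is_list(data) -> {:ok, data}
      {:ok, _other} -> {:error, :unexpected_shape}
      {:error, reason} -> {:error, {:invalid_json, reason}}
    end
  end

  # The request completed, but with a status other than 200.
  def extract_data({:ok, %{status_code: status}}, _decode) do
    {:error, {:unexpected_status, status}}
  end

  # The request itself failed (timeout, DNS failure, and so on).
  def extract_data({:error, reason}, _decode) do
    {:error, {:request_failed, reason}}
  end
end
```

With HTTPoison and Jason installed, a call could look like ResponseSketch.extract_data(HTTPoison.get(url), &Jason.decode/1), leaving the caller to decide what to do with each error tuple.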
Additionally, I found it fun to explore the different ways of grouping certain functions, as well as deciding when to use default parameters, etc. These are not problems new to Elixir, but they were made more interesting by the different options that Elixir DOES present. Should I try to access nested keys directly, or use the get_in function? Should I use an anonymous function or a named function to pass in as the second argument to Enum.map? Does the procedural breakdown of the data in make_request_and_return_response_data feel too “Python-y”? Should I find a way to rearrange that data transformation so it can be piped through to the end?
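On the Enum.map question, the two forms are interchangeable in behavior. The sketch below uses a hypothetical MapSketch.shout/1 function to show a captured named function next to an equivalent anonymous function:

```elixir
defmodule MapSketch do
  # Hypothetical stand-in for a per-item transformation.
  def shout(word), do: String.upcase(word)
end

words = ["hello", "world"]

# Named function passed with the capture operator.
with_capture = Enum.map(words, &MapSketch.shout/1)

# Equivalent anonymous function wrapping the same call.
with_anonymous = Enum.map(words, fn word -> MapSketch.shout(word) end)
```

Both calls return ["HELLO", "WORLD"]; the capture form is shorter when the function already exists, while the anonymous form leaves room for extra logic inline.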
It was a fun exercise and exposed me to different parts of working with Elixir code on something relatively nontrivial. You can find the full project in my GitHub repo.
Up next
I’ll add a CLI layer to allow for query input via the command line. I’ll also see if I can get finch working as expected. Lastly, I’d like to add some tests, including finding a way to mock data for specific parts of the pipeline. Stay tuned for Part 2… (coming soon)
Update: You can find the next post here, outlining how to add a CLI interface for querying.