The first visualization package we discuss in this chapter is Gnuplot, which has been around since 1986. Despite its age, its visualization capabilities are quite extensive, so it’s impossible to do it justice here. There are other good resources available, including Gnuplot in Action by Janert.
To demonstrate its flexibility (and its archaic notation), consider Example 7.2, which is copied from the Gnuplot website.
Example 7.2 (Creating a histogram using Gnuplot)
Please note that this is trimmed to 80 characters wide. The above script generates the following image:
Figure 7.1: Immigration Plot by Gnuplot
Gnuplot is different from most command-line tools we’ve been using for two reasons. First, it uses a script instead of command-line arguments. Second, the output is always written to a file and not printed to standard output.
One great advantage of Gnuplot being around for so long, and the main reason we’ve included it in this book, is that it’s able to produce visualizations for the command line. That is, it’s able to print its output to the terminal without the need for a graphical user interface (GUI). Even then, you would need to set up a script.
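To make the idea of a Gnuplot script concrete, here is a minimal sketch that draws a sine curve as ASCII art in the terminal. The file name sine.gp is an arbitrary choice for illustration, and the script is only run when gnuplot is found on the PATH.

```shell
# Write a three-line Gnuplot script to a file (sine.gp is an illustrative name).
cat > sine.gp << 'EOF'
set terminal dumb 80 25
set xrange [-pi:pi]
plot sin(x) with lines
EOF

# Run the script only if Gnuplot is installed; the plot is printed as text.
if command -v gnuplot > /dev/null; then
  gnuplot sine.gp
fi
```

The dumb terminal is what makes the output plain text; choosing a different terminal in the first line of the script would produce an image file instead.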
Luckily, there is a command-line tool called feedgnuplot (Kogan 2014), which can help us with setting up a script for Gnuplot. feedgnuplot is entirely configurable through command-line arguments. Plus, it reads from standard input. After we have introduced ggplot2, we’re going to create a few visualizations using feedgnuplot.
One great feature of feedgnuplot that we would like to mention here is that it allows you to plot streaming data. The following is a snapshot of a continuously updated plot based on random input data:
$ while true; do echo $RANDOM; done | sample -d 10 | feedgnuplot --stream \
> --terminal 'dumb 80,25' --lines --xlen 10
30000 ++-----+------------+-------------+-------------+------------+-----++
| + * + + + |
| : ** : ******* : *
25000 ++.................*.*..........................*.....*............+*
| : *: * : *: * : *|
| : *: * : *: * : *|
| : * : * : * : * : * |
20000 ++................*....*......................*.........*.........*++
| : * : * : * : * : * |
| : * : * : * : * : * |
15000 ++....**.........*.......*..................*............*.......*.++
| **** :* * : * : * : * : * |
** :* * : * **** * : * : * |
10000 ++.......*......*.........*....**....*.....*..............*.....*..++
| : * * : * ** : * * : * : * |
| : * * : ** : ** * : * : * |
| : * * : : * : * : * |
5000 ++..........*..*.........................*..................*.*....++
| : * * : : : *:* |
| + ** + + + * |
0 ++-----+------*-----+-------------+-------------+------------*-----++
2350 2352 2354 2356 2358
7.4.2 Introducing ggplot2
A more modern software package for creating visualizations is ggplot2, which is an implementation of the grammar of graphics in R (Wickham 2009).
Thanks to the grammar of graphics and sensible defaults, ggplot2 commands tend to be very short and expressive. When used through Rio, this is a very convenient way of creating visualizations from the command line.
To demonstrate its expressiveness, we’ll recreate the histogram plot generated above by Gnuplot, with the help of Rio. Because Rio expects the data set to be comma-delimited, and because ggplot2 expects the data in long format, we first need to scrub and transform the data a little bit:
$ < data/immigration.dat sed -re '/^#/d;s/\t/,/g;s/,-,/,0,/g;s/Region/'\
> 'Period/' | tee data/immigration.csv | head | cut -c1-80
Period,Austria,Hungary,Belgium,Czechoslovakia,Denmark,France,Germany,Greece,Irel
1891-1900,234081,181288,18167,0,50231,30770,505152,15979,388416,651893,26758,950
1901-1910,668209,808511,41635,0,65285,73379,341498,167519,339065,2045877,48262,1
1911-1920,453649,442693,33746,3426,41983,61897,143945,184201,146181,1109524,4371
1921-1930,32868,30680,15846,102194,32430,49610,412202,51084,211234,455315,26948,
1931-1940,3563,7861,4817,14393,2559,12623,144058,9119,10973,68028,7150,4740,3960
1941-1950,24860,3469,12189,8347,5393,38809,226578,8973,19789,57661,14860,10100,1
1951-1960,67106,36637,18575,918,10984,51121,477765,47608,43362,185491,52277,2293
1961-1970,20621,5401,9192,3273,9201,45237,190796,85969,32966,214111,30606,15484,
Remove lines that start with #.
Convert tabs to commas.
Change dashes (missing values) into zeros.
Change the feature name Region into Period.
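To see the four substitutions at work, the following sketch applies the same sed expression to three sample lines: a comment, a header, and one data row. The file names sample.dat and sample.csv are made up for illustration, and the -r flag and the \t escape assume GNU sed, as in the command above.

```shell
# Three tab-separated sample lines: a comment, a header, and one data row.
printf '# a comment line\nRegion\tAustria\tCzechoslovakia\tDenmark\n1891-1900\t234081\t-\t50231\n' > sample.dat

# Same expression as above: drop comment lines, turn tabs into commas,
# turn dashes into zeros, and rename the first column from Region to Period.
sed -re '/^#/d;s/\t/,/g;s/,-,/,0,/g;s/Region/Period/' sample.dat > sample.csv
cat sample.csv
```

Note that the pattern ,-, only matches a dash with a comma on both sides, so the dash inside 1891-1900 is left alone. For the same reason, a dash in the last column, or two dashes in adjacent columns, would need extra handling.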
We then select only the columns that matter using csvcut and subsequently convert the data from a wide format to a long one using Rio and the melt function, which is part of the R package reshape2:
$ < data/immigration.csv csvcut -c Period,Denmark,Netherlands,Norway,\
> Sweden | Rio -re 'melt(df, id="Period", variable.name="Country", '\
> 'value.name="Count")' | tee data/immigration-long.csv | head | csvlook
|------------+-------------+--------|
| Period | Country | Count |
|------------+-------------+--------|
| 1891-1900 | Denmark | 50231 |
| 1901-1910 | Denmark | 65285 |
| 1911-1920 | Denmark | 41983 |
| 1921-1930 | Denmark | 32430 |
| 1931-1940 | Denmark | 2559 |
| 1941-1950 | Denmark | 5393 |
| 1951-1960 | Denmark | 10984 |
| 1961-1970 | Denmark | 9201 |
| 1891-1900 | Netherlands | 26758 |
|------------+-------------+--------|
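To make the wide-to-long conversion concrete, here is a sketch that performs the same kind of reshaping with awk instead of melt. It is not what Rio does internally. The files wide.csv and long.csv are hypothetical, and the four values are taken from the immigration data shown earlier. Unlike melt, this sketch emits rows grouped by period rather than by country.

```shell
# A small wide-format excerpt: one row per period, one column per country.
printf 'Period,Denmark,Netherlands\n1891-1900,50231,26758\n1901-1910,65285,48262\n' > wide.csv

# For every data row, emit one output line per country column.
awk -F, -v OFS=, '
  NR == 1 { for (i = 2; i <= NF; i++) country[i] = $i   # remember column names
            print "Period", "Country", "Count"; next }
  { for (i = 2; i <= NF; i++) print $1, country[i], $i }
' wide.csv > long.csv
cat long.csv
```

Each cell of the wide table becomes its own row, which is the shape ggplot2 expects: one column to map to the x-axis, one to the fill color, and one to the bar height.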
Now we can use Rio again, this time with an expression that builds up a ggplot2 visualization:
$ < data/immigration-long.csv Rio -ge 'g + geom_bar(aes(Country, Count,'\
> ' fill=Period), stat="identity") + scale_fill_brewer(palette="Set1") '\
> '+ labs(x="Country of origin", y="Immigration by decade", title='\
> '"Immigration from Northern Europe\n(columstacked histogram)")' | display
Figure 7.2: Immigration plot by Rio and ggplot2
The -g command-line argument indicates that Rio should load the ggplot2 package. The output is an image in PNG format. You can either view the PNG image via display, which is part of ImageMagick, or you can redirect the output to a PNG file. If you’re on a remote terminal, you probably won’t be able to see any graphics. A workaround for this is to start a webserver from a particular directory:
Make sure that you have access to the port (8000 in this case). If you save the PNG image to the directory from which the webserver was launched, then you can access the image from your browser at http://localhost:8000/file.png.
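The web server workaround can be sketched as follows. This assumes Python 3 is installed on the remote machine; its standard library includes the http.server module, which serves the current directory. Any other static file server would work just as well.

```shell
# Assumption: Python 3 is installed; http.server ships with its standard library.
# Serve the current directory on port 8000 in the background.
python3 -m http.server 8000 > /dev/null 2>&1 &
server_pid=$!

# ... view http://localhost:8000/file.png in a browser ...

# Stop the server when you are done.
kill "$server_pid"
```

Running the server in the background keeps the terminal free for generating more images into the same directory.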
7.4.3 Histograms
Using Rio:
$ < data/tips.csv Rio -ge 'g+geom_histogram(aes(bill))' | display
Figure 7.3: Histogram
Using feedgnuplot:
$ < data/tips.csv csvcut -c bill | feedgnuplot --terminal 'dumb 80,25' \
> --histogram 0 --with boxes --ymin 0 --binwidth 1.5 --unset grid --exit
25 ++----+------+-----+--***-+-----+------+-----+------+-----+------+----++
+ + + +*** * + + + + + + + +
| * * * |
| *** * * * |
20 ++ * * * * * ++
| **** * * * * |
| * ** *** * * *** |
| * ** * * * * * * |
15 ++ * ** * * * * * * ++
| * ** * * * * * * |
| * ** * * * * * * |
| * ** * * * * * * *** |
10 ++ * ** * * * *** *** * ++
| * ** * * * * * * * * |
| *** ** * * * * * * * ***** *** |
| * * ** * * * * * * * * * *** * |
5 ++ *** * ** * * * * * * * * * * * * *** ++
| * * * ** * * * * * * * * * * * * *** * |
| * * * ** * * * * * * * * * * * *** * ******** *** *** |
+ ***+*** * * ** *+* * * * * * * * * *+* * *+** * *+* ***+* * * *** +
0 ++-***+***********************************************-*****-***-***--++
0 5 10 15 20 25 30 35 40 45 50 55
7.4.4 Bar Plots
Using Rio:
$ < data/tips.csv Rio -ge 'g+geom_bar(aes(factor(size)))' | display
Figure 7.4: Bar Plot
Using feedgnuplot:
$ < data/tips.csv csvcut -c size | header -d | feedgnuplot --terminal \
> 'dumb 80,25' --histogram 0 --with boxes --unset grid --exit
+ + * + * + + + + +
140 ++ * * ++
| * * |
120 ++ * * ++
| * * |
100 ++ * * ++
| * * |
| * * |
80 ++ * * ++
| * * |
60 ++ * * ++
| * * |
| * * |
40 ++ * ********************* ++
| * * * * |
20 ++ * * * * ++
| * * * * |
+ *********** + * + * + ********************* +
0 ++---*************************************************************---++
0 1 2 3 4 5 6 7
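As a sanity check on a bar plot, the category counts can also be computed as plain text. The sketch below uses a handful of made-up party sizes in a hypothetical file sizes.txt rather than the real size column of data/tips.csv.

```shell
# Made-up party sizes, one per line (a stand-in for the size column).
printf '2\n3\n2\n4\n2\n3\n' > sizes.txt

# Count how often each value occurs; these counts are the bar heights.
sort -n sizes.txt | uniq -c | awk '{ print $2 "," $1 }' > counts.csv
cat counts.csv
```

Each output line holds a category and its count, which correspond to the position and height of one bar.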
7.4.5 Density Plots
Using Rio:
Figure 7.5: Density Plot
Since feedgnuplot cannot generate density plots, it’s best to just generate a histogram.
7.4.6 Box Plots
Using Rio:
$ < data/tips.csv Rio -ge 'g+geom_boxplot(aes(time, bill))' | display
Figure 7.6: Box Plot
Drawing a box plot is unfortunately not possible with feedgnuplot.
7.4.7 Scatter Plots
Using Rio:
$ < data/tips.csv Rio -ge 'g+geom_point(aes(bill, tip, color=time))' | display
Figure 7.7: Scatter Plot
Using feedgnuplot:
$ < data/tips.csv csvcut -c bill,tip | tr , ' ' | header -d | feedgnuplot \
> --terminal 'dumb 80,25' --points --domain --unset grid --exit --style 'pt' '14'
10 ++----+------+-----+------+-----+------+-----+------+-----+------+A---++
+ + + + + + + + + + + +
9 ++ A ++
| |
8 ++ ++
| A |
| |
7 ++ A A ++
| A A |
6 ++ A A A ++
| A A |
5 ++ A A A A A AA A AA A A A ++
| A A A A |
4 ++ A A AAAA AAA A A A A A ++
| A AAAAA AAA AA A A |
| A AAAAAAA AA A A AA A AA |
3 ++ A AAAAAAAAAAA A A AA AA A ++
| AAAAAAA AA A A A A A |
2 ++ AA AAAAAAAAA A A A AA A A A ++
+ + AAAAAAAA +A AA+ + A + + + + + +
1 ++--A-+A-A---+--AA-+--A---+-----+------+--A--+------+-----+------+----++
0 5 10 15 20 25 30 35 40 45 50 55
7.4.8 Line Graphs
$ < data/immigration-long.csv Rio -ge 'g+geom_line(aes(x=Period, '\
> 'y=Count, group=Country, color=Country)) + theme(axis.text.x = '\
> 'element_text(angle = -45, hjust = 0))' | display
Figure 7.8: Line Graph