The first visualization package we discuss in this chapter is Gnuplot, which has been around since 1986. Despite its age, its visualization capabilities are extensive, and we cannot do it justice in a single section. Good resources are available elsewhere, including Gnuplot in Action by Janert.

    To demonstrate its flexibility (and its archaic notation), consider Example 7.2, which is copied from the Gnuplot website.

    Example 7.2 (Creating a histogram using Gnuplot)

    Please note that this is trimmed to 80 characters wide. The above script generates the following image:

    Figure 7.1: Immigration Plot by Gnuplot

    Gnuplot differs from most command-line tools we've been using in two ways. First, it is driven by a script instead of command-line arguments. Second, its output is typically written to a file rather than printed to standard output.

    One great advantage of Gnuplot being around for so long, and the main reason we’ve included it in this book, is that it’s able to produce visualizations for the command line. That is, it’s able to print its output to the terminal without the need for a graphical user interface (GUI). Even then, you would need to set up a script.
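To make the point about scripts concrete, here is a minimal sketch of driving Gnuplot from the shell. The script contents (a sine curve rendered on the dumb terminal at 80 by 25 characters) and the temporary file are illustrative assumptions, not taken from Example 7.2, and the `size` syntax assumes a reasonably recent Gnuplot. The block only invokes Gnuplot if it is installed.

```shell
# Write a tiny Gnuplot script to a temporary file (the file name is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
# Render as ASCII art instead of opening a window or writing an image file.
set terminal dumb size 80,25
set title "sin(x) on the dumb terminal"
plot sin(x) with lines
EOF

# Gnuplot takes its instructions from the script, not from command-line flags.
if command -v gnuplot >/dev/null 2>&1; then
  gnuplot "$script"
else
  echo "gnuplot is not installed; the script was written to $script"
fi
```

Even for a plot this small, the terminal type and the plot command have to be spelled out in a script.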

    Luckily, there is a command-line tool called feedgnuplot (Kogan 2014), which can help us with setting up a script for Gnuplot. feedgnuplot is entirely configurable through command-line arguments. Plus, it reads from standard input. After we have introduced ggplot2, we're going to create a few visualizations using feedgnuplot.

    One great feature of feedgnuplot worth mentioning here is that it can plot streaming data. The following is a snapshot of a continuously updated plot based on random input data:

    $ while true; do echo $RANDOM; done | sample -d 10 | feedgnuplot --stream \
    > --terminal 'dumb 80,25' --lines --xlen 10
    30000 ++-----+------------+-------------+-------------+------------+-----++
    | + * + + + |
    | : ** : ******* : *
    25000 ++.................*.*..........................*.....*............+*
    | : *: * : *: * : *|
    | : *: * : *: * : *|
    | : * : * : * : * : * |
    20000 ++................*....*......................*.........*.........*++
    | : * : * : * : * : * |
    | : * : * : * : * : * |
    15000 ++....**.........*.......*..................*............*.......*.++
    | **** :* * : * : * : * : * |
    ** :* * : * **** * : * : * |
    10000 ++.......*......*.........*....**....*.....*..............*.....*..++
    | : * * : * ** : * * : * : * |
    | : * * : ** : ** * : * : * |
    | : * * : : * : * : * |
    5000 ++..........*..*.........................*..................*.*....++
    | : * * : : : *:* |
    | + ** + + + * |
    0 ++-----+------*-----+-------------+-------------+------------*-----++
    2350 2352 2354 2356 2358

    7.4.2 Introducing ggplot2

    A more modern software package for creating visualizations is ggplot2, which is an implementation of the grammar of graphics in R (Wickham 2009).

    Thanks to the grammar of graphics and using sensible defaults, ggplot2 commands tend to be very short and expressive. When used through Rio, this is a very convenient way of creating visualizations from the command line.

    To demonstrate its expressiveness, we'll recreate the histogram plot generated above by Gnuplot, with the help of Rio. Because Rio expects the data set to be comma-delimited, and because ggplot2 expects the data in long format, we first need to scrub and transform the data a little bit:

    $ < data/immigration.dat sed -re '/^#/d;s/\t/,/g;s/,-,/,0,/g;s/Region/'\
    > 'Period/' | tee data/immigration.csv | head | cut -c1-80
    Period,Austria,Hungary,Belgium,Czechoslovakia,Denmark,France,Germany,Greece,Irel
    1891-1900,234081,181288,18167,0,50231,30770,505152,15979,388416,651893,26758,950
    1901-1910,668209,808511,41635,0,65285,73379,341498,167519,339065,2045877,48262,1
    1911-1920,453649,442693,33746,3426,41983,61897,143945,184201,146181,1109524,4371
    1921-1930,32868,30680,15846,102194,32430,49610,412202,51084,211234,455315,26948,
    1931-1940,3563,7861,4817,14393,2559,12623,144058,9119,10973,68028,7150,4740,3960
    1941-1950,24860,3469,12189,8347,5393,38809,226578,8973,19789,57661,14860,10100,1
    1951-1960,67106,36637,18575,918,10984,51121,477765,47608,43362,185491,52277,2293
    1961-1970,20621,5401,9192,3273,9201,45237,190796,85969,32966,214111,30606,15484,

    • Remove lines that start with #.

    • Convert tabs to commas.

    • Change dashes (missing values) into zeros.

    • Change the feature name Region into Period.
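The four steps can be tried in isolation on a small inline sample. This is a minimal sketch: the three-line input below is a made-up stand-in with the same shape as data/immigration.dat (only the first data values are borrowed from the output above), and it assumes GNU sed, which understands `\t` in a pattern.

```shell
# Hypothetical sample: a comment line, a tab-separated header, and one data row
# in which a dash marks a missing value.
sample=$(printf '# a comment line\nRegion\tAustria\tHungary\tBelgium\n1891-1900\t234081\t-\t18167\n')

# Same sed expression as in the text: drop comment lines, turn tabs into commas,
# replace a lone dash between commas with 0, and rename Region to Period.
cleaned=$(printf '%s\n' "$sample" |
  sed -re '/^#/d;s/\t/,/g;s/,-,/,0,/g;s/Region/Period/')

printf '%s\n' "$cleaned"
# Period,Austria,Hungary,Belgium
# 1891-1900,234081,0,18167
```

One caveat with this expression: because matches cannot overlap, `s/,-,/,0,/g` leaves the second of two adjacent dashes untouched, and it never matches a dash in the last column. Whether the immigration data contains such cases is not visible in the trimmed output.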

    We then select only the columns that matter using csvcut and subsequently convert the data from a wide format to a long one using Rio and the melt function, which is part of the R package reshape2:

    $ < data/immigration.csv csvcut -c Period,Denmark,Netherlands,Norway,\
    > Sweden | Rio -re 'melt(df, id="Period", variable.name="Country", '\
    |------------+-------------+--------|
    |  Period    | Country     | Count  |
    |------------+-------------+--------|
    |  1891-1900 | Denmark     | 50231  |
    |  1901-1910 | Denmark     | 65285  |
    |  1921-1930 | Denmark     | 32430  |
    |  1931-1940 | Denmark     | 2559   |
    |  1941-1950 | Denmark     | 5393   |
    |  1951-1960 | Denmark     | 10984  |
    |  1961-1970 | Denmark     | 9201   |
    |  1891-1900 | Netherlands | 26758  |
    |------------+-------------+--------|
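To see what the wide-to-long conversion amounts to, here is a rough awk sketch of the same reshaping. It is not how melt is implemented; it only mimics the result for a simple comma-separated input with the identifier in the first column. The two-period, two-country sample reuses values shown above, and the output column names Period, Country, and Count are hard-coded to match the table.

```shell
# Small wide-format sample: one row per period, one column per country.
wide=$(printf 'Period,Denmark,Netherlands\n1891-1900,50231,26758\n1901-1910,65285,48262\n')

# Emit one row per (period, country) pair, grouped by country like the table above.
long=$(printf '%s\n' "$wide" | awk -F, -v OFS=, '
  NR == 1 {                       # header: remember the country names
    for (i = 2; i <= NF; i++) name[i] = $i
    ncol = NF
    print "Period", "Country", "Count"
    next
  }
  {                               # data row: store one long-format row per country
    for (i = 2; i <= NF; i++) row[i, NR] = $1 OFS name[i] OFS $i
    last = NR
  }
  END {                           # print country by country, periods in input order
    for (i = 2; i <= ncol; i++)
      for (r = 2; r <= last; r++) print row[i, r]
  }')

printf '%s\n' "$long"
# Period,Country,Count
# 1891-1900,Denmark,50231
# 1901-1910,Denmark,65285
# 1891-1900,Netherlands,26758
# 1901-1910,Netherlands,48262
```

Each country column becomes a block of rows, which is the shape ggplot2 needs in order to map Country and Period to separate aesthetics.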

    Now we can use Rio again, this time with an expression that builds up a ggplot2 visualization:

    $ < data/immigration-long.csv Rio -ge 'g + geom_bar(aes(Country, Count,'\
    > ' fill=Period), stat="identity") + scale_fill_brewer(palette="Set1") '\
    > '+ labs(x="Country of origin", y="Immigration by decade", title='\
    > '"Immigration from Northern Europe\n(columstacked histogram)")' | display

    Figure 7.2: Immigration plot by Rio and ggplot2

    The -g command-line argument indicates that Rio should load the ggplot2 package. The output is an image in PNG format. You can either view the PNG image via display, which is part of ImageMagick (ImageMagick Studio LLC), or redirect the output to a PNG file. If you're on a remote terminal, you probably won't be able to see any graphics. A workaround for this is to start a webserver from a particular directory:

    Make sure that you have access to the port (8000 in this case). If you save the PNG image to the directory from which the webserver was launched, then you can access the image from your browser at http://localhost:8000/file.png.
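The webserver workaround can be sketched as follows. This is one possible setup rather than the only one: it assumes Python 3.7 or later for the built-in `http.server` module and its `--directory` option, it uses port 8000 as in the text, and it serves a text stand-in for file.png from a temporary directory so that the example does not depend on Rio. The fetch at the end plays the role of the browser.

```shell
# Directory to serve, with a stand-in for the image (real use: Rio ... > "$dir/file.png").
dir=$(mktemp -d)
printf 'stand-in for a PNG' > "$dir/file.png"

# Start the webserver in the background, serving only that directory.
python3 -m http.server 8000 --bind 127.0.0.1 --directory "$dir" >/dev/null 2>&1 &
server_pid=$!

# Fetch the file over HTTP, retrying briefly while the server starts up.
fetched=""
for attempt in 1 2 3 4 5 6 7 8 9 10; do
  fetched=$(python3 -c 'import urllib.request; print(urllib.request.urlopen("http://127.0.0.1:8000/file.png").read().decode(), end="")' 2>/dev/null) && break
  sleep 0.5
done

# Stop the server again; for interactive use you would leave it running.
kill "$server_pid"
printf '%s\n' "$fetched"
```

For day-to-day use, the second command on its own (without the trailing `&` and the cleanup) is usually all that is needed.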

    7.4.3 Histograms

    Using Rio:

    $ < data/tips.csv Rio -ge 'g+geom_histogram(aes(bill))' | display

    Figure 7.3: Histogram

    Using feedgnuplot:

    $ < data/tips.csv csvcut -c bill | feedgnuplot --terminal 'dumb 80,25' \
    > --histogram 0 --with boxes --ymin 0 --binwidth 1.5 --unset grid --exit
    25 ++----+------+-----+--***-+-----+------+-----+------+-----+------+----++
    + + + +*** * + + + + + + + +
    | * * * |
    | *** * * * |
    20 ++ * * * * * ++
    | **** * * * * |
    | * ** *** * * *** |
    | * ** * * * * * * |
    15 ++ * ** * * * * * * ++
    | * ** * * * * * * |
    | * ** * * * * * * |
    | * ** * * * * * * *** |
    10 ++ * ** * * * *** *** * ++
    | * ** * * * * * * * * |
    | *** ** * * * * * * * ***** *** |
    | * * ** * * * * * * * * * *** * |
    5 ++ *** * ** * * * * * * * * * * * * *** ++
    | * * * ** * * * * * * * * * * * * *** * |
    | * * * ** * * * * * * * * * * * *** * ******** *** *** |
    + ***+*** * * ** *+* * * * * * * * * *+* * *+** * *+* ***+* * * *** +
    0 ++-***+***********************************************-*****-***-***--++
    0 5 10 15 20 25 30 35 40 45 50 55
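What a bin width of 1.5 means can be illustrated with a few lines of awk. This is a sketch under stated assumptions: the bill amounts are made up rather than taken from data/tips.csv, and each value is assigned to the bin that starts at the largest multiple of 1.5 not exceeding it. The exact rule that feedgnuplot and Gnuplot apply at bin edges may differ.

```shell
# Hypothetical bill amounts, one per line.
bills=$(printf '3.1\n3.9\n4.4\n7.6\n8.0\n8.9\n10.6\n')

# Round each value down to a multiple of the bin width, count values per bin,
# and print "bin start,count" sorted numerically by bin start.
bins=$(printf '%s\n' "$bills" |
  awk -v width=1.5 '{ count[int($1 / width) * width]++ }
       END { for (b in count) printf "%.1f,%d\n", b, count[b] }' |
  sort -t, -k1,1n)

printf '%s\n' "$bins"
# 3.0,3
# 7.5,3
# 10.5,1
```

A histogram is a bar chart of these counts, so a smaller bin width yields more and narrower bars.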

    7.4.4 Bar Plots

    Using Rio:

    $ < data/tips.csv Rio -ge 'g+geom_bar(aes(factor(size)))' | display

    Figure 7.4: Bar Plot

    Using feedgnuplot:

    $ < data/tips.csv csvcut -c size | header -d | feedgnuplot --terminal \
    > 'dumb 80,25' --histogram 0 --with boxes --unset grid --exit
    + + * + * + + + + +
    140 ++ * * ++
    | * * |
    120 ++ * * ++
    | * * |
    100 ++ * * ++
    | * * |
    | * * |
    80 ++ * * ++
    | * * |
    60 ++ * * ++
    | * * |
    | * * |
    40 ++ * ********************* ++
    | * * * * |
    20 ++ * * * * ++
    | * * * * |
    + *********** + * + * + ********************* +
    0 ++---*************************************************************---++
    0 1 2 3 4 5 6 7
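A bar plot of a categorical column shows how often each value occurs. If only the numbers are needed, the same tally can be produced with standard tools. The party sizes below are made up for illustration and are not taken from data/tips.csv.

```shell
# Hypothetical party sizes, one per line.
sizes=$(printf '2\n3\n3\n2\n4\n2\n')

# Count occurrences of each value and print them as "value,count".
counts=$(printf '%s\n' "$sizes" | sort -n | uniq -c | awk '{ print $2 "," $1 }')

printf '%s\n' "$counts"
# 2,3
# 3,2
# 4,1
```

The final awk step only normalizes the padded output of uniq -c into a comma-separated form.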

    7.4.5 Density Plots

    Using Rio:

    Figure 7.5: Density Plot

    Since feedgnuplot cannot generate density plots, it’s best to just generate a histogram.

    7.4.6 Box Plots

    Using Rio:

    $ < data/tips.csv Rio -ge 'g+geom_boxplot(aes(time, bill))' | display

    Figure 7.6: Box Plot

    Drawing a box plot is unfortunately not possible with feedgnuplot.

    7.4.7 Scatter Plots

    Using Rio:

    $ < data/tips.csv Rio -ge 'g+geom_point(aes(bill, tip, color=time))' | display

    Figure 7.7: Scatter Plot

    Using feedgnuplot:

    $ < data/tips.csv csvcut -c bill,tip | tr , ' ' | header -d | feedgnuplot \
    > --terminal 'dumb 80,25' --points --domain --unset grid --exit --style 'pt' '14'
    10 ++----+------+-----+------+-----+------+-----+------+-----+------+A---++
    + + + + + + + + + + + +
    9 ++ A ++
    | |
    8 ++ ++
    | A |
    | |
    7 ++ A A ++
    | A A |
    6 ++ A A A ++
    | A A |
    5 ++ A A A A A AA A AA A A A ++
    | A A A A |
    4 ++ A A AAAA AAA A A A A A ++
    | A AAAAA AAA AA A A |
    | A AAAAAAA AA A A AA A AA |
    3 ++ A AAAAAAAAAAA A A AA AA A ++
    | AAAAAAA AA A A A A A |
    2 ++ AA AAAAAAAAA A A A AA A A A ++
    + + AAAAAAAA +A AA+ + A + + + + + +
    1 ++--A-+A-A---+--AA-+--A---+-----+------+--A--+------+-----+------+----++
    0 5 10 15 20 25 30 35 40 45 50 55
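With --domain, feedgnuplot treats the first value on each line as the x coordinate, which is why the pipeline above reduces the CSV to whitespace-separated pairs without a header. The preprocessing can be approximated with coreutils alone, as in this sketch. The rows are made up, and selecting columns by position with cut (instead of by name with csvcut) assumes that bill and tip are the first two columns.

```shell
# Illustrative rows; bill and tip are assumed to be columns 1 and 2.
csv=$(printf 'bill,tip,time\n16.99,1.01,Dinner\n10.34,1.66,Dinner\n24.27,2.03,Lunch\n')

# Keep the first two columns, drop the header line, and replace commas with spaces.
pairs=$(printf '%s\n' "$csv" | cut -d, -f1,2 | tail -n +2 | tr , ' ')

printf '%s\n' "$pairs"
# 16.99 1.01
# 10.34 1.66
# 24.27 2.03
```

Each output line is one point: the bill on the x axis and the tip on the y axis.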

    7.4.8 Line Graphs

    $ < data/immigration-long.csv Rio -ge 'g+geom_line(aes(x=Period, '\
    > 'element_text(angle = -45, hjust = 0))' | display

    Figure 7.8: Line Graph

    7.4.9 Summary