Chapter 9 Modeling Data

However, visualizations, on the one hand, are characterized by shapes, positions, and colors such that we can interpret them by looking at them. Models, on the other hand, are internally characterized by a bunch of numbers, which means that computers can use them, for example, to make predictions about a new data points. (We can still visualize models so that we can try to understand them and see how they are performing.)

These four algorithms come from the field of machine learning. As such, we’re going to change our vocabulary a bit. Let’s assume that we have a CSV file, also known as a data set. Each row, except for the header, is considered to be a data point. For simplicity we assume that each column that contains numerical values is an input feature. If a data point also contains a non-numerical field, such as the species column in the Iris data set, then that is known as the data point’s label.

This is by no means an introduction to machine learning. That implies that we must skim over many details. We strongly advise that you become familiar with an algorithm before applying it blindly to your data.