Data Science at the Command Line

    June 10, 2019

    Welcome

    This is the website for Data Science at the Command Line, First Edition, published by O’Reilly in October 2014. This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data.

    To get you started, whether you’re on Windows, macOS, or Linux, author Jeroen Janssens has developed an environment packed with over 80 command-line tools.

    Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

    • Obtain data from websites, APIs, databases, and spreadsheets
    • Perform scrub operations on text, CSV, HTML/XML, and JSON
    • Manage your data science workflow
    • Create reusable command-line tools from one-liners and existing Python or R code
    • Parallelize and distribute data-intensive pipelines

    This book explains how to integrate common data science tasks into a coherent workflow. It’s not just about tactics for breaking down problems; it’s also about strategies for assembling the pieces of the solution.

    Consultant in applied mathematics, statistics, and technical computing

    Did you know that the author gives in-company training about this topic and other topics such as R and Python? If you and your colleagues would like to learn from Jeroen in person, please contact Data Science Workshops B.V. for more information.