Reproducible Research Methods

Definitions of Tools and Their Related Concepts

Anaconda:   Anaconda Navigator is a GUI that provides easy access to Python distributions. Typical installations include Jupyter Notebook, JupyterLab, a PowerShell terminal, and Spyder (a Python IDE). It provides a convenient interface for installing and upgrading Python environments and packages. While it also supports R and RStudio, if you program in R, we recommend installing R/RStudio separately.

Docker:  Docker is an open software platform used to create containers, objects that include everything needed to run an application regardless of the computer's operating system.  In terms of reproducible research, a container such as one made with Docker can be used to document and share the computational environment used in a research project.  

Git:      Version control software.  Git is a “free and open source distributed version control system.”  It is the version control system behind repository hosting services such as GitHub, Bitbucket, and GitLab. A competing system is Mercurial.

Hyper-V:  A native hypervisor (virtual machine monitor) for Windows.  It creates and runs virtual machines.

IDE:    IDE stands for Integrated Development Environment. It is a software application that allows programmers to develop and debug their software. Good IDEs will provide comprehensive utilities supporting development in a particular language. Most IDEs are specific to a particular programming language or development environment, such as Spyder for Python, CLion for C++, and Xcode for Apple products. Some, such as Visual Studio, will support a suite of (often related) languages.

Java:      A general-purpose computer programming language that supports object-oriented development. It is often used for client-server web applications.  Java is a high-level, compiled programming language: source code is compiled to bytecode, which runs on the Java Virtual Machine (JVM).

Kernel:  "The most basic level or core of an operating system of a computer, responsible for resource allocation, file management, and security."  It helps software and hardware communicate.  In programming tools such as Jupyter, the word has a related meaning: a Python kernel or R kernel is the process that runs your code in that language and returns the results. In terms of virtualization, each virtual machine runs its own operating system kernel, so it can run as an independent environment even though it is running inside the operating system of a computer. Containers, by contrast, share the kernel of the system they run on.

Make:  A build automation tool that automatically builds executable programs and libraries from source code. It can also be used to run analysis scripts on raw data files to produce summary data files, to run visualization scripts on data files to produce plots, and to parse and combine text files and plots to create papers. In this manner, Make can build data files, plots, and papers, and update existing files when the files they depend on change.
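
The core idea behind Make can be sketched in a few lines of Python. This is not how Make itself is implemented or invoked; it is a minimal illustration, under the assumption that "out of date" simply means the target file is missing or older than one of its source files. The function names (needs_rebuild, build_summary, make) and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build_summary(raw_path, summary_path):
    """Hypothetical analysis step: write the mean of the numbers in raw_path."""
    with open(raw_path) as f:
        values = [float(line) for line in f if line.strip()]
    with open(summary_path, "w") as f:
        f.write(f"mean={sum(values) / len(values)}\n")


def make(target, sources, recipe):
    """Run recipe only when target is out of date; return whether it ran."""
    if needs_rebuild(target, sources):
        recipe()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")
        summary = os.path.join(workdir, "summary.txt")
        with open(raw, "w") as f:
            f.write("1\n2\n3\n")
        # First call builds the summary; second call finds it up to date.
        print(make(summary, [raw], lambda: build_summary(raw, summary)))
        print(make(summary, [raw], lambda: build_summary(raw, summary)))
```

In a real project the same relationship would be written as a rule in a Makefile, and Make would decide which steps to rerun.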

MATLAB:  MATrix LABoratory. A numerical computing environment used widely to support computation and research.  It supports all levels of computation, from creating and implementing algorithms in script (*.m) files to implementing models in a parallel computing environment.  It excels in the area of matrix manipulation and mathematics.  MATLAB is also the name of the associated proprietary high-level programming language.  MATLAB is a commercial product.  UCR currently has a site license for MATLAB, which allows all campus users to install and use MATLAB on an annual basis.

Programming language, compiled:  Source code written in a compiled programming language must first be translated by a compiler into a file (often *.exe on Windows, though other formats exist) that contains the machine-language instructions to be executed.

Programming language, interpreted:  Source code written in an interpreted programming language is executed directly by an interpreter program, without first being compiled into a machine-language instruction file.
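
As a small illustration of the interpreted case, the sketch below hands a line of Python source text to the running interpreter, with no separate build step and no executable file produced on disk. The helper name run_source is hypothetical and exists only for this example.

```python
def run_source(source):
    """Execute Python source text directly and return the names it defined."""
    namespace = {}
    exec(source, namespace)  # the interpreter runs the text as-is
    return namespace


result = run_source("total = sum(range(5))")
print(result["total"])  # 0 + 1 + 2 + 3 + 4 = 10
```

A compiled language would instead require running a compiler first and then launching the file it produced.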

Programming language, level:  The level of a programming language indicates its nearness to the machine language that runs a computer. Low-level languages are more difficult to learn and implement, but allow a greater range of control over specific computer functions; by contrast, high-level languages are easier to learn and implement, but offer less direct control over the hardware. Generally speaking, most researchers and students only need to learn high-level languages.

Python:  Python is a high-level, general purpose, interpreted computer programming language. 

R/RStudio:  R is a high-level, interpreted programming language created to support statistical computing and graphics. RStudio is a free, open-source, integrated development environment (IDE) for R. 

Spreadsheets:  A type of computer tool (application and file) which displays data in tabular format. Most applications which support spreadsheets allow computation on tables of data.

SQL:    Structured Query Language. A domain-specific language that supports managing and manipulating data held in relational database systems.
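
To give a sense of what SQL looks like, the sketch below uses the sqlite3 module from Python's standard library to create a small in-memory table and query it. The table name, column names, and data values are made up for illustration, and the function name average_by_site is hypothetical.

```python
import sqlite3


def average_by_site(rows):
    """Load (site, temperature) rows into a table and average them per site."""
    conn = sqlite3.connect(":memory:")  # temporary database, nothing saved to disk
    try:
        conn.execute("CREATE TABLE samples (site TEXT, temperature REAL)")
        conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        query = (
            "SELECT site, AVG(temperature) "
            "FROM samples GROUP BY site ORDER BY site"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


data = [("A", 20.5), ("A", 21.5), ("B", 18.0)]
print(average_by_site(data))  # [('A', 21.0), ('B', 18.0)]
```

The CREATE TABLE, INSERT, and SELECT statements are SQL; Python is only used here to send them to the database.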

Unix (bash): Bash stands for Bourne Again SHell.  It is a command-line processing environment available on the Unix, Linux, and Mac operating systems, used to navigate directories and execute commands. Examples of simple commands are listing the contents of a directory and printing the directory the user is currently in.  Intermediate commands provide summary information about files and directories, and allow commands to be “piped” together, with the output redirected into files.  Advanced usage of Bash includes running command sequences from script files, using system variables, and modifying file/directory permissions. An advantage of Bash is that novices may quickly become experts, as most advanced uses are complex combinations of simple and intermediate commands.

Weka: Java-based Data Mining software.

WSL, Windows Subsystem for Linux: "A compatibility layer for running Linux binary executables natively on Windows 10."  In other words, it lets you run Linux/UNIX commands from the terminal, and run some Linux applications, all on your Windows 10 machine. It was created as a lightweight alternative to a full virtual machine.  The original WSL (WSL 1) translates Linux system calls so that Windows can handle them.  WSL 2 uses a real Linux kernel running in a lightweight virtual machine, and so is generally faster and more compatible with Linux applications.

Virtual Machine:  An emulation of a computer system that behaves like a separate physical computer with its own operating system. It runs as an application within the operating system of a host computer, essentially creating a computer within a computer.