3. Interface and Data Management

It’s almost time to start designing our class structure (the fun stuff)!

Today I’ll be covering some options for how to interact with your simulation.

When deciding how you will choose which experiment to run and what parameters to use, keep two things in mind: data organization and ease of use.

First is data organization. You want your data to be easy to find, and you want to be able to recall exactly how you generated it. This is important for reproducibility, and it will save you the headache of searching through gigabytes of data spread over hundreds of files (once again, I speak from experience here). Unfortunately, there is no golden solution for folder organization. Each project has its own flow and evolution, so my biggest piece of advice is to reevaluate your organization system often. You get to choose what “often” means: every new experiment, every month or two, or whatever cadence makes sense to you. At the very least, take time to reorganize whenever you find yourself unable to quickly find what you need.

A good starting point for data organization might look like:

Code version (for behavior-changing updates)
- Experiment number
  - Parameter set
    - Run number

Make sure to include a text file in each experiment folder with a short blurb about what the experiment is, as well as a file in each parameter set folder that lays out its parameters.
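
If you want to automate the folder layout above, a minimal Python sketch might look like the following. The function name `make_run_dir`, the `README.txt` and `params.json` file names, and the `run_000` folder naming are my own assumptions for illustration, not a required convention.

```python
import json
from pathlib import Path

def make_run_dir(root, code_version, experiment, param_set, run,
                 experiment_blurb, params):
    """Create root/code_version/experiment/param_set/run_NNN plus notes files.

    All folder and file names here are hypothetical, not a fixed convention.
    """
    exp_dir = Path(root) / code_version / experiment
    set_dir = exp_dir / param_set
    run_dir = set_dir / f"run_{run:03d}"
    run_dir.mkdir(parents=True, exist_ok=True)

    # One short description per experiment folder, written only once.
    readme = exp_dir / "README.txt"
    if not readme.exists():
        readme.write_text(experiment_blurb + "\n")

    # One parameter file per parameter set folder.
    (set_dir / "params.json").write_text(json.dumps(params, indent=2, sort_keys=True))
    return run_dir
```

For example, `make_run_dir("data", "v1.0", "exp01_temp_scan", "L32_T2.27", 0, "Scan temperatures near the critical point", {"L": 32, "T": 2.27})` creates the folder `data/v1.0/exp01_temp_scan/L32_T2.27/run_000`. Writing the parameters with `sort_keys=True` keeps the file layout stable, which makes it easier to compare parameter sets side by side.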

This leads nicely into a saying I’ve become quite fond of: “Code is code and data is data and never the twain shall meet.” Your code is the logic of how to run your simulation or solve your problem, and your data is the specifics of that problem. In other words, don’t mix where an experiment’s parameters live with the logic that performs the experiment. Default values for parameters are fine, but hard coding parameters for specific experiments causes problems. Scattered parameters mean flipping through many files or lines of code whenever you want to change them, and if you hard code the size of your neighbor grid for one project and then reuse that code for another, you will run into trouble.
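
To make “code is code and data is data” concrete, here is one possible Python sketch, assuming a hypothetical 2-D Ising-style simulation. `IsingParams`, its fields and default values, and the `neighbors` helper are illustrative names, not part of any existing code. The defaults live in the code, experiment-specific values come from a JSON file, and the neighbor lookup takes the lattice size as an argument instead of a hard-coded constant.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class IsingParams:
    """Hypothetical parameter set with defaults; experiment values come from files."""
    L: int = 16          # lattice side length
    T: float = 2.0       # temperature
    J: float = 1.0       # coupling constant
    sweeps: int = 1000   # Monte Carlo sweeps per run (assumed update scheme)
    seed: int = 0        # random seed, recorded for reproducibility

    @classmethod
    def from_json(cls, path):
        """Load parameters from a JSON file; missing keys keep their defaults."""
        with open(path) as f:
            data = json.load(f)
        known = {fld.name for fld in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return cls(**data)

def neighbors(i, j, L):
    """Nearest neighbors of site (i, j) on an L x L lattice with periodic boundaries.

    L is passed in from the parameters rather than hard coded, so the same
    function works for any lattice size.
    """
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]
```

Rejecting unknown keys in `from_json` is a deliberate choice: a typo such as `"temprature"` in a parameter file fails loudly instead of silently falling back to the default.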

For ease of use, think about how you will be running this program. From your desktop or laptop? From the cluster? Are there many options you want to change between runs? Odds are you will want at least one command-line argument to switch between experiments. You could also include an argument that gives the path of a file to read parameters from, or one that tells the simulation which run number of a given parameter set it’s on so it knows where to store the data.
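
A minimal sketch of such a command line, using Python’s standard `argparse` module, follows. The experiment names, the `--params` and `--run` flags, and their defaults are hypothetical placeholders for whatever your project needs.

```python
import argparse

# Hypothetical experiment names; replace with your own.
EXPERIMENTS = ["equilibrate", "temp_scan"]

def build_parser():
    """Build the command-line interface for a single simulation run."""
    parser = argparse.ArgumentParser(description="Run one simulation experiment.")
    # Required positional argument that switches between experiments.
    parser.add_argument("experiment", choices=EXPERIMENTS,
                        help="which experiment to run")
    # Optional path to a parameter file; None means "use the defaults".
    parser.add_argument("--params", metavar="FILE", default=None,
                        help="file to read parameters from")
    # Run number within a parameter set, used to pick the output folder.
    parser.add_argument("--run", type=int, default=0,
                        help="run number for this parameter set")
    return parser

# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["temp_scan", "--params", "params.json", "--run", "3"])
print(args.experiment, args.params, args.run)  # temp_scan params.json 3
```

From a shell or a cluster job script this becomes something like `python run.py temp_scan --params params.json --run 3`, where `run.py` is a hypothetical script name. Passing the run number on the command line makes it easy to launch many runs of the same parameter set as separate jobs.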

You should also decide on the output format of your files. Will you write your measurements as comma-separated values (CSV)? Do you have some other format you like? Does every measurement come with a timestamp? This is especially important if your time steps or measurement times are inconsistent. Write these decisions down so you don’t forget them later.
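
If you settle on CSV, Python’s built-in `csv` module covers both writing and reading. This sketch assumes each row is a time followed by two observables; the column names `time`, `energy`, and `magnetization` and the helper names are assumptions for illustration. Keeping the time in its own column means irregular measurement times are recorded explicitly rather than inferred from the row index.

```python
import csv

# Hypothetical column layout: a timestamp followed by the measured quantities.
COLUMNS = ["time", "energy", "magnetization"]

def write_measurements(path, rows):
    """Write (time, energy, magnetization) tuples to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

def read_measurements(path):
    """Read the file back as a list of dicts with float values."""
    with open(path, newline="") as f:
        return [{key: float(value) for key, value in row.items()}
                for row in csv.DictReader(f)]
```

Writing a header row costs almost nothing and means the file still explains itself months later, even if you have forgotten the column order.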

The interface part of the design document is below:
