Best Practices for NextGen IDE Users
At a Glance
This document lists best practices for migrating to and using HISE NextGen IDEs. Be sure to include this information when you introduce new employees to the IDE workflow.
General Best Practices
- Don't create IDEs with a massive amount of disk space (>200 GB).
- Treat every IDE as a temporary data processing space, not a permanent workspace.
- Use uploadFiles() to establish a Collaboration Space in HISE for your data and IDEs.
- Use the Project Store to handle data that isn't processed by our automated analysis pipeline.
- Preserve startup scripts in your /private folder so you have them handy to create new IDEs.
- Avoid dragging and dropping files > 1 GB in size. Instead, use CLI commands to work with your files.
- If the selected IDE modality doesn't meet your needs, improve its functionality by adding relevant packages and libraries.
- Jupyter notebooks must run in order from start to finish. Begin by caching HISE data, and end by uploading your outputs.
- Stop your instance if you can do so conveniently. Stopped instances incur cloud computing costs only for disk space—not for CPUs or memory. Delete your instance if possible. Deleted instances incur no cloud computing costs.
- Instead of consolidating all research activities into a single IDE, create separate notebooks for specific purposes, such as preprocessing, predictive modeling, and visualization. This task-based approach streamlines your workflow and lowers cloud computing costs, since small instances are easy to stop or delete.
- For programmatically generated outputs that you intend to keep or share, write to the /home/workspace folder instead of writing directly to the notebook. (More specifically, write to /home/workspace/temp or any other /home/workspace location except /environment, /input, /private, or /sdk.) Doing so ensures the persistence of your data and allows for easy data transfer between the IDEs within your account.
- Side-loading (manually uploading) data to your IDE makes the source difficult to trace for reproducibility. For greater transparency, either encode the data acquisition process directly in your Jupyter Notebook or use a watchfolder to ingest external data sources automatically into HISE.
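The output-location guidance in the list above can be sketched in Python. This is a minimal illustration, not HISE-provided code: the helper names is_recommended_output_dir and write_output are hypothetical, and the sketch assumes that /environment, /input, /private, and /sdk are subfolders of /home/workspace.

```python
from pathlib import Path, PurePosixPath

WORKSPACE = "/home/workspace"
# Assumption: these are the excluded subfolders of /home/workspace.
RESERVED = ("environment", "input", "private", "sdk")


def is_recommended_output_dir(path, workspace=WORKSPACE):
    """Return True if path is inside the workspace but outside the reserved folders.

    The check is purely lexical: it does not touch the file system or
    resolve symlinks, and it rejects paths that contain "..".
    """
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(workspace))
    except ValueError:
        # The path is not under the workspace at all (for example, /usr/bin).
        return False
    if ".." in relative.parts:
        return False
    return not relative.parts or relative.parts[0] not in RESERVED


def write_output(text, filename, out_dir=WORKSPACE + "/temp", workspace=WORKSPACE):
    """Write text to a file in out_dir after checking that the location is recommended."""
    if not is_recommended_output_dir(out_dir, workspace):
        raise ValueError(f"{out_dir} is not a recommended output location")
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(text)
    return target
```

In a notebook, a call such as write_output(summary_csv, "summary.csv") would then place the file in /home/workspace/temp rather than leaving the result only in a cell output (summary_csv is a placeholder for your own string).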
Migration Best Practices
The following are best practices for migrating to the NextGen IDEs. For details, see Migrate to the HISE NextGen IDEs (Q&A) and The Great NextGen IDE Migration (Tutorial).
- To begin the migration, delete everything in /home/jupyter/cache. Then tar up your entire /home/jupyter directory and upload it to your /private folder.
- Use the /private folder to store intermediate results or any other data whose path need not be traced for reproducibility. CertProd doesn't have access to this directory.
- Save your intermediate results to /home/workspace/temp.
- To read files from Python or R, use the /input folder. If the process is too slow, copy the files to the root disk (/) and perform the same operation. Be aware, however, that files stored there do not persist and will be lost when your instance is rebooted.
- Don't copy root disk files (that is, files outside of /home/workspace, like /usr/bin/hisepy). Copy only files within /home/workspace.
- To efficiently adjust path references or other syntax in multiple files, you can use a single sed (stream editor) command with a regular expression. For example, the following command searches for lines ending with */, removes the trailing forward slash, and applies the change to every matching line.
sed 's|\*/$|*|'
- The workspace disk allocation limit is now 5 TB for the duration of the NextGen IDE migration. To request more than 5 TB, contact immunology-support@alleninstitute.org. The limit returns to 1 TB on January 8, 2025, when the migration is complete.
- Files accidentally deleted from your /private folder are retained for 90 days. To request file recovery, contact the dev team at immunology-support@alleninstitute.org.
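The first migration step in the list above (empty the cache, then archive /home/jupyter) can be sketched with Python's standard library. This is an illustration under stated assumptions, not an official migration script: the function name archive_home and the archive file name are hypothetical, and the sketch assumes the destination (for example, your /private folder) is writable as a local directory. How the archive actually reaches /private in your environment may differ.

```python
import shutil
import tarfile
from pathlib import Path


def archive_home(home="/home/jupyter", dest_dir="/private", name="jupyter_home.tar.gz"):
    """Empty <home>/cache, then write a gzipped tar of <home> into dest_dir."""
    home = Path(home)
    cache = home / "cache"

    # Delete everything inside the cache folder, but keep the folder itself.
    if cache.is_dir():
        for entry in cache.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Assumption: dest_dir lies outside home; otherwise the archive
    # would try to include itself.
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / name

    # Store the directory under its own name so it unpacks into one folder.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(home), arcname=home.name)
    return archive
```

Parameterizing the paths makes it possible to try the function on a scratch directory first, before pointing it at /home/jupyter.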
Related Resources
Migrate to the HISE NextGen IDEs (Q&A)