.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/loading_data/loading_mrc_files.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Go to the end to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_loading_data_loading_mrc_files.py:

Loading MRC files (and other binary files)
==========================================

This is a simple example of how to load MRC files using Pyxem. The MRC file
format is a common format for electron microscopy data. It is a binary format
used for storing 3D data, such as electron tomography, but because it is a
fairly simple format it has also been adopted in some cases to store 4D STEM
data.

First we will download a sample MRC file from the Pyxem data repository.
Zenodo is a good way to host data if you want to share it with others: I like
to put a small version (Zenodo allows up to 50 GB) of every dataset I publish
there and then use pooch to automate the download/extraction process.

.. GENERATED FROM PYTHON SOURCE LINES 14-31

.. code-block:: Python

    import os
    import zipfile

    import pooch

    current_directory = os.getcwd()

    file_path = pooch.retrieve(
        # URL of the sample dataset on Zenodo
        url="https://zenodo.org/records/15490547/files/ZrNbMrc.zip",
        known_hash="md5:eeac29aee5622972daa86a394a8c1d5c",
        progressbar=True,
        path=current_directory,
    )

    # Unzip the file
    with zipfile.ZipFile(file_path, "r") as zip_ref:
        zip_ref.extractall(current_directory)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0.00/35.5M [00:00<?, ?B/s]

.. GENERATED FROM PYTHON SOURCE LINES 31-38

We can use HyperSpy and the RosettaSciIO library to read the file. In this
case, because the file was collected with a Direct Electron camera, the
metadata is automatically loaded as well.

.. GENERATED FROM PYTHON SOURCE LINES 38-44

.. code-block:: Python

    import hyperspy.api as hs

    signal = hs.load(
        "ZrNbMrc/20241021_00405_movie.mrc",
    )
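Before moving on, it is worth seeing what the ``known_hash`` argument used in
the download step actually buys us. The sketch below illustrates the idea with
only the standard library; it is not pooch's real implementation, and the file
name here is made up:

```python
import hashlib

# Stand-in bytes written to disk, playing the role of a downloaded archive.
with open("downloaded.bin", "wb") as f:
    f.write(b"example bytes standing in for ZrNbMrc.zip")


def md5_of(path, chunk_size=2**20):
    """Hash a file in chunks so large downloads never need to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Pooch compares a digest like this against the "md5:..." part of known_hash
# and raises an error on mismatch, so a corrupted download fails loudly
# instead of silently producing bad data.
print(md5_of("downloaded.bin"))
```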
.. GENERATED FROM PYTHON SOURCE LINES 45-51

Loading Lazily
--------------

In this case the file was loaded using the ``numpy.memmap`` function. This
does not read the entire file into memory up front, but any operation that
touches all of the data (for example ``signal.sum()``) will still pull the
whole file into memory. In most cases it is better to pass ``lazy=True`` so
that the file is loaded lazily as a dask array.

.. GENERATED FROM PYTHON SOURCE LINES 51-55

.. code-block:: Python

    signal = hs.load("ZrNbMrc/20241021_00405_movie.mrc", lazy=True)
    signal

.. rst-class:: sphx-glr-script-out

.. code-block:: none

             Array              Chunk
    Bytes    37.50 MiB          37.50 MiB
    Shape    (30, 20|128, 128)  (30, 20|128, 128)
    Count    2 Tasks            1 Chunks
    Type     float32            numpy.ndarray
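The memory-map behaviour described above can be sketched with plain NumPy. The
``.raw`` file here is a toy stand-in written on the spot, with its shape chosen
to mirror the example data:

```python
import numpy as np

# Write a toy binary "movie" to disk: 30 x 20 probe positions,
# each a 128 x 128 float32 frame (same shape as the example file).
arr = np.arange(30 * 20 * 128 * 128, dtype="float32").reshape(30, 20, 128, 128)
arr.tofile("toy_movie.raw")

# Memory-map the file: nothing is read until the data is actually accessed.
mm = np.memmap("toy_movie.raw", dtype="float32", mode="r",
               shape=(30, 20, 128, 128))

# Slicing only touches the bytes backing the requested frame...
frame = np.asarray(mm[0, 0])
print(frame.shape)  # (128, 128)

# ...but a full reduction such as mm.sum() still reads every byte,
# which is why lazy=True is usually the better default for large files.
```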
.. GENERATED FROM PYTHON SOURCE LINES 56-64

Controlling the Chunk Size
--------------------------

The chunk size controls how many frames are loaded into memory at once when
the signal is loaded lazily. It can be set with the ``chunks`` parameter. A
good starting point is ``"auto"`` for the first two (navigation) dimensions,
which lets dask pick a chunk size based on its default target chunk size.
The last two dimensions are the reciprocal-space dimensions; because we
usually ``map`` over those dimensions, we set them to ``-1`` so that each
diffraction pattern is kept whole within a single chunk.

.. GENERATED FROM PYTHON SOURCE LINES 64-68

.. code-block:: Python

    signal = hs.load("ZrNbMrc/20241021_00405_movie.mrc", lazy=True, chunks=(10, 10, -1, -1))
    signal

.. rst-class:: sphx-glr-script-out

.. code-block:: none

             Array              Chunk
    Bytes    37.50 MiB          6.25 MiB
    Shape    (30, 20|128, 128)  (10, 10|128, 128)
    Count    12 Tasks           6 Chunks
    Type     float32            numpy.ndarray
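The effect of ``chunks=(10, 10, -1, -1)`` can be reproduced directly with dask,
using an in-memory stand-in array of the same shape (no file involved):

```python
import numpy as np
import dask.array as da

# Toy stand-in for the 4D-STEM dataset: 30 x 20 scan positions,
# 128 x 128 detector pixels, float32 -- 37.5 MiB in total.
data = np.zeros((30, 20, 128, 128), dtype="float32")

# Chunk only the navigation dimensions; -1 keeps each whole
# 128 x 128 diffraction pattern inside a single chunk.
lazy = da.from_array(data, chunks=(10, 10, -1, -1))

print(lazy.chunks)          # ((10, 10, 10), (10, 10), (128,), (128,))
print(lazy.nbytes / 2**20)  # 37.5 (MiB for the full array)
```

Each of the resulting 6 chunks holds a 10 x 10 block of full diffraction
patterns, which is why the chunk size above reads 6.25 MiB.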
.. GENERATED FROM PYTHON SOURCE LINES 69-76

Slicing the Signal
------------------

Interestingly, binary files are sometimes faster than compressed formats.
With compressed file formats, like HDF5 or Zarr, you need to decompress an
entire chunk before you can access any part of the data. For things like
virtual images or slicing a signal this can add overhead. With binary files,
because the underlying data is a memory map (even for dask arrays), you can
very efficiently slice parts of the data without loading the entire chunk
into memory.

.. GENERATED FROM PYTHON SOURCE LINES 76-80

.. code-block:: Python

    slice_sum = signal.isig[0:10, 0:10].sum()
    slice_sum.compute()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

      0%|          | 0/16 [00:00<?, ?it/s]

.. only:: html

    .. container:: sphx-glr-download sphx-glr-download-python

        :download:`Download Python source code: loading_mrc_files.py <loading_mrc_files.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

        :download:`Download zipped: loading_mrc_files.zip <loading_mrc_files.zip>`

.. only:: html

    .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_