Resumable Iterations

For many download targets, Instaloader can resume a previously interrupted iteration. It provides an interruptible iterator, NodeIterator, and a context manager, resumable_iteration(), both of which are presented here.

New in version 4.5.

NodeIterator

class NodeIterator(context, query_hash, edge_extractor, node_wrapper, query_variables=None, query_referer=None, first_data=None)

Iterate the nodes within edges in a GraphQL pagination. Instances of this class are returned by many (but not all) of Instaloader’s Post-returning functions, such as Profile.get_posts().

What makes this iterator special is its ability to freeze/store its current state, e.g. to interrupt an iteration, and later thaw/resume from where it left off.

You can freeze a NodeIterator with NodeIterator.freeze():

post_iterator = profile.get_posts()
try:
    for post in post_iterator:
        do_something_with(post)
except KeyboardInterrupt:
    save("resume_information.json", post_iterator.freeze())

and later reuse it with NodeIterator.thaw() on an equally-constructed NodeIterator:

post_iterator = profile.get_posts()
post_iterator.thaw(load("resume_information.json"))

(Appropriate functions for saving and loading the FrozenNodeIterator are, for example, save_structure_to_file() and load_structure_from_file().)

A FrozenNodeIterator can only be thawed with a matching NodeIterator, i.e. a NodeIterator instance that has been constructed with the same parameters as the instance represented by the FrozenNodeIterator in question. This ensures that an iteration cannot be resumed in a wrong, non-matching loop. As a quick way to distinguish iterators that are saved, e.g. in files, there is the NodeIterator.magic string: two NodeIterators match if and only if they have the same magic.
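The freeze/thaw contract described above can be illustrated without Instaloader or network access. The following is only a toy model under stated assumptions: ToyResumableIterator, its source parameter and the dict-based frozen state are hypothetical names invented for this sketch, and the hashing scheme is not the one Instaloader uses for NodeIterator.magic.

```python
import hashlib
import json
from typing import Any, Dict, Iterator, List


class ToyResumableIterator:
    """Toy stand-in for a freezable iterator; NOT Instaloader's NodeIterator."""

    def __init__(self, source: str, items: List[Any]) -> None:
        self._source = source   # plays the role of the construction parameters
        self._items = items
        self._index = 0         # number of items already returned

    @property
    def magic(self) -> str:
        # Hash of the construction parameters: equal parameters, equal magic.
        return hashlib.sha1(self._source.encode("utf-8")).hexdigest()[:8]

    def __iter__(self) -> Iterator[Any]:
        return self

    def __next__(self) -> Any:
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item

    def freeze(self) -> Dict[str, Any]:
        # A plain dict, so the state can be written with json.dump().
        return {"magic": self.magic, "total_index": self._index}

    def thaw(self, frozen: Dict[str, Any]) -> None:
        # Reject a used iterator and a state from a different iteration.
        if self._index != 0:
            raise ValueError("iterator has already been used")
        if frozen["magic"] != self.magic:
            raise ValueError("frozen state belongs to a different iteration")
        self._index = frozen["total_index"]


first = ToyResumableIterator("posts-of-alice", list(range(10)))
seen = [next(first) for _ in range(4)]          # consume 0, 1, 2, 3
state = json.loads(json.dumps(first.freeze()))  # simulate saving and loading

second = ToyResumableIterator("posts-of-alice", list(range(10)))
second.thaw(state)
rest = list(second)                             # continues with 4 ... 9

other = ToyResumableIterator("posts-of-bob", list(range(10)))
try:
    other.thaw(state)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True                    # different magic, so refused
```

The point of the sketch is the pairing of freeze() with thaw() on an equally constructed instance, and the use of magic to refuse a state that belongs to another iteration.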

See also resumable_iteration() for a high-level context manager that handles a resumable iteration.

property count

The count as returned by Instagram. This is not always the total count this iterator will yield.

Return type

Optional[int]

property total_index

Number of items that have already been returned.

Return type

int

property magic

Magic string for easily identifying a matching iterator file for resuming (hash of some parameters).

Return type

str

property first_item

If this iterator has produced any items, returns the first item produced; otherwise None.

New in version 4.8.

Return type

Optional[~T]

freeze()

Freeze the iterator for later resuming.

Return type

FrozenNodeIterator

thaw(frozen)

Use this iterator to resume an earlier iteration.

Raises

InvalidArgumentException

If

  • the iterator on which this method is called has already been used, or

  • the given FrozenNodeIterator does not match, i.e. belongs to a different iteration.

Return type

None

class FrozenNodeIterator(query_hash, query_variables, query_referer, context_username, total_index, best_before, remaining_data, first_node)

A serializable representation of a NodeIterator instance, saving its iteration state.

It can be serialized and deserialized with save_structure_to_file() and load_structure_from_file(), as well as with json and pickle thanks to being a namedtuple().
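As a rough illustration of the JSON round trip that the namedtuple design allows, the sketch below uses a stand-in type with the same field names as FrozenNodeIterator, so that it runs without Instaloader installed. FrozenStandIn and all field values are made up for this example; with the real class, the same _asdict() and keyword-expansion pattern should apply.

```python
import json
from collections import namedtuple

# Stand-in with the same field names as FrozenNodeIterator (assumption: used
# only so that this example is self-contained).
FrozenStandIn = namedtuple(
    "FrozenStandIn",
    ["query_hash", "query_variables", "query_referer", "context_username",
     "total_index", "best_before", "remaining_data", "first_node"],
)

frozen = FrozenStandIn(
    query_hash="0123abcd",            # made-up value
    query_variables={"id": "42"},     # made-up value
    query_referer=None,
    context_username=None,
    total_index=12,
    best_before=None,
    remaining_data={"edges": [], "page_info": {"has_next_page": False}},
    first_node=None,
)

# Serialize via a dict: json.dumps() on the tuple itself would emit a bare
# list and lose the field names.
as_json = json.dumps(frozen._asdict())

# Deserialize by expanding the decoded dict into keyword arguments.
from_json = FrozenStandIn(**json.loads(as_json))
```

Going through _asdict() keeps the file readable and makes loading independent of the field order.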

best_before

Date when parts of the stored nodes might have expired.

context_username

The username of the user who created the iterator, or None.

first_node

Node data of the first item, if an item has been produced.

query_hash

The GraphQL query_hash parameter.

query_referer

The HTTP referer used for the GraphQL query.

query_variables

The GraphQL query_variables parameter.

remaining_data

The already-retrieved, yet-unprocessed edges and the page_info at time of freezing.

total_index

Number of items that have already been returned.

resumable_iteration

resumable_iteration(context, iterator, load, save, format_path, check_bbd=True, enabled=True)

High-level context manager to handle a resumable iteration that can be interrupted with a KeyboardInterrupt or an AbortDownloadException.

It can be used as follows to automatically load a previously-saved state into the iterator, save the iterator’s state when interrupted, and delete the resume file upon completion:

post_iterator = profile.get_posts()
with resumable_iteration(
        context=L.context,
        iterator=post_iterator,
        load=lambda _, path: FrozenNodeIterator(**json.load(open(path))),
        save=lambda fni, path: json.dump(fni._asdict(), open(path, 'w')),
        format_path=lambda magic: "resume_info_{}.json".format(magic)
) as (is_resuming, start_index):
    for post in post_iterator:
        do_something_with(post)

It yields a tuple (is_resuming, start_index).

When the passed iterator is not a NodeIterator, it behaves as if resumable_iteration was not used, just executing the inner body.
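The load, save and delete behavior described above can be sketched with a small stand-alone context manager. This is a simplified model, not Instaloader's implementation: CountingIterator and toy_resumable_iteration are hypothetical names, the saved state is reduced to a single index, and re-raising the KeyboardInterrupt after saving is a choice made for this sketch.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class CountingIterator:
    """Toy iterator over range(stop) whose position can be saved and restored."""

    def __init__(self, stop):
        self.stop = stop
        self.total_index = 0  # number of items already returned

    def __iter__(self):
        return self

    def __next__(self):
        if self.total_index >= self.stop:
            raise StopIteration
        self.total_index += 1
        return self.total_index - 1


@contextmanager
def toy_resumable_iteration(iterator, path):
    """Load saved state if present, save on interrupt, delete on completion."""
    is_resuming = False
    if os.path.exists(path):
        with open(path) as fp:
            iterator.total_index = json.load(fp)["total_index"]
        is_resuming = True
    try:
        yield is_resuming, iterator.total_index
    except KeyboardInterrupt:
        # Interrupted: persist the position, then let the interrupt propagate.
        with open(path, "w") as fp:
            json.dump({"total_index": iterator.total_index}, fp)
        raise
    else:
        # Completed normally: the resume file is no longer needed.
        if os.path.exists(path):
            os.remove(path)


resume_path = os.path.join(tempfile.mkdtemp(), "resume_info_toy.json")

# First run: simulate an interrupt after three items.
first_run = []
iterator = CountingIterator(6)
try:
    with toy_resumable_iteration(iterator, resume_path) as (is_resuming, start_index):
        for item in iterator:
            first_run.append(item)
            if len(first_run) == 3:
                raise KeyboardInterrupt
except KeyboardInterrupt:
    pass
saved_after_interrupt = os.path.exists(resume_path)

# Second run: picks up at the saved position and cleans up afterwards.
second_run = []
iterator = CountingIterator(6)
with toy_resumable_iteration(iterator, resume_path) as (is_resuming, start_index):
    for item in iterator:
        second_run.append(item)
removed_after_completion = not os.path.exists(resume_path)
```

In the second run the yielded tuple reports that a saved state was loaded and at which index the loop continues, which is useful for progress output.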

Parameters

Changed in version 4.7: Also interrupt on AbortDownloadException.

Return type

Iterator[Tuple[bool, int]]
