Resumable Iterations

For many download targets, Instaloader can resume a previously interrupted iteration. It provides an interruptible iterator, NodeIterator, and a context manager, resumable_iteration(), both of which are presented here.

Added in version 4.5.

NodeIterator

class instaloader.NodeIterator(context: InstaloaderContext, query_hash: str | None, edge_extractor: Callable[[Dict[str, Any]], Dict[str, Any]], node_wrapper: Callable[[Dict], T], query_variables: Dict[str, Any] | None = None, query_referer: str | None = None, first_data: Dict[str, Any] | None = None, is_first: Callable[[T, T | None], bool] | None = None, doc_id: str | None = None)

Iterate the nodes within edges in a GraphQL pagination. Instances of this class are returned by many (but not all) of Instaloader’s Post-returning functions, such as Profile.get_posts().

What makes this iterator special is its ability to freeze/store its current state, e.g. to interrupt an iteration, and later thaw/resume from where it left off.

You can freeze a NodeIterator with NodeIterator.freeze():

post_iterator = profile.get_posts()
try:
    for post in post_iterator:
        do_something_with(post)
except KeyboardInterrupt:
    save("resume_information.json", post_iterator.freeze())

and later reuse it with NodeIterator.thaw() on an equally-constructed NodeIterator:

post_iterator = profile.get_posts()
post_iterator.thaw(load("resume_information.json"))

(Appropriate functions for saving and loading the FrozenNodeIterator are, for example, save_structure_to_file() and load_structure_from_file().)

A FrozenNodeIterator can only be thawed with a matching NodeIterator, i.e. a NodeIterator instance that has been constructed with the same parameters as the instance represented by the FrozenNodeIterator in question. This ensures that an iteration cannot be resumed in a wrong, mismatched loop. As a quick way to distinguish iterators that are saved, e.g. in files, there is the NodeIterator.magic string: two NodeIterators match if and only if they have the same magic.

See also resumable_iteration() for a high-level context manager that handles a resumable iteration.
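The freeze/thaw and matching rules described above can be illustrated with a toy iterator. This is a minimal sketch, not Instaloader's implementation: ToyResumableIterator and ToyFrozenState are hypothetical names, the iterator walks a plain list instead of a GraphQL pagination, the choice of hash for the magic string is an assumption, and ValueError stands in for InvalidArgumentException.

```python
import base64
import hashlib
import json
from typing import Any, Dict, List, NamedTuple


class ToyFrozenState(NamedTuple):
    """Hypothetical stand-in for a frozen iterator state."""
    magic: str
    total_index: int


class ToyResumableIterator:
    """Toy iterator over a list; NOT Instaloader's NodeIterator."""

    def __init__(self, items: List[Any], params: Dict[str, Any]):
        self._items = items
        self._params = params
        self.total_index = 0  # number of items already returned

    @property
    def magic(self) -> str:
        # Hash the construction parameters so that equally constructed
        # iterators share the same short identifier (assumed scheme).
        digest = hashlib.blake2b(
            json.dumps(self._params, sort_keys=True).encode(), digest_size=6
        ).digest()
        return base64.urlsafe_b64encode(digest).decode()

    def __iter__(self) -> "ToyResumableIterator":
        return self

    def __next__(self) -> Any:
        if self.total_index >= len(self._items):
            raise StopIteration
        item = self._items[self.total_index]
        self.total_index += 1
        return item

    def freeze(self) -> ToyFrozenState:
        # Capture just enough state to continue later.
        return ToyFrozenState(self.magic, self.total_index)

    def thaw(self, frozen: ToyFrozenState) -> None:
        # Refuse to resume on a used iterator or with a foreign state.
        if self.total_index != 0:
            raise ValueError("iterator has already been used")
        if frozen.magic != self.magic:
            raise ValueError("frozen state belongs to a different iteration")
        self.total_index = frozen.total_index
```

Thawing a fresh instance built with the same parameters continues after the last returned item, while thawing one built with different parameters is rejected because the magic strings differ.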

property count: int | None

The count as returned by Instagram. This is not always the total count this iterator will yield.

property first_item: T | None

If this iterator has produced any items, returns the first item produced.

It is possible to override what is considered the first item (for example, to consider the newest item in case items are not in strict chronological order) by passing a callback function as the is_first parameter when constructing the iterator.

Added in version 4.8.

Changed in version 4.9.2: What is considered the first item can be overridden.

freeze() → FrozenNodeIterator

Freeze the iterator for later resuming.

property magic: str

Magic string for easily identifying a matching iterator file for resuming (hash of some parameters).

static page_length() → int

thaw(frozen: FrozenNodeIterator) → None

Use this iterator to resume from an earlier iteration.

Raises:
  • InvalidArgumentException – If the iterator on which this method is called has already been used, or if the given FrozenNodeIterator does not match, i.e. belongs to a different iteration.

property total_index: int

Number of items that have already been returned.

class instaloader.FrozenNodeIterator(query_hash, query_variables, query_referer, context_username, total_index, best_before, remaining_data, first_node, doc_id)

A serializable representation of a NodeIterator instance, saving its iteration state.

It can be serialized and deserialized with save_structure_to_file() and load_structure_from_file(), as well as with json and pickle thanks to being a NamedTuple.
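Because the state is a NamedTuple, a JSON round trip only needs _asdict() and keyword unpacking. The following sketch uses a hypothetical stand-in class, FrozenStateStandIn, that copies the field names from the signature above so that it runs without Instaloader installed. The helper names to_json and from_json are likewise illustrative.

```python
import json
from typing import Any, Dict, NamedTuple, Optional


class FrozenStateStandIn(NamedTuple):
    """Stand-in with the field names listed for FrozenNodeIterator."""
    query_hash: Optional[str]
    query_variables: Dict[str, Any]
    query_referer: Optional[str]
    context_username: Optional[str]
    total_index: int
    best_before: Optional[float]
    remaining_data: Optional[Dict[str, Any]]
    first_node: Optional[Dict[str, Any]]
    doc_id: Optional[str]


def to_json(frozen: FrozenStateStandIn) -> str:
    # _asdict() maps field names to values, which json can encode
    # as long as the values themselves are JSON-compatible.
    return json.dumps(frozen._asdict())


def from_json(text: str) -> FrozenStateStandIn:
    # Keyword unpacking restores the tuple from the decoded mapping.
    return FrozenStateStandIn(**json.loads(text))
```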

best_before: float | None

Date when parts of the stored nodes might have expired.
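A check against this value might look like the sketch below. It assumes that best_before is a POSIX timestamp (the type is float) and that None means no expiry is recorded. The function name is_expired is hypothetical and is not part of Instaloader.

```python
from datetime import datetime
from typing import Optional


def is_expired(best_before: Optional[float],
               now: Optional[datetime] = None) -> bool:
    """Return True if the best-before timestamp lies in the past.

    Assumption: best_before is a POSIX timestamp; None never expires.
    """
    if best_before is None:
        return False
    current = now if now is not None else datetime.now()
    return datetime.fromtimestamp(best_before) < current
```

Passing now explicitly keeps the check deterministic, which is convenient in tests.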

context_username: str | None

The username of the user who created the iterator, or None.

doc_id: str | None

The GraphQL doc_id parameter.

first_node: Dict | None

Node data of the first item, if an item has been produced.

query_hash: str | None

The GraphQL query_hash parameter.

query_referer: str | None

The HTTP referer used for the GraphQL query.

query_variables: Dict

The GraphQL query_variables parameter.

remaining_data: Dict | None

The already-retrieved, yet-unprocessed edges and the page_info at time of freezing.

total_index: int

Number of items that have already been returned.

resumable_iteration

instaloader.resumable_iteration(context: InstaloaderContext, iterator: Iterable, load: Callable[[InstaloaderContext, str], Any], save: Callable[[FrozenNodeIterator, str], None], format_path: Callable[[str], str], check_bbd: bool = True, enabled: bool = True) → Iterator[Tuple[bool, int]]

High-level context manager to handle a resumable iteration that can be interrupted with a KeyboardInterrupt or an AbortDownloadException.

It can be used as follows to automatically load a previously-saved state into the iterator, save the iterator’s state when interrupted, and delete the resume file upon completion:

post_iterator = profile.get_posts()
with resumable_iteration(
        context=L.context,
        iterator=post_iterator,
        load=lambda _, path: FrozenNodeIterator(**json.load(open(path))),
        save=lambda fni, path: json.dump(fni._asdict(), open(path, 'w')),
        format_path=lambda magic: "resume_info_{}.json".format(magic)
) as (is_resuming, start_index):
    for post in post_iterator:
        do_something_with(post)

It yields a tuple (is_resuming, start_index).

When the passed iterator is not a NodeIterator, it behaves as if resumable_iteration was not used, just executing the inner body.

Parameters:
  • context – The InstaloaderContext.

  • iterator – The fresh NodeIterator.

  • load – Loads a FrozenNodeIterator from given path. The object is ignored if it has a different type.

  • save – Saves the given FrozenNodeIterator to the given path.

  • format_path – Returns the path to the resume file for the given magic.

  • check_bbd – Whether to check the best before date and reject an expired FrozenNodeIterator.

  • enabled – Set to False to disable all functionality and simply execute the inner body.

Changed in version 4.7: Also interrupt on AbortDownloadException.
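The load and save callbacks can also be written as small named functions that close their files explicitly. This is a sketch under stated assumptions: make_json_loader, save_as_json and format_resume_path are hypothetical helper names, and the loader takes the tuple class as an argument (FrozenNodeIterator in real use) so that the block runs without Instaloader installed.

```python
import json
from typing import Any, Callable, Type


def make_json_loader(tuple_cls: Type[Any]) -> Callable[[Any, str], Any]:
    """Build a load(context, path) callback for the given NamedTuple class."""
    def load(_context: Any, path: str) -> Any:
        # The context argument is unused here, as in the lambda example.
        with open(path, encoding="utf-8") as fp:
            return tuple_cls(**json.load(fp))
    return load


def save_as_json(frozen: Any, path: str) -> None:
    """save(frozen, path) callback; expects an object with _asdict()."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(frozen._asdict(), fp)


def format_resume_path(magic: str) -> str:
    """format_path(magic) callback producing one file name per iteration."""
    return "resume_info_{}.json".format(magic)
```

These would then be passed as load=make_json_loader(FrozenNodeIterator), save=save_as_json and format_path=format_resume_path.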
