Add Configurable Pagination Limit for Large Directories in Coscine SDK

Description:

When fetching large datasets (e.g., directories with more than 1000 files), the current pagination implementation in the Resource.files() method limits retrieval to 20 pages. This can be restrictive for users dealing with large datasets. I propose adding a configurable page limit to allow more flexibility while retaining the default behavior for smaller datasets. Additionally, this limit should be included in the documentation.

Current Behavior:

  • The pages method limits the number of pages retrieved to 20 (default PageSize is 50), resulting in a maximum of 1000 files being fetched from a directory.
  • The _fetch_files method doesn’t handle pagination properly for larger directories and only retrieves up to 50 files per page.
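
The 1000-file ceiling follows directly from the two numbers above (20 pages of 50 entries each). As a rough sketch of that arithmetic, the helper functions below are hypothetical and not part of the Coscine SDK; they only show how the cap relates to page size and how many pages a directory of a given size would need:

```python
import math
from typing import Optional


def reachable_files(page_size: int = 50, max_pages: Optional[int] = 20) -> Optional[int]:
    """Upper bound on the number of files returned; None means unbounded."""
    if max_pages is None:
        return None
    return page_size * max_pages


def pages_needed(file_count: int, page_size: int = 50) -> int:
    """Number of page requests required to list file_count files."""
    # Even an empty directory costs one request for the first page.
    return max(1, math.ceil(file_count / page_size))


print(reachable_files())               # 1000
print(pages_needed(1234))              # 25
print(reachable_files(max_pages=None))  # None
```

A directory with 1234 files would therefore need 25 pages, five more than the current limit allows.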

Proposed Solution:

  1. Introduce a configurable max_pages parameter: A new max_pages parameter will be added to the ApiClient constructor, allowing users to set a custom limit or remove the page limit. The default value will be 20 to preserve the current behavior for smaller datasets. Users who wish to retrieve more than 20 pages (1000 files) can set this value to None to fetch all pages, or set a custom page limit. Changes to ApiClient constructor:

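    from typing import Optional

    def __init__(self, token: str, ..., max_pages: Optional[int] = 20):
        # None disables the page limit; the default of 20 keeps the current behavior.
        self.max_pages = max_pages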

  2. Modify the pages method: The pages method will respect the max_pages parameter set during ApiClient initialization. If max_pages is set to None, all pages will be fetched. Otherwise, the method will stop after fetching the specified number of pages. Changes to pages method:

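    def pages(self) -> Iterator[ApiResponse]:
        # The first page is the response itself and is always yielded.
        yield self
        if self.is_paginated:
            response = self
            request = self.request
            max_pages = self.client.max_pages
            page = 1
            # Check the limit before each request so that exactly max_pages
            # pages are yielded in total; None means no limit.
            while response.has_next and (max_pages is None or page < max_pages):
                page += 1
                request.params["PageNumber"] = page
                response = self.client.send_request(request)
                yield response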

  3. No further changes needed to _fetch_files: The _fetch_files method will now automatically handle pagination based on the configuration set by the max_pages parameter in the ApiClient.
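
The proposed behavior can be tried out without network access. The following is a minimal simulation under stated assumptions: FakeClient, FakePage and fetch_all are hypothetical stand-ins invented for this sketch, not Coscine SDK classes, and only the max_pages semantics (an integer cap, or None for no cap) are taken from the proposal above.

```python
from typing import Iterator, List, Optional


class FakePage:
    """Hypothetical stand-in for a paginated ApiResponse."""

    def __init__(self, client: "FakeClient", number: int,
                 data: List[str], has_next: bool) -> None:
        self.client = client
        self.number = number
        self.data = data
        self.has_next = has_next

    def pages(self) -> Iterator["FakePage"]:
        # Always yield the first page, then follow has_next up to the cap.
        yield self
        response = self
        page = self.number
        max_pages = self.client.max_pages
        while response.has_next and (max_pages is None or page < max_pages):
            page += 1
            response = self.client.fetch(page)
            yield response


class FakeClient:
    """Hypothetical stand-in for ApiClient; serves items from memory."""

    def __init__(self, items: List[str], page_size: int = 50,
                 max_pages: Optional[int] = 20) -> None:
        self.items = items
        self.page_size = page_size
        self.max_pages = max_pages  # None means "no page limit"
        self.calls = 0              # number of simulated API requests

    def fetch(self, page: int) -> FakePage:
        self.calls += 1
        start = (page - 1) * self.page_size
        data = self.items[start:start + self.page_size]
        has_next = start + self.page_size < len(self.items)
        return FakePage(self, page, data, has_next)


def fetch_all(client: FakeClient) -> List[str]:
    """Collect the items from every page the client is allowed to fetch."""
    first = client.fetch(1)
    return [item for page in first.pages() for item in page.data]


files = [f"file_{i}.txt" for i in range(1234)]
capped = fetch_all(FakeClient(files))                      # default cap of 20 pages
everything = fetch_all(FakeClient(files, max_pages=None))  # no cap
print(len(capped), len(everything))  # 1000 1234
```

With the default cap the listing stops at 1000 entries, while max_pages=None returns all 1234, at the cost of 25 requests instead of 20.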

Why This Change Is Necessary:

  • Preserves backward compatibility: The default max_pages value of 20 ensures that the SDK behaves as it currently does for most users with smaller datasets.
  • Increased flexibility for large datasets: Users dealing with large resources will have the flexibility to set a custom page limit or remove the limit entirely to retrieve all files.

Impact: This change lets users handle both small and large datasets, improving the flexibility and scalability of the Coscine SDK. Because the default limit of 20 pages stays in place, the SDK still guards against large numbers of unintended API calls unless a user explicitly raises or removes the limit.
