Add Configurable Pagination Limit for Large Directories in Coscine SDK
Description:
When fetching large datasets (e.g., directories with more than 1000 files), the current pagination implementation in the Resource.files() method limits retrieval to 20 pages. This can be restrictive for users dealing with large datasets. I propose adding a configurable page limit to allow more flexibility while retaining the default behavior for smaller datasets. Additionally, this limit should be included in the documentation.
Current Behavior:
- The pages() method limits the number of pages retrieved to 20 (default PageSize is 50), resulting in a maximum of 1000 files being fetched from a directory.
- The _fetch_files method doesn't handle pagination properly for larger directories and only retrieves up to 50 files per page.
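The 1000-file ceiling follows directly from the two numbers above. A minimal sketch of the arithmetic, assuming the page size and page limit quoted in this issue; max_files and pages_needed are hypothetical helper names used only for illustration and are not part of the SDK:

```python
import math
from typing import Optional


def max_files(page_size: int = 50, max_pages: Optional[int] = 20) -> Optional[int]:
    """Upper bound on the number of files returned; None means unbounded."""
    if max_pages is None:
        return None
    return page_size * max_pages


def pages_needed(total_files: int, page_size: int = 50) -> int:
    """Number of pages required to list total_files entries."""
    return math.ceil(total_files / page_size)


print(max_files())         # 1000
print(pages_needed(2500))  # 50
```

A directory with 2500 files therefore needs 50 pages, well beyond the current limit of 20.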
Proposed Solution:
- Introduce a configurable max_pages parameter: A new max_pages parameter will be added to the ApiClient constructor, allowing users to set a custom limit or remove the page limit. The default value will be 20 to preserve the current behavior for smaller datasets. Users who wish to retrieve more than 20 pages (1000 files) can set this value to None to fetch all pages, or set a custom page limit. Changes to the ApiClient constructor:
    # Requires: from typing import Optional
    def __init__(self, token: str, ..., max_pages: Optional[int] = 20):
        self.max_pages = max_pages  # None disables the page limit
- Modify the pages() method: The pages() method will respect the max_pages parameter set during ApiClient initialization. If max_pages is set to None, all pages will be fetched. Otherwise, the method will stop after fetching the specified number of pages. Changes to the pages() method:
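    def pages(self) -> Iterator[ApiResponse]:
        # The first page is the response itself.
        yield self
        if self.is_paginated:
            max_pages = self.client.max_pages
            response = self
            request = self.request
            page = 1
            while response.has_next:
                # Check the limit before requesting the next page, so that
                # max_pages=1 yields exactly one page. None means no limit.
                if max_pages is not None and page >= max_pages:
                    break
                page += 1
                request.params["PageNumber"] = page
                response = self.client.send_request(request)
                yield response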
- No further changes needed to _fetch_files: The _fetch_files method will now automatically handle pagination based on the configuration set by the max_pages parameter in the ApiClient.
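The intended page-limit behavior can be checked without touching the real API. The following is a minimal, self-contained sketch, assuming the semantics described in this proposal; FakePage, FakeClient and count_pages are hypothetical stand-ins written for this illustration, not SDK classes. The sketch checks the limit before requesting the next page, so max_pages=1 yields exactly one page and no extra request is sent.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FakePage:
    """Stand-in for ApiResponse: a single page of results."""
    number: int
    has_next: bool


class FakeClient:
    """Stand-in for ApiClient that serves a fixed number of pages."""

    def __init__(self, total_pages: int, max_pages: Optional[int] = 20):
        self.total_pages = total_pages
        self.max_pages = max_pages
        self.requests_sent = 0

    def fetch(self, number: int) -> FakePage:
        # Simulates one API call for the given page number.
        self.requests_sent += 1
        return FakePage(number, has_next=number < self.total_pages)

    def pages(self) -> Iterator[FakePage]:
        page = 1
        response = self.fetch(page)
        yield response
        while response.has_next:
            # Stop before requesting a page beyond the limit (None = no limit).
            if self.max_pages is not None and page >= self.max_pages:
                break
            page += 1
            response = self.fetch(page)
            yield response


def count_pages(total_pages: int, max_pages: Optional[int]) -> int:
    """Number of pages yielded for a resource with total_pages pages."""
    return sum(1 for _ in FakeClient(total_pages, max_pages).pages())


print(count_pages(30, 20))    # 20
print(count_pages(30, None))  # 30
print(count_pages(30, 1))     # 1
print(count_pages(5, 20))     # 5
```

With the default of 20 the loop stops at 20 pages (1000 files at 50 per page), while None walks through all 30 simulated pages.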
Why This Change Is Necessary:
- Preserves backward compatibility: The default max_pages value of 20 ensures that the SDK behaves as it currently does for most users with smaller datasets.
- Increases flexibility for large datasets: Users dealing with large resources can set a custom page limit or remove the limit entirely to retrieve all files.
Impact: This change allows users to handle both small and large datasets, improving the flexibility and scalability of the Coscine SDK, while the default limit continues to guard against large numbers of unintended API calls.