mod_read module

mod_read.retrieve_list_of_files_from_url(path_catalog, path_data, prefix3='dt_', read_online=True)

Retrieve a list of files from a URL catalog and construct download links.

Parameters:
  • path_catalog (str) – The URL to the catalog page containing the list of files to retrieve.

  • path_data (str) – The base URL path to the directory where the files are hosted.

  • prefix3 (str, optional) – A prefix filter; only files whose names start with this prefix are selected (default is ‘dt_’).

  • read_online (bool, optional) – If True, constructs URLs for reading the files online; if False, constructs direct-download URLs (default is True).

Returns:

A sorted list of file URLs, ready for download or online access.

Return type:

list

Notes

This function reads the catalog page at the provided URL (path_catalog), extracts the filenames it lists, and constructs one URL per file from path_data. The constructed URLs are either for online reading (default) or for direct download, depending on the value of the read_online parameter.

The optional prefix3 parameter allows filtering files by a specific prefix. Only files with names starting with the specified prefix will be included in the list.
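
The steps described in these notes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual implementation: build_file_urls and _LinkCollector are hypothetical names, the catalog is assumed to be an HTML page whose links point at the files, and the "#mode=bytes" suffix used for the read_online case is an assumed convention for remote byte-range reading that this page does not confirm. The catalog is passed in as a string so the sketch runs offline; fetching it from path_catalog is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href target of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_file_urls(catalog_html, path_data, prefix3="dt_", read_online=True):
    """Hypothetical stand-in for the URL-building step described above."""
    parser = _LinkCollector()
    parser.feed(catalog_html)

    # Keep only the last path component of each link, then filter by prefix.
    names = {href.rstrip("/").split("/")[-1] for href in parser.hrefs}
    names = [n for n in names if n.startswith(prefix3)]

    # Make sure the base ends with "/" so urljoin appends instead of replacing.
    base = path_data if path_data.endswith("/") else path_data + "/"
    # Assumption: "#mode=bytes" marks a URL meant for remote byte-range reading.
    suffix = "#mode=bytes" if read_online else ""
    return sorted(urljoin(base, name) + suffix for name in names)


# A made-up catalog page, used in place of a real download.
sample_catalog = """
<html><body>
<a href="dt_global_20190102.nc">dt_global_20190102.nc</a>
<a href="dt_global_20190101.nc">dt_global_20190101.nc</a>
<a href="nrt_global_20190101.nc">nrt_global_20190101.nc</a>
</body></html>
"""

urls = build_file_urls(sample_catalog, "https://example.com/data")
for url in urls:
    print(url)
```

With the default prefix, the "nrt_" entry is dropped and the two "dt_" entries come back in sorted order. Passing read_online=False returns the same URLs without the suffix.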

Examples

>>> from mod_read import retrieve_list_of_files_from_url
>>> catalog_url = "https://example.com/catalog/"
>>> data_base_url = "https://example.com/data/"
>>> file_list = retrieve_list_of_files_from_url(catalog_url, data_base_url)
>>> for file_url in file_list:
...     print(file_url)  # Print the constructed file URLs.