Site, Data Location, and Operation Concepts
This document explains the key concepts in the CCAT data transfer system, focusing on how Sites, Data Locations, and Operations work together to manage data flow across the observatory infrastructure.
Overview
The data transfer system is built around a hierarchical structure that organizes data storage and processing across different physical and logical locations. Think of it like a warehouse and logistics system where:
Sites are like cities or countries
Data Locations are like specific warehouses or storage facilities within those cities
Operations are like the different warehouse activities (packing, unpacking, storing, shipping, etc.)
Sites
A Site represents a physical or logical location where data can be stored or processed. Examples include:
CCAT (the telescope site in Chile)
Cologne (the CCAT data center)
Other potential sites (e.g. other data centers, observatories, etc.)
Each site has:
- A unique name (e.g., “CCAT”, “Cologne”)
- A short name for technical use (e.g., “ccat”, “cologne”)
- A geographical location description (e.g., “Atacama”, “Germany”, “USA”)
Think of sites as the major hubs in your data network - each represents a different physical location with its own infrastructure.
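As a minimal sketch (the field names here are illustrative, not the actual schema), a site record could look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    """A physical or logical location in the data transfer network."""
    name: str        # unique human-readable name, e.g. "CCAT"
    short_name: str  # short identifier for technical use, e.g. "ccat"
    location: str    # geographical description, e.g. "Atacama"

# The two sites described above:
CCAT = Site(name="CCAT", short_name="ccat", location="Atacama")
COLOGNE = Site(name="Cologne", short_name="cologne", location="Germany")
```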
Data Locations
Data Locations are specific storage or processing areas within a site. They’re like specific rooms or servers within a building. Each data location has:
A Location Type (what it’s used for)
A Storage Type (how data is stored)
A Priority (which location to use first if multiple options exist)
Location Types
The system recognizes four main types of data locations:
- SOURCE - Telescope Instrument Computers
These are the computers that collect raw data from observations. Think of them as the “data collection points” where observations first land. There can be multiple SOURCE locations for each instrument.
- BUFFER - Input/Output Buffers
These are temporary storage areas from which data can be sent and into which data can be received. They act as staging areas where data is organized and prepared for the next step. Multiple buffers can exist, each with its own priority.
- LONG_TERM_ARCHIVE - Permanent Storage
These are the final destinations for data - permanent storage systems where data is kept for long-term access. Think of them as the “vaults” where data is safely stored.
- PROCESSING - Temporary Processing Areas
These are specialized areas where data undergoes analysis or transformation. They’re like “workshops” where data is processed before being stored or transferred.
Storage Types
Each data location also has a storage type that determines how data is physically stored:
- DISK - Traditional disk storage
Regular hard drives or SSDs. Fast access, good for temporary storage and processing.
- S3 - Object storage (like AWS S3)
Cloud-based storage that’s good for large amounts of data and long-term archiving.
- TAPE - Tape storage
Traditional tape drives for very long-term, cost-effective storage.
The storage type determines how the data location is accessed (e.g., via SSH, the S3 API, or tape).
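Putting the two classifications together, a data location could be sketched like this (field and enum names are illustrative, not the system's actual data model):

```python
from dataclasses import dataclass
from enum import Enum

class LocationType(Enum):
    SOURCE = "source"
    BUFFER = "buffer"
    LONG_TERM_ARCHIVE = "long_term_archive"
    PROCESSING = "processing"

class StorageType(Enum):
    DISK = "disk"
    S3 = "s3"
    TAPE = "tape"

@dataclass
class DataLocation:
    """A storage or processing area within a site."""
    name: str
    location_type: LocationType
    storage_type: StorageType
    priority: int = 0    # lower number = higher priority
    active: bool = True  # inactive locations are skipped (maintenance/failure)

# Example: a disk-backed input/output buffer at the Cologne data center.
cologne_buffer = DataLocation(
    name="cologne_buffer",
    location_type=LocationType.BUFFER,
    storage_type=StorageType.DISK,
    priority=1,
)
```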
Operations
Operations are the different types of work that can be performed on data at each location. The system automatically determines which operations are available based on the location type:
Source Locations (Telescope Computers)
Raw Data Package Creation - Organizing raw observation data into packages
Deletion - Removing data that has been successfully transferred to the long-term archive
Monitoring - Checking system health and performance
Buffer Locations (Temporary Storage)
Data Transfer Package Creation - Preparing data for transfer between sites
Data Transfer Unpacking - Extracting data from transfer packages
Data Transfer - Moving data between locations
Deletion - Cleaning up temporary files
Long-Term Archive Transfer - Moving data to permanent storage
Monitoring - System health checks
Long-Term Archive Locations (Permanent Storage)
Long-Term Archive Transfer - Moving data between archive locations
Monitoring - Ensuring data integrity
Processing Locations (Analysis Areas)
Staging - Retrieving data from the site’s local long-term archive
Deletion - Cleaning up processed data
Monitoring - System health checks
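The lists above can be summarized as a mapping from location type to available operations. The dictionary below mirrors that mapping; the operation names are illustrative and may not match the real API:

```python
# Operations the system enables for each location type (illustrative names).
OPERATIONS_BY_LOCATION_TYPE = {
    "SOURCE": {"raw_data_package_creation", "deletion", "monitoring"},
    "BUFFER": {
        "data_transfer_package_creation",
        "data_transfer_unpacking",
        "data_transfer",
        "deletion",
        "long_term_archive_transfer",
        "monitoring",
    },
    "LONG_TERM_ARCHIVE": {"long_term_archive_transfer", "monitoring"},
    "PROCESSING": {"staging", "deletion", "monitoring"},
}

def available_operations(location_type: str) -> set:
    """Return the operations the system would enable for a location type."""
    return OPERATIONS_BY_LOCATION_TYPE[location_type]
```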
How It All Works Together
The system uses a “queue-based” approach where different types of work are automatically routed to the appropriate locations. Here’s how it works:
- Data Collection: Raw data arrives at SOURCE locations (telescope computers). Each file enters the system as a RawDataFile that is registered via the OpsDB API and linked to the ExecutedObsUnit and InstrumentModule it belongs to.
- Initial Processing: Data is organized into packages at SOURCE locations (RawDataPackage). This is done by the raw_data_package_manager.
- Transfer Preparation: Data packages are prepared for transfer at BUFFER locations (DataTransferPackage). This is done by the data_transfer_package_manager.
- Data Movement: Data is transferred between sites using the appropriate transfer method (disk-to-disk, disk-to-S3, etc.). This is done by the transfer_manager.
- Unpacking: Data is extracted and verified at the destination’s BUFFER locations. This is done by the data_integrity_manager.
- Archiving: Data is moved to LONG_TERM_ARCHIVE locations for permanent storage. This is done by the archive_manager.
- Processing: Data can be staged to PROCESSING locations for analysis. This is done by the staging_manager.
- Cleanup: Temporary files are deleted as data moves through the system. This is done by the deletion_manager.
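The stage-to-manager assignments above can be captured in a small lookup table. The manager names come from the text; the tuple structure is purely illustrative:

```python
# Pipeline stages in order, with the manager responsible for each.
PIPELINE = [
    ("raw_data_package_creation", "raw_data_package_manager"),
    ("data_transfer_package_creation", "data_transfer_package_manager"),
    ("data_transfer", "transfer_manager"),
    ("data_transfer_unpacking", "data_integrity_manager"),
    ("long_term_archive_transfer", "archive_manager"),
    ("staging", "staging_manager"),
    ("deletion", "deletion_manager"),
]

def manager_for(operation: str) -> str:
    """Look up which manager handles a given operation."""
    return dict(PIPELINE)[operation]
```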
Queue Routing
The system automatically creates “queues” (like different work stations in a warehouse) for each location and operation combination. For example:
- ccat_telescope_computer_raw_data_package_creation
- cologne_buffer_data_transfer
- cornell_archive_long_term_archive_transfer
This ensures that work is always sent to the right place and doesn’t interfere with other operations.
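The queue names above appear to follow a `<site>_<location>_<operation>` pattern. A hypothetical helper that builds such names (inferred from the examples, not the system’s actual naming code):

```python
def queue_name(site_short_name: str, location_name: str, operation: str) -> str:
    """Build a per-location, per-operation queue name.

    The "<site>_<location>_<operation>" pattern is inferred from the
    examples in this document; the real naming scheme may differ.
    """
    return f"{site_short_name}_{location_name}_{operation}"

# Reproduces one of the example queue names:
name = queue_name("ccat", "telescope_computer", "raw_data_package_creation")
```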
Priority and Failover
Multiple data locations of the same type can exist at a site (like having multiple backup servers). The system uses:
Priority levels (lower numbers = higher priority)
Active/Inactive status to handle maintenance or failures
Automatic failover when primary locations are unavailable
This means if your main buffer is full or offline, the system automatically uses the next available buffer.
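The selection logic can be sketched as follows, using a simple `(name, priority, active)` tuple as an illustrative stand-in for the real location records:

```python
def select_location(locations):
    """Pick the active location with the best (lowest-numbered) priority.

    `locations` is an iterable of (name, priority, active) tuples -- an
    illustrative shape, not the system's actual data model.
    """
    candidates = [loc for loc in locations if loc[2]]  # keep active locations only
    if not candidates:
        raise RuntimeError("no active location available")
    return min(candidates, key=lambda loc: loc[1])  # lower number wins

buffers = [
    ("primary_buffer", 1, False),  # offline for maintenance
    ("backup_buffer", 2, True),
]
chosen = select_location(buffers)  # failover selects the backup buffer
```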
Data Flow Example
Here’s a typical data journey:
Observation: Telescope collects data → SOURCE location (telescope computer)
Package Creation: Raw data organized → SOURCE location creates packages
Transfer: Data moved → BUFFER location at destination site
Unpacking: Data extracted and verified → BUFFER location
Archiving: Data moved to permanent storage → LONG_TERM_ARCHIVE location
Processing: Data staged for analysis → PROCESSING location (if needed)
Cleanup: Temporary files deleted → Various locations
The system handles all the routing, queuing, and error handling automatically, so data flows smoothly from collection to permanent storage without manual intervention.
Key Benefits
Automatic Routing: Work goes to the right place automatically
Fault Tolerance: System continues working even if some locations fail
Scalability: Easy to add new sites and locations
Flexibility: Different storage types for different needs
Monitoring: Built-in health checks and performance tracking
This architecture allows the CCAT observatory to efficiently manage large amounts of astronomical data across multiple international sites while maintaining data integrity and system reliability.