Xtremely Large File Systems for the small collaborative world

Arun Jagatheesan, DICE Research, SDSC

Even though the "file" as we know it remains the same, the "filesystem" that manages the files keeps changing. These changes are driven not only by the scale or size of the files to be stored, but by fundamental differences in what a filesystem is expected to be in the future. In this (proposed) talk, we will look at a current use case that is pushing the envelope of filesystems, and at our current solution.

The LSST project is expected to manage 200+ petabytes of replicated data, distributed across several countries. LSST is an optical telescope being constructed in Chile through federal and private funding sources. The LSST telescope has a 3-billion-pixel camera that will capture images of the sky for more than 10 years of its initial operation. The images will be stored as files. The system that stores and manages this large number of files will provide features that distinguish it from a regular "file system". Some of the expected requirements and features of this proposed "file system" include:

* Include heterogeneous storage resources such as high-speed disks, network storage, and archival storage, from multiple partners located in different parts of the world, as part of a logical storage pool that places files based on access patterns and storage policies.

* Reconcile the conflicting needs of consistency and distribution: while allowing any storage resource from any partner country to participate in the LSST collaboration (in a peer-to-peer manner), maintain centralized consistency of all files in the file-tree (logical namespace).

* Manage the lifecycle of the files: ingest the images created by the telescope in Chile, archive a replica in the Chilean data center, create and transfer another replica for processing in a US data center, and archive a geographically distant replica in the US. All of these data transfers have to take place automatically, across the whole system, based on replication policies.

* Provide automatic selection of the appropriate replica of a file, transparent to the user. In addition, allow users to discover files by querying metadata (apart from the ability to traverse the file-tree using traditional directories). A minimal sketch of the replication-policy and replica-selection ideas appears at the end of this abstract.

Clearly, projects like these reflect an emerging trend in enterprise computing rather than an isolated problem in scientific data management. Global companies will face similar problems when they are required to serve large amounts of data from multiple data centers around the world, or through the much-hyped cloud providers. The talk will provide an overview of the LSST requirements and our solution using database and grid technologies.
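
To make the replication-policy and replica-selection requirements concrete, the sketch below shows one way a logical namespace could track replicas against a placement policy and pick a copy transparently for a client. It is a minimal, purely illustrative Python sketch: the names (StorageResource, LogicalFile, REQUIRED_COPIES, select_replica) and the specific policy are assumptions made here for illustration, not the actual LSST or SDSC software.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class StorageResource:
        name: str   # e.g. "chile-archive" (hypothetical resource name)
        site: str   # e.g. "CL" or "US"
        kind: str   # "disk", "network", or "archive"

    @dataclass
    class Replica:
        resource: StorageResource
        path: str   # physical location of this copy

    @dataclass
    class LogicalFile:
        logical_path: str                      # position in the shared file-tree
        replicas: List[Replica] = field(default_factory=list)

    # Hypothetical policy: one archival copy in Chile, one archival copy in
    # the US, and one disk copy in the US for processing.
    REQUIRED_COPIES = [("CL", "archive"), ("US", "archive"), ("US", "disk")]

    def missing_copies(f: LogicalFile):
        """Return the (site, kind) pairs the policy still requires for this file."""
        have = {(r.resource.site, r.resource.kind) for r in f.replicas}
        return [need for need in REQUIRED_COPIES if need not in have]

    def select_replica(f: LogicalFile, client_site: str) -> Replica:
        """Pick a replica transparently: prefer a disk copy at the client's
        site, then any copy at the client's site, then any other copy."""
        if not f.replicas:
            raise FileNotFoundError(f.logical_path)
        return min(
            f.replicas,
            key=lambda r: (r.resource.site != client_site, r.resource.kind != "disk"),
        )

    # Example: a file ingested in Chile with a processing copy already in the US.
    chile_tape = StorageResource("chile-archive", "CL", "archive")
    us_disk = StorageResource("us-processing", "US", "disk")
    img = LogicalFile("/lsst/raw/night001/img042.fits",
                      [Replica(chile_tape, "/tape/img042.fits"),
                       Replica(us_disk, "/scratch/img042.fits")])
    print(missing_copies(img))             # [("US", "archive")] -> schedule a transfer
    print(select_replica(img, "US").path)  # /scratch/img042.fits

In the actual system, policy evaluation, metadata queries, and the resulting data transfers would be carried out by the data-grid middleware and its catalog rather than by client code; the sketch only illustrates the kind of decision being automated.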