Paul Greenfield, CSIRO

The genomics research community is a heavy user of database technology, but its focus is largely on databases as simple repositories of genetic data. The conventional genomics processing pipeline stores genetic sequence data as long strings that are retrieved in their entirety from databases and searched externally using pattern-matching tools such as BLAST and customised Perl scripts. This approach is somewhat more sophisticated than the 'FTP and grep' model of processing astronomical data once described by Jim Gray, but it falls well short of being able to answer biological questions directly by running queries over suitably structured databases - and this is the eventual goal of the work I would like to discuss at HPTS 2009.

My current work on genomic databases and queries is focussed on bacteria, as they have smaller and simpler genomes, and large numbers of them have already been sequenced. Bacteria also have the advantage of having been diverging genomically for billions of years, so distantly related organisms share few DNA sequences.

One of my current projects compares the genomes of 700+ bacteria to each other in a single pass through a database of bacterial genomes. This program performs about 5 billion short-sequence look-up queries in about 14 hours on a quad-core workstation, an average of about 100,000 database queries/second. This level of performance is achieved through careful use of indices and synchronised look-up threads that make effective use of the database's read-ahead and buffering strategies. The result of this analysis is a very high-level view of how different species of bacteria are related, and it shows where current bacterial taxonomies may need to be revised. The same database can also answer queries about gene sharing between organisms, and can be used to shed light on the structure of bacterial communities ('metagenomics'). These bacterial databases and the applications that query them are highly partitionable and scalable, making them good candidates for implementation using map-reduce algorithms on large-scale clusters and clouds.

The work I would most like to discuss with the HPTS community is storing and querying large numbers of large, complex genomes. The cost of DNA sequencing is falling rapidly and promises to fall even faster over the next few years. There will certainly be thousands of complete human genomes available to researchers within a few years, and perhaps many more than that if the promises of the sequencing technology vendors can be believed. The challenge will be structuring and storing this volume of data (at least 6 giga-basepairs per genome) as something more than semi-structured sets of strings - something that will let researchers answer questions about differences between populations and about the relationship between genetic differences and diseases. This is a challenge very similar to the astronomical one addressed by Jim Gray and Alex Szalay.

We have also been continuing our work on consistency for loosely-coupled, service-based applications - the 'Promises' work that I discussed at the last HPTS and at CIDR in 2007. Most recently we have looked at what we call property-based promises and at how promises over abstract resources could be implemented effectively. If I am invited to HPTS again this year, I look forward to discussing with Pat Helland and others just what consistency means in a loosely-coupled world and what patterns and technologies could give 'good enough' consistency.
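
As a rough illustration of the kind of short-sequence look-up that drives the bacterial comparison described above, the sketch below builds an in-memory k-mer index and counts the k-mers one genome shares with each of the others. The k-mer length, the dictionary standing in for the indexed table, and the toy genomes are all assumptions made for illustration; the real program runs its look-ups against a database with tuned indices and synchronised reader threads.

    from collections import defaultdict

    K = 25  # assumed k-mer length; the real value is not given above

    def build_kmer_index(genomes):
        """Map each k-mer to the set of genomes it occurs in."""
        index = defaultdict(set)
        for name, seq in genomes.items():
            for i in range(len(seq) - K + 1):
                index[seq[i:i + K]].add(name)
        return index

    def shared_kmer_counts(query_name, query_seq, index):
        """Count k-mers the query genome shares with each other genome."""
        shared = defaultdict(int)
        for i in range(len(query_seq) - K + 1):
            for other in index.get(query_seq[i:i + K], ()):
                if other != query_name:
                    shared[other] += 1
        return shared

    # Toy sequences; real inputs are complete genomes of a few Mbp each.
    genomes = {"genomeA": "ACGT" * 50, "genomeB": "ACGTACGTTT" * 20}
    index = build_kmer_index(genomes)
    print(shared_kmer_counts("genomeA", genomes["genomeA"], index))

Tabulating these shared counts for every pair of genomes gives exactly the high-level relatedness view mentioned above.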
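Because the note above observes that these databases and queries partition well, here is a minimal map-reduce-style sketch of the all-against-all comparison, with the map and reduce phases written as plain Python functions. The shuffle step is simulated locally; the framework plumbing (Hadoop or similar) and the k-mer representation are assumed, not taken from the text.

    from collections import defaultdict
    from itertools import combinations

    K = 25  # assumed k-mer length

    def map_phase(genomes):
        # Each mapper emits (k-mer, genome) pairs; the k-mer space
        # partitions naturally across machines, which is what makes
        # this computation a good map-reduce candidate.
        for name, seq in genomes.items():
            for i in range(len(seq) - K + 1):
                yield seq[i:i + K], name

    def reduce_phase(grouped):
        # Each reducer sees one k-mer's genome set and credits
        # every pair of genomes that shares that k-mer.
        pair_counts = defaultdict(int)
        for names in grouped.values():
            for a, b in combinations(sorted(names), 2):
                pair_counts[(a, b)] += 1
        return pair_counts

    genomes = {"genomeA": "ACGT" * 50, "genomeB": "ACGTACGTTT" * 20}

    # Simulate the shuffle that a real framework would provide.
    grouped = defaultdict(set)
    for kmer, name in map_phase(genomes):
        grouped[kmer].add(name)

    print(reduce_phase(grouped))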
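For the human-genome challenge, one plausible structuring - sketched here only to make the question concrete, not a design from the note above - is to store a reference genome once and each individual as a set of variants against it, so that population-scale questions become queries rather than string scans. The schema and the case-versus-control query below are illustrative assumptions throughout.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE person   (person_id INTEGER PRIMARY KEY,
                               population TEXT, has_disease INTEGER);
        CREATE TABLE variant  (variant_id INTEGER PRIMARY KEY,
                               chromosome TEXT, position INTEGER,
                               ref_base TEXT, alt_base TEXT);
        CREATE TABLE genotype (person_id INTEGER REFERENCES person,
                               variant_id INTEGER REFERENCES variant,
                               copies INTEGER);  -- 0, 1 or 2 copies
        CREATE INDEX ix_geno ON genotype (variant_id, person_id);
    """)

    # Which variants differ most in frequency between people with
    # and without a given disease?
    query = """
        SELECT v.chromosome, v.position,
               AVG(CASE WHEN p.has_disease = 1 THEN g.copies END) AS case_freq,
               AVG(CASE WHEN p.has_disease = 0 THEN g.copies END) AS ctrl_freq
        FROM genotype g
        JOIN person  p ON p.person_id  = g.person_id
        JOIN variant v ON v.variant_id = g.variant_id
        GROUP BY v.variant_id
        ORDER BY ABS(case_freq - ctrl_freq) DESC
    """
    for row in db.execute(query):
        print(row)

Whether a relational layout like this scales to thousands of 6 giga-basepair genomes is precisely the sort of question I would like to put to the HPTS community.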