Core C: Data Management & Biostatistics Core
Overview and Rationale
The focus of this novel Core is the acquisition, management, review and analysis of data. The Microbicide Development Program (MDP) incorporates skilled resources from more than five institutions addressing different but focused research questions involved in the successful development of a safe and effective rectal microbicide. Central to the concept of the U19's collaborative agreement structure is access to relevant forms, databases (such as trial CRFs) and regulatory documents for NIH Project Officers and the Scientific Advisory Panel (SAP). Of equal importance is the ability of the Program's team members to review their own data (often acquired elsewhere), which may be located at different institutions, and to participate in weekly teleconferences where other projects' results are reviewed and discussed.
Therefore, all four projects will rely heavily on this Core, directed by Dr. Christopher Denny. Web-accessible data management resources will be developed and housed within the Computer Technology Research laboratory facility at UCLA as described below. This web-accessible approach has great strengths and is pivotal to the efficiency and success of such a broadly based program. Once data are reviewed and finalized, biostatistical analysis can proceed in a coordinated fashion with all associated biostatisticians working from the same locked datasets. An additional advantage of this Core is the co-Directorship of Dr. William Cumberland, Director of the UCLA CFAR Biostatistics Core and Chair of the Biostatistics Department at UCLA. His oversight of statistical and analysis plans in all projects is coordinated to ensure cohesiveness of approach, integration of optimized methods, oversight protection and the capacity to assess the implications for one project of findings in another. As an example, we want to ensure that no reasonable microbicide candidate is eliminated from development (Project 4) due to observed mucosal inflammation that may, in fact, be a consequence of trauma related to coitus (Project 2) or of preparation in an unacceptable formulation (Project 3). As all these variables cannot be evaluated in a single trial, interpretative projections based on common acquisition techniques, similar populations and identical units will be one of the strengths of this application, making the sum greater than the parts.
What follows is a description of the data management portion of the Core,
supported by evidence of recent successful applications. This is followed by a
description of the biostatistical services provided by the Core. The budget
requests salary support for the Director, co-Director and programming staff.
The amount of time supported varies over the 5-year course, with more time
required in the initial 3 years for Data Management Program customization and
development, and with the later years having a greater percentage of
Biostatistical time for collaborative analysis and oversight.
Needs and Requirements.
The Data Management and Biostatistics Core will provide needed infrastructure and support for collection, management and interpretation of complex data sets. The dispersed nature of this Microbicide Development Program (MDP), with different investigators located in different cities and countries spanning nine time zones, imposes increased requirements for data compatibility and accessibility.
To address this need, as a general rule all data that require biostatistical interpretation (discussed later) will be managed by web-based applications tailored to individual investigator needs. Brief descriptions of some of the data required in each project, all of which will be handled by the Core, are given below.
Individual Project Data Needs
Pathogenesis and Drug Development Pipeline
In this primarily laboratory-based project, data will be derived from studies using cell lines, PBMCs, MMCs (mucosal mononuclear cells), rectal explant tissue cultures (data acquired over 1-10 days) or safety/efficacy data from macaque trials. There will be multiple readouts per experiment (at UCLA, St. George's or HPA-Porton Down, London) requiring review and analysis by investigators at St. George's and NIH. For example, in the rectal explant studies, each tissue sample will have specific datasets: (i) histology pictures captured and read using a scoring system at one site; (ii) flow cytometry panels (up to 4 tubes, each with 4 colors) for downloading and re-evaluating at another site, with eventual tabulation into an Excel-like spreadsheet; (iii) rt-PCR results of mucosal cytokine mRNA and HIV viral load (12 cytokines and 3 viral groupings: R5, X4, R5+X4); (iv) transwell results of the numbers and types of cells migrating through the screen from the explant, reported in a spreadsheet; (v) in situ rt-PCR for HIV RNA to identify target mucosal cells (in development: see Letters of Support). Once baseline assay characteristics are defined, the same data will be collected for analysis following exposure to the three selected RT microbicide candidates over a dose range. The model can be adapted for other candidates later.
Inflammatory and Injury Correlates of Receptive Anal Intercourse (RAI)
Project 2 will consist of trials in human subjects and will have clinical data and IRB/oversight review. These are non-intervention, descriptive trials to characterize the mucosal consequences of RAI. The clinical parameters of each of the 4-5 patient cohorts studied (men/women; HIV+/-; anal-receptive or not) will be contained in a few CRF pages which are not expected to change over the five years. Studies will include: (i) use of a simulated rectal coitus device with regulated insertion frequency in volunteer subjects (already IRB-approved); (ii) similar evaluations in actual sexual partnerings; (iii) explant experiments using rectosigmoid biopsies to see if post-coital tissue is more infectable in vitro than pre-coital samples from the same individuals. Data readout for explants will be similar to Project 1 but will require clear links to subject clinical descriptors. Data readout will include: questionnaires/self-report, with minor within-questionnaire modifications over the five years; rectal secretory antibody data (ELISAs); rectal viral shedding in the appropriate cohorts (reported in a spreadsheet as copies/ml); permeability data (in terms of % recovery in urine over time); fecal calprotectin levels and epithelial cells in rectal effluent (tabulated into spreadsheet form); colposcopic appearance using high-resolution anoscopy (HRA) (pictures to be scored at one location), with eventual tabular scores in a spreadsheet; histology grading of inflammation/epithelial disruption (as in Project 1); mRNA rt-PCR readout of the same 12 cytokines, and HIV RNA and DNA mucosal and plasma viral load, as in Project 1 (tabulated in a spreadsheet), with pre/post-coital numbers in the same subjects in each cohort.
Behavioral Correlates of RAI and Acceptability of Proposed Microbicidal Formulations
Data from this project will involve large human cohorts (880 subjects) at 2 sites (UCLA/JHU) receiving detailed questionnaires about behavior related to RAI (frequency, positions, symptoms, activities and consequences). Overall descriptors will be derived from cross-sectional data, while acceptability data related to the actual formulations and applicators tried will be evaluated in subsets (years 4-5). Forms will not change much in the first 2 years and then might undergo a significant revision as we approach the Year 2.5 re-evaluation and Milestone assessment. The data acquired in the acceptability portion will be similar to those in the cross-sectional study in that they will entail use of detailed questionnaires. These will be utilized in a repeated fashion as the subset of subjects undergoes randomized testing of all versions of formulations. For all 880 subjects, as well as for the acceptability subjects, there will be colposcopic exams and associated pictures from high-resolution anoscopy (HRA), with central readout and spreadsheet scoring, and results from STD screening.
Project 4: Pilot testing of Safety of Rectal Microbicides in HIV-seronegative and seropositive Men and Women
This Project will entail two exploratory pilot studies of rectal microbicides in humans. All of the RT microbicide candidates to be considered in this MDP are presently in Phase 2 testing in vaginal form. Per discussions with the FDA, IND submissions will need to be filed which, in turn, means that pre-clinical toxicity and animal data will be required. Data recorded here are standardized. Each company (starting with Biosyn) will need its data password-protected, but the MDP PIs will have site-wide data access. The Project will start with one candidate: UC-781. Once the IND is filed and accepted, the pilot trials can begin, with the associated detailed safety clinical data and demographics recorded. In the first trial, each subject will have two baseline visits followed by a single exposure, a two-week interval followed by a week of QD exposure, and a two-week interval followed by a week of BID exposure. This template will include placebo and two doses, involving 36 subjects and 360 endoscopies over the trial period. Data recording forms are not expected to change once prepared. The "mucosal evaluation" data after exposure to microbicide will be similar to the above: subject questionnaires pre/post; secretory antibodies, secreted viral load, permeability data, secreted/sloughed epithelial cells; endoscopic appearance, histology, mRNA cytokines/HIV RNA; flow and co-receptors/activation markers.
Overall Data Management Needs
Clinical data management
At the heart of the MDP is a drive to couple basic and translational development strategies with clinical investigations to develop safe and novel rectally administered microbicides for sexually active populations, both HIV seropositive and seronegative. As outlined above, of the four projects in this program, three involve generation of clinical patient surveys or performance of pilot clinical drug trials. These projects will all require the capability to accrue clinical data from multiple sources, entered by numerous healthcare providers at different time points over each patient's course, over a five-year span. Adequate search and reporting tools need to be provided so that investigators, as well as monitors from the Regulatory Core and the external SAP, can dynamically monitor project progress. Finally, effective interfaces need to be created to allow efficient downloading of collected clinical data into statistical software applications to allow for timely monitoring and definitive interpretation.
Much clinical data cannot be reduced to strictly objective measurements but involves subjective observations which are inherently unstructured. It is likely that different investigators will wish to track different clinical elements and that the scope of their clinical data set may in fact change over time. For example, the anticipated clinical data management needs for Projects 2 and 3 are relatively modest and can likely be summarized with a focused set of online forms. Administration of pre-clinical and safety trials in Project 4, on the other hand, is likely to be a more daunting task requiring: (i) more in-depth monitoring of subject populations using more complex surveys; (ii) creation of secure interfaces with biotech and pharma partners; (iii) management and accessible storage of IRB, NIH PSRC and FDA documents and communications.
To meet this challenge, we have created a flexible and powerful software application termed Survey Builder. This application is a mature web-based method to create conditional questionnaires. It was first designed and built to support online test, survey, and questionnaire data collection, management, and reporting. Over the last seven years we have refined this application to include dynamic branching, versioning, and other advanced capabilities. Survey Builder has already been effectively employed to manage complex clinical data sets for a large prostate cancer study that is ongoing at UCLA (see IMPACT study described below).
Common resource procurement and tracking.
As with any highly collaborative venture, there will be physical resources that will be collected and shared among the member projects of this program. A prime example of such a resource is subject specimens (e.g. tissue biopsies, blood, colposcopic pictures, culture results) collected in the course of the clinical studies proposed in all four projects of this application. Sample integrity, sample information content and tracking accuracy are crucial to a successful specimen bank, as are subject safety and protection (especially in cases of a positive STD result). Working with project leaders, care will be taken to assure optimal handling of subject specimens and tissues. Equally important is the density of information that can be associated with each sample. Investigators typically want to correlate the biochemical, genetic or descriptive attributes of a subject sample with a particular clinical event. In this regard, the better documented a subject sample, the more valuable it can be. Therefore, our tissue tracking application will be sufficiently flexible to allow easy association of collected subject materials with other subject data (all coded/anonymized).
Like a bank, there are two basic specimen transactions that are anticipated: deposits and withdrawals. From the deposit perspective, it is recognized that each subject over time may undergo many diagnostic events (one to many). Each event could yield multiple samples (one to many). Individual patient specimens vary widely in type, from urine to peripheral blood to mucosal tissues from endoscopic biopsies. Working with each project leader and investigator, we will create and implement an online database-backed application that will support a work flow consisting of: (i) unambiguous labeling of subject specimens with bar-coded identifiers; (ii) entry of specimen descriptions and clinical information and/or pathology reports; (iii) identification of a final storage location and time of deposit for each specimen.
We recognize that these data may be entered by different personnel at different times, and a mechanism to accurately associate data from diverse sources will be created, as we have done for other projects. For example, histologic and pathologic interpretation of a particular tissue specimen will likely be performed days after the specimen has been obtained. An efficient means of linking the correct reports to the corresponding tissue is mandatory. In the interests of maintaining subject confidentiality, this web-based application will run under Secure Sockets Layer (SSL) with 128-bit encryption. In accordance with HIPAA regulations, specimens will be logically linked to subject identifiers only for those subjects who have freely given informed consent and who are enrolled in the study.
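The deposit workflow and report linkage described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the Core's actual schema: every class, field and method name here (`Specimen`, `DiagnosticEvent`, `SpecimenBank`, `attach_report`, and so on) is hypothetical, and a production system would sit on a relational database rather than in-memory objects. The sketch shows the subject-to-event-to-specimen one-to-many structure, bar-coded labeling, a recorded storage location and deposit time, and later attachment of a report by barcode.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Specimen:
    barcode: str                       # step (i): unambiguous bar-coded identifier
    specimen_type: str                 # e.g. "urine", "peripheral blood", "mucosal biopsy"
    report: str = ""                   # step (ii): description or pathology report
    storage_location: str = ""         # step (iii): final storage location
    deposited_at: Optional[datetime] = None  # step (iii): time of deposit


@dataclass
class DiagnosticEvent:
    event_id: str
    specimens: List[Specimen] = field(default_factory=list)  # one event, many specimens


@dataclass
class Subject:
    coded_id: str                      # coded/anonymized identifier only
    events: List[DiagnosticEvent] = field(default_factory=list)  # one subject, many events


class SpecimenBank:
    """Hypothetical in-memory bank that indexes specimens by barcode."""

    def __init__(self) -> None:
        self._by_barcode: Dict[str, Specimen] = {}

    def deposit(self, event: DiagnosticEvent, specimen: Specimen,
                location: str, when: datetime) -> None:
        # Reject a reused barcode so that labels stay unambiguous.
        if specimen.barcode in self._by_barcode:
            raise ValueError(f"duplicate barcode: {specimen.barcode}")
        specimen.storage_location = location
        specimen.deposited_at = when
        event.specimens.append(specimen)
        self._by_barcode[specimen.barcode] = specimen

    def attach_report(self, barcode: str, report: str) -> None:
        # Reports may arrive days after deposit; the barcode is the link.
        if barcode not in self._by_barcode:
            raise KeyError(f"unknown barcode: {barcode}")
        self._by_barcode[barcode].report = report

    def find_by_type(self, specimen_type: str) -> List[Specimen]:
        # A minimal search used before a withdrawal.
        return [s for s in self._by_barcode.values()
                if s.specimen_type == specimen_type]
```

Keying every later entry on the barcode, rather than on subject name or collection date, is what lets personnel at different sites add data at different times without ambiguity.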
Investigators create specimen banks with the intent of eventually using these specimens for future studies. A process for efficiently searching the bank and withdrawing desired specimens will be created. As a start, this group of investigators has already developed an elaborate, effective client-side (ACCESS) database meeting these criteria. This relational database has been part of the Mucosal Immunology CFAR Core at UCLA and has over 1000 samples stored with efficient clinical correlates and retrieval patterns. This current database will be expanded into a web-accessible client-server architecture to meet the needs of the MDP.
Laboratory based data set management.
Numerous laboratory based assays and assessments are proposed in all four projects. In many instances, these assays are objective, resulting in well-defined and reproducible numerical data. Examples of such data include viral or antibody titers, FACS profiles or quantitative molecular assays for specific components in tissues or body fluids, as described above. Creation of web-accessible tools will facilitate ease and accuracy of entry into a central database. This will be particularly important considering that these assays are likely to be performed at various locations. Internet accessibility will also allow all MDP investigators to monitor the progress of the project.
In contrast to most lab-based quantitative assays, studies involving image interpretation have an inherent subjective component. Two types of imaging studies are proposed in this application: (i) colposcopy with high-resolution anoscopy (Projects 2, 3, 4) and (ii) histology (Projects 1, 2, 4). In both instances, scoring of findings is observer-dependent. If this scoring is being done by a limited number of individuals at a single institution, internal consistency can be maintained. However, when these studies are being performed at several institutions simultaneously, establishing normalization parameters can be problematic. Storage of images as JPEG-compressed files will render study images accessible to all MDP investigators so that cross comparisons can be made and consistent standards of interpretation can be established.
Data Management Core Resources and Experience.
Data management services will be provided by the Computer Technology Research Lab (CTRL) at UCLA, a collaborative entity whose explicit purpose is to provide web-accessible, database-backed software solutions to support and enhance clinical and basic science research. CTRL is run under the auspices of the Dean of Research within the School of Medicine and is co-directed by Drs. Christopher Denny and Robert Dennis. CTRL currently employs ten full-time programmers, two systems administrators/analysts and two graphic artists/medical illustrators. Over the last six years, CTRL staff have created a wide range of software applications. As a demonstration of the depth of our commitment and expertise, hardware resources, software development tools and examples of developed applications are enumerated below.
A description of the hardware and software resources within our group that will be devoted to support this MDP follows. In addition, we have included descriptions of web based data systems developed by this group for several other projects.
Hardware resources
CTRL maintains 12 multi-CPU Intel processor Linux servers. These resources are organized in a hierarchical fashion. Four of these servers are dedicated to the development of new applications. Once these applications have been thoroughly tested by both CTRL and the client, they are transferred to one of eight production application and database servers. Our primary, fully licensed Oracle8i instance runs on an 8-way Xeon processor unit that has 8 gigabytes of RAM and multiple RAID5 and RAID1 arrays. In addition to the Intel/Linux servers, CTRL owns a Sun Enterprise 6500 that has 20 CPUs and 12 gigabytes of RAM. This server is used for computationally intensive projects where parallel processing is possible. Other Sun equipment owned and operated by CTRL includes an Enterprise 3500 (4 CPU) server and four SunBlade100 workstations. Each developer has a PIII or P4 workstation and monitor. Our graphic designers, medical illustrator, and digital media specialist use both Macs and PCs, and they have one of each at their desks. CTRL has two digital video cameras and two high-end Dell Dimension 8200 computers for video processing. For video work, we also have a Philips DVDR925 encoder.
In addition to computing hardware, CTRL has extensive network resources. We own and maintain a gigabit Ethernet network of single and multi-mode fiber to seven different buildings in the David Geffen School of Medicine at UCLA. All network equipment is Cisco brand and includes two 6500 series switches with switched fabric modules and routers, a 4000 series switch, and nine 3500 series switches. Our high-speed research network is connected to the campus Gigabit ring, which is in turn connected both to the commodity Internet through a shared 100-megabit uplink and to the Internet2 research extranet.
Software development tools.
Though we are adept in a number of programming languages, most CTRL applications are developed using an open source, highly extensible framework, the Open Architecture Community System (OpenACS). The OpenACS is a community developed and maintained descendant of the ArsDigita Community System. It is an advanced toolkit for building scalable, sophisticated, community-oriented web sites, web applications, and web services. OpenACS includes a comprehensive set of integrated software components that support rapid deployment. While much can be done with a stock OpenACS installation, its greatest strength has been its flexibility and extensibility through the addition of packages. Our Survey Builder is one such package that extends the functionality of the OpenACS.
OpenACS runs on the AOLserver, an open-source, multi-threaded and fast HTTP application server. AOLserver has C and Tcl application programming interfaces (APIs) that allow for rapid application development and the construction of dynamic pages that can query a database. AOLserver manages pools of database connections efficiently and rapidly, making it one of the most flexible application servers available. It is the backbone of some of the largest and busiest production environments in the world. There are database drivers for nearly every well-known relational database management system (RDBMS).
The OpenACS relies on an RDBMS, and the two that it works with out of the box are Oracle and PostgreSQL. Most CTRL applications to date have used Oracle, and we currently have adequate site licenses to continue this practice. However, should we choose to develop completely open source applications, PostgreSQL would be an optimal RDBMS choice.
Like Oracle, PostgreSQL is a true ACID-compliant RDBMS and is the most advanced open source RDBMS available today (http://www.postgresql.org). ACID compliance (atomicity, consistency, isolation, durability) refers to an industry standard for a database's capability to faithfully process complex transactions. PostgreSQL, developed originally in the UC Berkeley computer science department, pioneered many of the object-relational concepts now becoming available in some commercial databases. It provides SQL92 language support, transaction integrity, efficient locking optimized for web applications, and type extensibility.
Developed applications.
Given our past experience with the OpenACS/Oracle systems as well as additional mature software tools such as Survey Builder, we are confident that we will meet the data management needs of MDP investigators. As a demonstration of this expertise and to illustrate the range of applications that are possible within this development framework, a small selection of CTRL-developed web based applications is presented:
The "Improving Access, Counseling and Treatment for Californians with Prostate Cancer" (IMPACT) program is a state-funded initiative to provide high quality prostate cancer treatment to low income California men who have little or no insurance. IMPACT is the largest effort of its kind in the U.S., and it is unique in at least two specific areas: (i) the central role of nurse case managers (NCM) as the brokers and coordinators of all care provided to patients; (ii) the Internet-based, database-backed web application that handles every aspect of this multifaceted, multi-center, multi-institution healthcare program. The suite of database applications developed by CTRL for this state-wide program includes modules for: (i) determining patient eligibility and enrollment tracking; (ii) patient-centered outreach and phone contact logging; (iii) NCM-managed data collection and reporting (e.g., co-morbidity questionnaire, initial psycho-social survey, ongoing patient status updates, etc.); (iv) provider application, treatment plan and invoicing; (v) program evaluation surveys and phone-based interviews for use with consenting patients. The entire statewide project is electronic and web accessible. All parties involved with the management, delivery, or administration of the care given to IMPACT patients share relevant information via a patient notes module.
The IMPACT patient data application suite is an enterprise-level database in complexity and scope that could not have been developed in the short time frame allowed without the advantage of our OpenACS and Survey Builder toolkit. It clearly demonstrates our ability to effectively manage large clinical data sets as well as our capability to meet the clinical management needs of this MDP program.
The UCLA Jonsson Comprehensive Cancer Center's Gene Expression Core (GEC) was created as a central resource at UCLA for microarray hybridization and subsequent data management. Prospective oligonucleotide microarray (Affymetrix) users access core facilities through an Internet website. Using Survey Builder, information describing new experiments is solicited by an online questionnaire that is context sensitive: user responses prompt subsequent queries. The questionnaires are conditionally driven in that how users answer the initial queries dictates what further questions are prompted. The use of controlled vocabularies allows for comprehensive searches of experimental descriptions and increases the chances that users will be able to find their data long after the experiments have been performed. At the end of the experiment description survey, users are given unique ID numbers and labels to place on their sample tubes, which are then taken to the microarray core lab. The experiment description information is immediately available to the core microarray technicians. Custom software tracks the processing of samples and calculates appropriate fees at each step of service. Both experimental descriptions and microarray expression data are centrally stored in an Oracle relational database administered by CTRL.
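The conditionally driven questionnaire described above, in which each answer determines the next question, can be illustrated with a minimal Python sketch. This is not Survey Builder's implementation (which is an OpenACS package); the `Question` class, the `run_survey` function and the example questions are all hypothetical and serve only to show the branching idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Question:
    key: str
    prompt: str
    # Maps a specific answer to the key of the next question to ask.
    branches: Dict[str, str] = field(default_factory=dict)
    # Next question when the answer has no specific branch; None ends the survey.
    default_next: Optional[str] = None


def run_survey(questions: Dict[str, Question], start: str,
               answer_for: Callable[[Question], str]) -> Dict[str, str]:
    """Walk the questionnaire, letting each answer choose the next question."""
    answers: Dict[str, str] = {}
    current: Optional[str] = start
    while current is not None:
        question = questions[current]
        if question.key in answers:
            # Guard against a mis-specified survey that loops back on itself.
            raise ValueError(f"question asked twice: {question.key}")
        reply = answer_for(question)
        answers[question.key] = reply
        current = question.branches.get(reply, question.default_next)
    return answers


# Hypothetical experiment-description survey with one branch point.
QUESTIONS = {
    "organism": Question("organism", "Source organism?",
                         branches={"human": "tissue"}, default_next="strain"),
    "tissue": Question("tissue", "Which human tissue?", default_next="array_type"),
    "strain": Question("strain", "Which strain?", default_next="array_type"),
    "array_type": Question("array_type", "Which array type?"),
}
```

In this sketch a respondent who answers "human" is asked about tissue and never sees the strain question, while any other organism leads to the strain question instead; both paths rejoin at the array-type question.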
This GEC application suite was developed using the OpenACS, but the requirements of this project necessitated that new features and functionality be added to our basic toolkit. In effect, the GEC project allowed us to fold back into our toolkit generalized mechanisms that further extend our capabilities and have since been implemented in subsequent projects. It clearly shows our ability to track and manage user-entered data together with complex machine-generated data sets.
Our most recent development project is another Jonsson Comprehensive Cancer Center informatics effort. We have developed a web-based Tissue Microarray (TMA) data management and mining application (TMAtrix). TMA provides a means for rapid, very large-scale molecular analysis of thousands of tissue specimens with thousands of probes for various DNA, RNA and protein targets. TMA technology has only recently emerged, and many research units are struggling with the problems of managing the vast amounts of data associated with tissue arrays. TMAtrix is an open source solution that has been especially designed and engineered to address the requirements of a core unit that conducts TMA construction and analysis.
Instead of using our OpenACS toolkit, we decided to develop TMAtrix using industry standard Java technology to make it easier for others to adopt this software. This clearly demonstrates that we are adept at software development using a number of platforms and that we can tailor our development approach to best meet the needs of individual investigators. Furthermore, TMAtrix already contains tissue tracking components that could give us a jump start towards developing the Common Resource Procurement and Tracking system that will be needed by this MDP program (see above). Finally, in collaboration with Aperio Technologies (http://www.aperio.com/home.asp), it led us to develop an interface to commercial-level advanced tools to manage complex histologic images over the web.
Features (histology and gene array) from both of the last 2 Cores described will be integrated and utilized with the MDP Project (Projects 1, 2, 4).
Biostatistical Core Resources and Functions.
The primary function of the Biostatistics Core is to provide oversight for all projects, ensuring that the same high level of statistical expertise is available to all investigators. Smaller projects, those that do not need a dedicated statistician, will be supported by the Core. The Core will also provide a backup to the statisticians who are associated with a particular project. A wide range of statistical expertise will be needed for all the projects given the diversity of the investigations, and the Core will provide support as needed for each project. This support will include survey design, sample size and power calculation, simulation studies, statistical input into trial monitoring (including data safety analyses), modeling of longitudinally collected data, and randomization schemes. The Core will collaborate with investigators and statisticians in the analysis of data from each project and in developing methods for cross-linking data from different projects. To facilitate this, the Core will make certain that the data collection and management schemes are compatible and will allow efficient linkage of data from the different projects. The Core, in conjunction with Regulatory Core B, will also assist in the statistical aspects (both data and safety) of all interim and final DSMB reports.
Stratification is an important part of all clinical research, yet the choice of which variables to stratify on and how many to use is often made solely by the principal investigators of a study. Having a centralized core will allow coordination of stratification variables, and will involve the investigators in the decisions regarding stratification before the data are collected. Different variables used for stratification and different definitions of groups will make linking data from different studies extremely difficult, and will severely limit the potential analyses that can be made with the full data.
Randomization in clinical studies is always best done under the control of one person or office. This is particularly important in blinded/partially-blinded studies (Projects 3, 4), and decisions regarding unblinding need to be made centrally. A centralized Core is the best and most efficient way to handle all the randomization needed for the different projects. Related to this are decisions regarding early termination of a study, especially due to safety reasons (Project 4). These decisions need to be coordinated between studies, and interim analyses should be done in a way that does not compromise combining data from studies. Decisions to terminate a study need to be made centrally, and should be made upon advice from the Biostatistics Core.
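As an illustration of what a centrally controlled scheme can look like, the following Python sketch implements permuted-block randomization within strata, one commonly used approach. It is an assumption for illustration only: the proposal does not specify which randomization method the Core will use, and the class name, block size, arm labels and strata shown here are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class BlockRandomizer:
    """Permuted-block randomization, run separately within each stratum."""

    def __init__(self, arms: Sequence[str], block_multiple: int = 2,
                 seed: int = 0) -> None:
        self._arms = list(arms)
        # Each block holds every arm `block_multiple` times.
        self._block_multiple = block_multiple
        # A single seeded generator keeps the allocation list reproducible.
        self._rng = random.Random(seed)
        # stratum -> assignments still unused in that stratum's current block
        self._pending: Dict[Hashable, List[str]] = defaultdict(list)
        # Central log of every allocation made.
        self.log: List[Tuple[str, Hashable, str]] = []

    def assign(self, subject_id: str, stratum: Hashable) -> str:
        block = self._pending[stratum]
        if not block:
            # Start a new shuffled block once the previous one is used up.
            block.extend(self._arms * self._block_multiple)
            self._rng.shuffle(block)
        arm = block.pop()
        self.log.append((subject_id, stratum, arm))
        return arm
```

With three arms and `block_multiple=2`, every completed block of six consecutive subjects in a stratum contains each arm exactly twice, so arm sizes stay balanced within strata even if enrollment stops early. Keeping the generator and the log in one object mirrors the point made above that allocation should sit with a single office.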
Nearly all studies encounter missing data, whether from missed visits, partially answered questionnaires, or dropouts in mid-study. It is common to use imputation methods to fill in partially missing data, and there are a number of techniques available for this. Improper handling of imputation for missing data can lead to severe biases, and different methods used in different studies will make linkage nearly impossible. Discussions led by the Core and a unified approach to handling missing data will be essential to having compatible data sets.
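A small worked example shows why the choice of imputation method matters. Filling each missing value with the observed mean leaves the mean unchanged but understates the variance, so two studies that handle missing values differently are no longer directly comparable. The sketch below uses made-up numbers and hypothetical helper names, and it is not a method proposed by the Core.

```python
import statistics
from typing import List, Optional


def complete_cases(values: List[Optional[float]]) -> List[float]:
    """Keep only the observed values (complete-case analysis)."""
    return [v for v in values if v is not None]


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing value with the mean of the observed values."""
    observed_mean = statistics.fmean(complete_cases(values))
    return [observed_mean if v is None else v for v in values]


# Made-up readings with two missed visits (None).
readings = [2.0, 4.0, None, 6.0, None, 8.0]

observed = complete_cases(readings)
imputed = mean_impute(readings)

# The mean is preserved by mean imputation...
same_mean = statistics.fmean(observed) == statistics.fmean(imputed)
# ...but the sample variance shrinks, because the filled-in values add no spread.
variance_observed = statistics.variance(observed)
variance_imputed = statistics.variance(imputed)
```

Here the observed values have sample variance 20/3, while the mean-imputed series has sample variance 4, even though both have mean 5. Any downstream confidence interval or test would therefore differ depending on which approach a study had used.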
Statistical modeling for repeated measures has even more methods available. A common model uses a mixture of random and fixed effects. Recent research has made Bayesian methods a common choice for estimation and testing, including Markov chain Monte Carlo estimation, but other commonly used methods for the analysis of longitudinal data include Generalized Estimating Equations and bootstrap estimation. The choice becomes important when one encounters non-normal data or non-standard covariance structures. Semiparametric modeling and nonparametric methods are sometimes used for non-normal data. Clearly, coordination of effort and some common assumptions for modeling are needed across projects.
The analysis of data from AIDS research has presented some difficult problems and has prompted much research by statisticians on methods to solve these problems. Non-standard methods must be developed, often leading to publication of the methodology in purely statistical journals. The development function of the Biostatistics Core would come into play when an investigator raises a problem that requires the development of new biostatistical methodology. The Core will facilitate interaction between the different statistical personnel and the investigators of the different projects. A key function will be to encourage collaborative research among the statistical personnel to solve biostatistical problems that arise during the studies. For these reasons, the biostatistical portion of the Core will be represented on all teleconferences involving safety and data review, and its representative will be a member of the Executive Committee of the Administrative Core (as is the Core Director, Dr. Denny). Every biostatistician is expected to develop new statistical methodologies as part of his/her research agenda. The expectation is that problems encountered during collaborative research will lead to new statistics publications, as research done to answer these questions naturally generates new statistical methods to handle the problems. Without support from the Core, it is unlikely that the statistical methods developed would be of use to AIDS researchers.