Healthcare organizations aggregate petabytes of data that helps drive life-changing health and business decisions. To avoid errors and redundancies, it’s imperative that this data, which includes everything from patient records to medical charts, employee time sheets, hospital expenses and more, is processed quickly and accurately from one central location.
A large, not-for-profit healthcare organization charged Six Feet Up with improving its existing Extract, Transform and Load (ETL) pipeline. At the beginning of the engagement, the pipeline took approximately 24 hours to process data. In the organization’s fast-paced environment, an entire day was too long: it left little to no time to troubleshoot issues or rerun the data before violating the service-level agreements. To reduce the pipeline’s runtime and ensure scalability, the in-house development team needed additional Python expertise.
Six Feet Up’s big data challenge:
The healthcare organization’s existing ETL pipeline, which was built in-house, processes a sheer volume of data (approximately 1.7 million petabytes daily) that continues to grow at an exponential rate.
When working with big data, Six Feet Up’s team of expert developers uses the motto, “The larger the database, the tighter the code.” Ensuring the code is as clean and tight as possible reduces runtime and minimizes the potential for human error.
Using open source technologies and leveraging existing tools’ features, Six Feet Up consolidated the code base to standardize libraries, eliminate one-off data notebooks and remove tens of thousands of lines of repetitive code.
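As an illustration of the kind of consolidation described above, the following minimal Python sketch shows how cleanup logic repeated across one-off notebooks can collapse into a single shared, configurable transform step. It is not the organization’s actual code: the `SourceConfig` class, the `transform` function, and the sample sources and column names are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Iterable

Record = dict[str, str]


@dataclass(frozen=True)
class SourceConfig:
    """Per-source settings that might otherwise be hard-coded in separate notebooks (hypothetical)."""
    name: str
    rename: dict[str, str]      # source column name -> standardized column name
    required: tuple[str, ...]   # standardized columns that must be non-empty


def transform(records: Iterable[Record], config: SourceConfig) -> list[Record]:
    """Shared transform step: rename columns, trim whitespace, drop incomplete rows."""
    cleaned = []
    for record in records:
        row = {config.rename.get(key, key): value.strip() for key, value in record.items()}
        # Keep the row only if every required column is present and non-empty.
        if all(row.get(column) for column in config.required):
            cleaned.append(row)
    return cleaned


# Hypothetical sources: each one is now a few lines of configuration
# instead of its own copy of the cleanup code.
TIMESHEETS = SourceConfig(
    "timesheets", {"emp": "employee_id", "hrs": "hours"}, ("employee_id", "hours")
)
EXPENSES = SourceConfig(
    "expenses", {"dept": "department", "amt": "amount"}, ("department", "amount")
)

timesheet_rows = transform([{"emp": " 17 ", "hrs": "8"}, {"emp": "", "hrs": "4"}], TIMESHEETS)
expense_rows = transform([{"dept": "ER", "amt": " 120.50 "}], EXPENSES)
```

With this shape, adding a new data source means adding one configuration entry rather than another copy of the same logic, which is where most of the line-count and error-surface reduction in this kind of refactoring tends to come from.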
To complete this project, Six Feet Up:
Throughout the implementation process, Six Feet Up provided the in-house development team with valuable support and knowledge so they could continue optimizing and improving the pipeline as more data is accumulated. The pipeline’s ability to easily scale up allows for additional data analysis and more accurate health and business information for the healthcare organization to act upon.
Six Feet Up — in collaboration with the healthcare organization’s internal development team — has built a plan that will reduce the ETL pipeline’s runtime by 20 hours (from 24 hours to 4 hours) for all but the largest pipelines. This extra time will give the healthcare organization the opportunity to troubleshoot and resolve any unforeseen issues before the systems development life cycle (SDLC) ends.
Additionally, the fully functional and maintainable pipeline will provide greater data visibility for the healthcare organization’s data acquisition teams and drastically reduce computing costs.
Today, this pipeline is being used to provide accurate and complete datasets which allow the organization to make critical business and health-related decisions.