High Performance Computing Enhanced Apache Big Data Stack

NIST BIG DATA PUBLIC WORKING GROUP BIG DATA USE CASE SURVEY

Publications:

  1. Judy Qiu, Shantenu Jha, Andre Luckow, and Geoffrey C. Fox, Towards HPC-ABDS: An Initial High-Performance Big Data Stack, in Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data. March 18-21, 2014. San Diego Supercomputer Center, San Diego. http://dsc.soic.indiana.edu/publications/nist-hpc-abds.pdf.
  2. Geoffrey Fox, Judy Qiu, and Shantenu Jha, High Performance High Functionality Big Data Software Stack, in Big Data and Extreme-scale Computing (BDEC). 2014. Fukuoka, Japan. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/fox.pdf.
  3. Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, and Geoffrey C. Fox, A Tale of Two Data-Intensive Approaches: Applications, Architectures and Infrastructure, in 3rd International IEEE Congress on Big Data Application and Experience Track. June 27 - July 2, 2014. Anchorage, Alaska. http://arxiv.org/abs/1403.1528.
  4. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, and Andre Luckow, Towards an Understanding of Facets and Exemplars of Big Data Applications, in 20 Years of Beowulf: Workshop to Honor Thomas Sterling's 65th Birthday. October 14, 2014. Annapolis. http://dsc.soic.indiana.edu/publications/OgrePaperv9.pdf.
  5. Geoffrey Fox and Wo Chang, Big Data Use Cases and Requirements, in 1st Big Data Interoperability Framework Workshop: Building Robust Big Data Ecosystem ISO/IEC JTC 1 Study Group on Big Data. March 18-21, 2014. San Diego Supercomputer Center, San Diego. http://dsc.soic.indiana.edu/publications/NISTUseCase.pdf.
  6. NIST Big Data Use Case & Requirements. 2013 [accessed 2015 March 1]; Available from: http://bigdatawg.nist.gov/V1_output_docs.php.
  7. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, and Andre Luckow, Ogres: A Systematic Approach to Big Data Benchmarks, in Big Data and Extreme-scale Computing (BDEC). January 29-30, 2015. Barcelona. http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/OgreFacets.pdf.
  8. Geoffrey C. Fox, Shantenu Jha, Judy Qiu, Saliya Ekanayake, and Andre Luckow, Towards a Comprehensive Set of Big Data Benchmarks. February 15, 2015. http://dsc.soic.indiana.edu/publications/OgreFacetsv9.pdf.
  9. Dan Reed and Jack Dongarra. Exascale Computing and Big Data: The Next Frontier. 2014 [accessed 2015 March 8]; Available from: http://www.netlib.org/utk/people/JackDongarra/PAPERS/Exascale-Reed-Dongarra.pdf.
  10. Shantenu Jha, Andre Luckow, Pradeep Mantha, A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures? 2015. [Online]. Available: http://arxiv.org/abs/1501.05041
  11. Bingjing Zhang, Yang Ruan, and Judy Qiu, Harp: Collective Communication on Hadoop, in IEEE International Conference on Cloud Engineering (IC2E). March 9-12, 2015. Tempe, AZ. http://dsc.soic.indiana.edu/publications/HarpQiuZhang.pdf.
  12. Geoffrey Fox, Judy Qiu, Shantenu Jha, Supun Kamburugamuve, and Andre Luckow, HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack, invited talk at 2nd International Workshop on Scalable Computing For Real-Time Big Data Applications (SCRAMBL'15) at CCGrid2015, the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Shenzhen, Guangdong, China. http://dsc.soic.indiana.edu/publications/HPC-ABDSDescribed_final.pdf.
  13. Geoffrey Fox, Judy Qiu, Shantenu Jha, Supun Kamburugamuve, and Andre Luckow, Implications of the HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack for Workflows, white paper for DoE NGNS/CS Scientific Workflows Workshop http://extremescaleresearch.labworks.org/. April 20-21, 2015. Rockville, MD. http://dsc.soic.indiana.edu/publications/WorkflowsandHPC-ABDS.pdf.
  14. Saliya Ekanayake, Supun Kamburugamuve, and Geoffrey Fox, SPIDAL: High Performance Data Analytics with Java and MPI on Large Multicore HPC Clusters, Technical Report, January 5, 2016; in Proceedings of 24th High Performance Computing Symposium (HPC 2016), April 3-6, 2016, Pasadena, CA, USA, as part of the SCS Spring Simulation Multi-Conference (SpringSim'16).
  15. Supun Kamburugamuve, Saliya Ekanayake, Milinda Pathirage, and Geoffrey Fox, Towards High Performance Processing of Streaming Data in Large Data Centers, Technical Report, January 26, 2016; to be published in proceedings of HPBDC 2016, IEEE International Workshop on High-Performance Big Data Computing, in conjunction with the 30th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago Hyatt Regency, Chicago, Illinois, USA, Friday, May 27, 2016.
  16. Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake, and Supun Kamburugamuve, Big Data, Simulations and HPC Convergence, Technical Report, January 30, 2016. DOI: 10.13140/RG.2.1.1858.8566.
  17. Bingjing Zhang, Peng Bo, and Judy Qiu, High Performance LDA through Collective Model Communication Optimization, in Proceedings of International Conference on Computational Science (ICCS2016), June 6-8, 2016, San Diego, California.
  18. Geoffrey Fox, Judy Qiu, Shantenu Jha, Saliya Ekanayake, and Supun Kamburugamuve, White Paper: Big Data, Simulations and HPC Convergence, Technical Report, May 20, 2016, DOI. Presented at BDEC Frankfurt workshop, June 16, 2016.
  19. Bingjing Zhang, Peng Bo, and Judy Qiu, Model Data-Centric Computation Abstractions in Machine Learning Applications, in 3rd Workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR2016), held in conjunction with SIGMOD/PODS2016, July 1, 2016.
  20. Saliya Ekanayake, Supun Kamburugamuve, Pulasthi Wickramasinghe, and Geoffrey Charles Fox, Java Thread and Process Performance for Parallel Machine Learning on Multicore HPC Clusters, Technical Report, August 8, 2016, DOI
  21. Project: Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science SPIDAL NSF14-43054

Kaleidoscope diagram