Show HN: Orchest – Data Science Pipelines

Hello Hacker News! We are Rick & Yannick from Orchest (https://www.orchest.io - https://ift.tt/2XRxxBc). We're building a visual pipeline tool for data scientists. The tool can be considered high-code because you write your own Python/R notebooks and scripts, but we manage the underlying infrastructure to make it 'just work™'. You can think of it as a simplified version of Kubeflow.

We created Orchest to free data scientists from the tedious engineering-related tasks of their job, much as companies like Netflix, Uber and Booking.com support their data scientists with internal tooling and frameworks to increase productivity. When we worked as data scientists ourselves, we noticed how heavily we had to depend on our software engineering skills for all kinds of tasks, from configuring cloud instances for distributed training to optimizing networking and storage for processing large amounts of data. We believe data scientists should be able to focus on the data and the domain-specific challenges.

We are at the very beginning of making better tooling available for data science. Today we are launching our GitHub project, which gives data scientists using the PyData/R stack enhanced pipelining abilities, with deep integration of Jupyter Notebooks.

Currently Orchest supports:
1) visually and interactively editing a pipeline that is represented using a simple JSON schema;
2) running remote container-based kernels through the Jupyter Enterprise Gateway integration;
3) scheduling experiments by launching parameterized pipelines on top of our Celery task scheduler;
4) configuring local and remote data sources to separate code versioning from the data passing through your pipelines.

We are here to learn and get feedback from the community. As youngsters we don't have all the answers and are always looking to improve.

August 12, 2020 at 02:24PM