Sharing data across multiple platforms is a cornerstone of any data pipeline. In the cloud, this can be tricky. With that in mind, Incorta built its Delta Sharing service to guarantee secure access to data in the cloud. Now, accessing the data needed for an effective and reliable data pipeline is just a few seconds away.

 

Transcript:

The potential benefits of such a decision include cost, ease of access, and efficiency. For example, the way Incorta provides those added benefits is by storing data in the cloud using Delta Lake, which is a metadata layer over the Parquet files. Among Delta Lake's features is support for ACID transactions, which guarantee consistent concurrent reads and writes. To make sure the data can be accessed securely, Incorta makes use of the Delta Sharing service, which will be the main focus of the remainder of this talk. Whenever a client queries the data by communicating with Incorta's secure Delta Sharing service, we provide the client with pre-signed, short-lived URLs, which grant access to Incorta's Delta Lake tables that are stored on Incorta Cloud. This ecosystem is evolving quickly, and multiple platforms are getting into the mix.

This means that a Delta Sharing client can be anyone, anywhere, and can be served at any time. Now, I will show a demo using a notebook, where I will load the data that Incorta has already loaded and stored. The whole process takes just a couple of minutes, and Incorta's data becomes accessible from any platform. The notebook I'll be using runs on _________. Nonetheless, this could be done just as well in many other notebooks, such as Jupyter, Zeppelin, or Google Colab. Okay, so the first step is to install the Python wheel, which holds the code of the Incorta Delta Sharing library. After the installation is completed, we can go ahead and import the library.

And then we need to define the path of the profile file. The profile file simply contains a set of credentials that the Incorta Delta library needs for authentication and authorization. For example, mine just has the endpoint of the Delta Sharing server (the server is __________), the instance name, the tenant name, and the API key. The API key can easily be obtained from inside the Incorta UI: going over to the Security tab, we can find it, and it can be copied and renewed. Okay, so back to the notebook. As we saw, the key lives inside the credentials file. Through the API key, Incorta guarantees that access is granted only to authenticated users, and only over the tables those users are authorized for.
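To make the profile file step concrete, here is a minimal sketch of writing and checking such a file in plain Python. The talk only says which pieces of information the file holds, so the JSON layout, the key names (`endpoint`, `instance`, `tenant`, `api_key`), the file name, and the placeholder values below are all assumptions for illustration, not Incorta's actual format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical key names: the talk lists the contents of the profile file,
# not the exact spelling of its fields.
REQUIRED_KEYS = ("endpoint", "instance", "tenant", "api_key")


def write_profile(path, endpoint, instance, tenant, api_key):
    """Write the credentials to a JSON profile file and return them as a dict."""
    profile = {
        "endpoint": endpoint,
        "instance": instance,
        "tenant": tenant,
        "api_key": api_key,
    }
    Path(path).write_text(json.dumps(profile, indent=2))
    return profile


def read_profile(path):
    """Load a profile file and fail early if any required field is missing or empty."""
    profile = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if not profile.get(key)]
    if missing:
        raise ValueError(f"profile is missing fields: {missing}")
    return profile


# Placeholder values only; the real endpoint and key come from the Incorta UI.
with tempfile.TemporaryDirectory() as tmp:
    profile_path = Path(tmp) / "incorta_profile.json"
    write_profile(
        profile_path,
        endpoint="https://example.invalid/delta-sharing",
        instance="my-instance",
        tenant="default",
        api_key="REPLACE_WITH_API_KEY",
    )
    loaded = read_profile(profile_path)
```

Checking the fields up front means a missing API key shows up as a clear error before any request is sent to the server.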

Okay, so next, we'll create the Incorta sharing client. Next, I'll be using the _____, which is list tenants, which just returns the tenants in my cluster. Let's just give it a couple of seconds. Okay, so it has returned the tenants. Next, I'll list the schemas in the default tenant. I created the cluster earlier, and whenever I create a cluster, Incorta adds a sample data schema, which is called Online Store. It already comes with some groups and joins as well. Okay, so next I will list all the tables in this schema. We see that we have the sales order detail, the bill of materials, the sales order header, and so on. Of course, if I had loaded another schema into Incorta, it would have been listed here as well, and I could have listed and loaded all of its tables too. Okay, so next, we will load a table into a pandas DataFrame using the load_as_pandas API.
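The listing steps in this part of the demo can be sketched as follows. The Incorta library's own calls are not spelled out in the talk, so this sketch swaps in the open-source `delta-sharing` Python connector, where the top-level container is called a share rather than a tenant. The helper name `list_catalog` is hypothetical.

```python
def list_catalog(profile_path):
    """Return {share: {schema: [table, ...]}} for everything the profile can see.

    Assumption: uses the open-source connector (pip install delta-sharing),
    not the Incorta wheel shown in the demo.
    """
    import delta_sharing  # imported lazily so the sketch loads without the package

    client = delta_sharing.SharingClient(profile_path)
    catalog = {}
    for share in client.list_shares():
        schemas = {}
        for schema in client.list_schemas(share):
            # Each table object carries its own name, schema, and share.
            schemas[schema.name] = [t.name for t in client.list_tables(schema)]
        catalog[share.name] = schemas
    return catalog
```

Calling `list_catalog("/path/to/profile")` against a live server walks the same hierarchy as the demo, top level first, then schemas, then tables.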

This has to be done using the table_url, which is just the path of the profile file followed by the fully qualified name of the table, which is the tenant name, then the schema name, and then the table name. There is also a similar API, load_as_spark, which does the exact same thing but loads the data into a Spark DataFrame. Finally, we can see here that the data resides in memory as a DataFrame, which allows any further processing or analytics. For example, I can perform any grouping, filtering, or joining operations, I can plot charts like the pie chart, and I can also use the data for data science and machine learning. Okay, so that was it. Don't forget to check out Incorta's virtual hands-on lab series, spread across the first two weeks of June. For more information and to register, please do visit www.incorta.com/virtual-hands-on-lab-series. Thank you.
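As a sketch of the table URL described in the walkthrough: the open-source `delta-sharing` connector joins the profile path and the fully qualified table name with a `#`, and this example assumes the Incorta library follows the same convention. The helper names and the identifier spellings (`OnlineStore`, `salesorderdetail`) are illustrative guesses based on the spoken names.

```python
def build_table_url(profile_path, tenant, schema, table):
    """Build '<profile path>#<tenant>.<schema>.<table>'.

    Assumption: the '#' separator is the open-source Delta Sharing convention.
    """
    for part in (tenant, schema, table):
        if not part:
            raise ValueError("tenant, schema and table must all be non-empty")
    return f"{profile_path}#{tenant}.{schema}.{table}"


def load_table(profile_path, tenant, schema, table):
    """Load a shared table into a pandas DataFrame via the open-source connector."""
    import delta_sharing  # lazy import: requires `pip install delta-sharing`

    return delta_sharing.load_as_pandas(
        build_table_url(profile_path, tenant, schema, table)
    )


# Hypothetical path and names, only to show the shape of the URL.
table_url = build_table_url(
    "/path/to/profile.share", "default", "OnlineStore", "salesorderdetail"
)
```

Once `load_table` returns, the result is an ordinary pandas DataFrame, so the usual `groupby`, boolean filtering, `merge`, and plotting calls apply to it directly.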

 

Speaker:

Basem Gaber

Software Engineer

Interested in partnering with us for next year's event? E-mail sponsors@incorta.com.