Liberating Data with the Delta Sharing Protocol
Data lives in many places and in many formats. Physically moving data can be a burden on an organization's time and governance efforts. Enter the Delta Sharing protocol, which lets you fetch the data you need from Incorta using your client of choice.
Register to learn how to navigate your Incorta cloud instance and use it to:
- Load data and leverage Incorta intelligent ingestion
- Connect to Incorta from your Python client via the Delta Sharing protocol
- Review common commands for working with the Delta Sharing server
Transcript
Joe Miller: Thank you, everybody, for joining us today. I see people are filtering into the webinar.
Joe Miller: We're going to go ahead and give it a minute or so before we get started, so please hang tight.
Joe Miller: If you just joined us, we want to thank you for joining today's session. We'll give it maybe another 45 seconds or so, and then we'll go ahead and kick off the session.
Joe Miller: Okay.
Joe Miller: Looks like everyone is starting to get into the room. I see a few more crept in, but we'll go ahead and get the session started just to make sure that we have time today.
Joe Miller: Just a little housekeeping before we begin the session here today: by default, everyone is muted upon entry to the session, to make sure that we can hear each other.
Joe Miller: But if you do have questions throughout the duration of the session, please go ahead and enter them into the Q&A box that's available here in the Zoom webinar.
Joe Miller: Okay.
Joe Miller: So I'd like to welcome you all to today's webinar, Liberating Data with the Delta Sharing Protocol. My name is Joe Miller, I'm the Senior Director of Community and Customer Education, and I'll be your moderator here today. I'd like to introduce our speakers.
Joe Miller: With us today, we have
Joe Miller: Basem and Ahmad. Basem is a software engineer in the Incorta CTO office, and in the last few years his enthusiasm has grown towards data science and machine learning, with a focus on big data utilizing Apache Spark.
Joe Miller: He set out to explore his passion for software and data engineering, soon joined Incorta, and has been focusing since then on the intricacies of data pipelines on the cloud.
Joe Miller: And as for Ahmad, he is a cloud platform engineering manager at Incorta. He works with his team to build services and features on top of leading cloud providers, with a focus on building a scalable cloud platform for Incorta using Kubernetes and cloud-native technologies.
Joe Miller: Without further ado, I'd like to hand over to our speakers. Ahmad, would you go ahead and kick us off?
Ahmad Gaber: Today we'll talk about Delta Sharing and how we can share Incorta data using it.
Ahmad Gaber: So.
Ahmad Gaber: Let's first talk about the problem.
Ahmad Gaber: As you can see, nowadays any organization has a lot of systems and a lot of tools that talk to each other with different protocols. So, for example, if we have
Ahmad Gaber: one department that wants to share data with another department or another application, we have different tools, for example Tableau, using REST calls over HTTPS,
Ahmad Gaber: or ODBC or JDBC drivers. So this creates complexity in how we share data, and there is a security question: how would we track
Ahmad Gaber: and audit data access across these different systems? Delta Sharing solves this problem. Delta Sharing is a simple REST protocol that securely shares access to data between
Ahmad Gaber: different parties, and it leverages modern cloud storage systems like S3, ADLS, and GCS. You can use it over your existing
Ahmad Gaber: storage, as I said, like GCS or ADLS, and you can also integrate it with any system that understands the Delta Sharing protocol.
Ahmad Gaber: Also, Delta Sharing depends on the object storage of the cloud provider, so mainly it works with pre-signed URLs, which provide a lot of features related to
Ahmad Gaber: security, such as controlling who can access the data. Also, as we know, object storage nowadays is a scalable platform for storing unlimited data. So let's talk in more detail about how it works in Incorta.
Ahmad Gaber: So, as I've mentioned...
Joe Miller: Just a quick note: we had a few people in chat saying it's a little soft, a little quiet, so if you could just get a little closer to the microphone, that'd be great. Appreciate it.
Joe Miller: Thank you.
Ahmad Gaber: Okay, so with Incorta as the data provider, Delta Sharing lets you share existing data in Incorta,
Ahmad Gaber: like sharing the Parquet part of it.
Ahmad Gaber: So if you store your data in Incorta, you can share it with any data system that understands the Delta Sharing protocol, like Tableau, Spark, or pandas. First, the client requests access to the data.
Ahmad Gaber: For example,
Ahmad Gaber: from Spark, I need to read a table called sales in Incorta. The client sends the request to our Delta Sharing server, and the server checks your permissions and your access token
Ahmad Gaber: to validate your authentication and access permissions. If everything is okay, the server returns pre-signed URLs to the client, and then the client can talk directly to the Incorta cloud storage to read the data. So this is how we can
Ahmad Gaber: share a specific part of the data, and how we can
Ahmad Gaber: track its access and
Ahmad Gaber: share the data with other systems that can read data from Incorta.
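The request flow described here can be sketched in Python. This is a minimal sketch that assumes the server follows the open Delta Sharing REST protocol, where a table is queried with a POST to `/shares/{share}/schemas/{schema}/tables/{table}/query` carrying a bearer token, and the response is newline-delimited JSON whose `file` entries hold the pre-signed URLs. The endpoint, token, and share, schema, and table names are placeholders, and Incorta's server may expose different paths. The request is only built, never sent.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint, token, share, schema, table, limit=None):
    """Build (but do not send) the request a client would make to query a table."""
    path = "/shares/{}/schemas/{}/tables/{}/query".format(
        *(urllib.parse.quote(part, safe="") for part in (share, schema, table))
    )
    body = {} if limit is None else {"limitHint": limit}
    return urllib.request.Request(
        endpoint.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # the token the server validates
            "Content-Type": "application/json",
        },
    )


def presigned_urls(response_text):
    """Pull the file URLs out of a newline-delimited JSON response."""
    urls = []
    for line in response_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            if "file" in entry:
                urls.append(entry["file"]["url"])
    return urls


# Placeholder values (assumptions); nothing is sent over the network here.
request = build_query_request(
    "https://sharing.example.com/delta-sharing", "<access-token>",
    "default", "online_store", "sales", limit=5,
)
print(request.get_method(), request.full_url)
```

Once the client has the URLs returned by `presigned_urls`, it reads the data files straight from object storage, which is why the sharing server itself never has to stream the table contents.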
Ahmad Gaber: So there are a lot of benefits to using this protocol. As I said, it's open source, so many vendors and many communities have implemented this protocol to read data
Ahmad Gaber: with different systems. Also, as it depends on object storage, we can consider this solution multi-cloud: we can use it with ADLS, S3, and GCS. And the security will be
Ahmad Gaber: handled by the cloud provider itself by using pre-signed URLs. So mainly, Delta Sharing allows you to easily govern, audit, and track any access to your data between the systems. So before we go through the demo,
Ahmad Gaber: I also want to mention
Ahmad Gaber: the community around this protocol now: you can use it with any open source client that understands the protocol, like Spark and pandas, and a lot of commercial clients implement it as well.
Ahmad Gaber: And so, before going through the demo,
Ahmad Gaber: it would be better to understand the specification of the protocol and how we can use Delta Sharing in Incorta.
Ahmad Gaber: Let's start with the concepts themselves. The first one is the share. The share is a logical grouping to share with the Delta Sharing client. In Incorta,
Ahmad Gaber: you can consider that the share is your Incorta cloud cluster. The schema is a logical grouping that can contain multiple tables inside one schema. And the table, of course, would be the Delta table or view.
Ahmad Gaber: The second one is the profile file. The profile file is a JSON file that contains the authentication information required by the Delta Sharing server to authenticate you as a user, so that you can read the data that
Ahmad Gaber: you are allowed to read by the data owner. And there are also the REST APIs that can be used
Ahmad Gaber: to access the data: you can list the shares, schemas, and tables, and you can query the data for a specific portion of a table. And now Basem can go on with the demo.
Basem Gaber: OK, so now we'll be going through the demo, seeing how we can access the data that Incorta shares with us through our Python client.
Basem Gaber: For this demo I will be using two kinds of notebooks. The first is a Jupyter notebook in the VS Code IDE, and next I will be heading over to a notebook in the Databricks environment. However, this could be used with any other kind of notebook as well.
Basem Gaber: Please note that to get the Incorta Delta Sharing service, it must be enabled by contacting our support team. Okay, so the first item I have here in my notebook is just to install the Python wheel of the Incorta Delta Sharing library. Again, this library holds the APIs
Basem Gaber: that can be used by the client to communicate with the Delta Sharing server.
Basem Gaber: Plus, I will also be installing the matplotlib library, which we will use to plot some charts using the data that we have loaded from Incorta. OK, so the next step I have is just to import the library from the package that I just installed.
Basem Gaber: Let me go ahead and do that. Okay, and then next I will be defining the name, or the full path, of my profile file. Again, as I mentioned earlier, the profile is the file that holds all the credentials that are needed
Basem Gaber: to authenticate while we are communicating with the server.
Basem Gaber: Let us have a deeper look at its contents. First, we have the endpoint on which the server is listening, next we have the API key of our Incorta cluster, then the instance name, and finally the tenant name.
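As a rough illustration of what such a profile file looks like, here is a minimal sketch that writes one in the format used by the open-source Delta Sharing clients (`shareCredentialsVersion`, `endpoint`, `bearerToken`). This is an assumption: the Incorta profile described in the demo also carries an instance name and a tenant name, and since the transcript does not give those field names they are left out rather than guessed. The endpoint, token, and file name are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Open-source Delta Sharing profile fields; all values are placeholders.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<api-key-copied-from-the-security-tab>",
}

# Write the profile where the notebook can read it (hypothetical file name).
profile_path = Path(tempfile.gettempdir()) / "incorta_demo.share"
profile_path.write_text(json.dumps(profile, indent=2), encoding="utf-8")

# Read it back to confirm that it is valid JSON.
loaded = json.loads(profile_path.read_text(encoding="utf-8"))
print(sorted(loaded))
```

Because the file contains a credential, it is worth treating it like any other secret: keep it out of version control and regenerate it if the API key is renewed or revoked.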
Basem Gaber: Let me go over to the Incorta cloud platform to see how we can access this API key. So here I am in our cloud platform, and I can access one of my clusters that I had already created ahead of time. Then I need to go over to the analytics URL, and from there
Basem Gaber: I can go over to the security tab
Basem Gaber: in my account,
Basem Gaber: and I will find the API key that I need to use. I can copy it at any time and, of course, it can be renewed or revoked if needed.
Basem Gaber: Okay, so let us get back to the notebook.
Basem Gaber: Okay, so after we have defined the path of our profile file, I will then be creating the Incorta Delta Sharing client object, instantiating it using the profile.
Basem Gaber: This object will then be used to call all the methods or functions of Delta Sharing. The first one we have is list tenants, which just lists all the tenants that I have inside my cluster.
Basem Gaber: So let us just give it a couple of seconds.
Basem Gaber: The tenant here is my default tenant. Of course, when I access my analytics, I'm inside one of my tenants; here, for example, I can see that this is the default tenant. OK, so next I will be using the next API, which is
Basem Gaber: list schemas,
Basem Gaber: which lists the schemas inside the tenant in question. I will be using the default tenant here to list all the schemas inside it, and I can see that I only have one schema, which is the online store.
Basem Gaber: As we can see from the UI. Okay, so next we will be listing all the tables inside the schema. Here I can see that I have my sales order table, my bill of materials, the sales order header, and so on.
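The listing steps can be sketched with the open-source `delta-sharing` Python package, which is an assumption here: the demo uses Incorta's own library, whose method names are not shown in the transcript. In the open-source client the top level is a share (the demo lists tenants at that level), and `SharingClient` offers `list_shares`, `list_schemas`, and `list_tables`. The helper names `walk_catalog` and `open_client` are hypothetical.

```python
def walk_catalog(client):
    """Return 'share.schema.table' names for everything the client can see."""
    names = []
    for share in client.list_shares():
        for schema in client.list_schemas(share):
            for table in client.list_tables(schema):
                names.append(f"{share.name}.{schema.name}.{table.name}")
    return names


def open_client(profile_path):
    """Create a client from a profile file."""
    # Imported lazily so this sketch loads without the package installed;
    # it needs `pip install delta-sharing`.
    import delta_sharing

    return delta_sharing.SharingClient(profile_path)


# Usage with a real profile file (needs network access to the server):
# for name in walk_catalog(open_client("/path/to/profile.share")):
#     print(name)
```

Only tables that the authenticated user is allowed to read should come back from these calls, so an empty result is often a permissions question rather than a connection problem.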
Basem Gaber: Of course, all these tables need to have been created and loaded inside Incorta ahead of time, so I had already, ahead of time,
Basem Gaber: created the definition of the schema and loaded the data. Of course, if I had other schemas with data loaded in Incorta, they would have shown up here as well.
Basem Gaber: The other point that I need to mention is that
Basem Gaber: the data I can access through the Delta Sharing client is governed by my user: my user must already have access to the data. Here in our example, the user I'm using is a schema manager, so he has access to all the tables that I'm seeing here in this list.
Basem Gaber: Okay, so next we will actually be loading tables, or rather reading a table by querying it from the server.
Basem Gaber: So the first API we have here is load_as_pandas, which just loads an entire table into a pandas DataFrame. We could use another, very similar API, which is load_as_spark, which does the exact same thing but loads it into a Spark DataFrame. Okay.
Basem Gaber: Let me explain the parameter that is given to the load_as_pandas API. We just have to provide
Basem Gaber: the profile file again, plus the fully qualified name of our table: first we have the tenant name, then the schema name, and finally the table name.
Basem Gaber: Finally, after loading my table into a pandas DataFrame, I just use the head function so that I can show the first five rows.
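The parameter described here can be sketched as follows, assuming the convention of the open-source `delta-sharing` package, where the profile path and the fully qualified table name are joined with `#` and passed to `delta_sharing.load_as_pandas`. The helper names `table_url` and `preview`, and the tenant, schema, and table names, are hypothetical.

```python
def table_url(profile_path, tenant, schema, table):
    """Join the profile path and the fully qualified table name with '#'."""
    return f"{profile_path}#{tenant}.{schema}.{table}"


def preview(profile_path, tenant, schema, table, rows=5):
    """Load a shared table into pandas and return its first rows."""
    # Lazy import: needs `pip install delta-sharing` and network access.
    import delta_sharing

    frame = delta_sharing.load_as_pandas(table_url(profile_path, tenant, schema, table))
    return frame.head(rows)


print(table_url("/tmp/incorta_demo.share", "default", "online_store", "sales"))
# prints /tmp/incorta_demo.share#default.online_store.sales
```

Keeping the URL construction in its own function makes it easy to reuse the same string for a Spark load later in the notebook.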
Basem Gaber: The final step here is that I will be using the matplotlib library to plot some charts. I group by a column, which is category, and then I display the distribution of the profits across my different categories.
Basem Gaber: OK, so now let us head over to the other notebook, here in the Databricks environment. At the beginning, though, I will be repeating the same
Basem Gaber: steps, and then we will be investigating other use cases. So again, I need to install my library. I have already uploaded it ahead of time to the Databricks file store, so let us just give it a couple of seconds.
Basem Gaber: Okay, so it has completed, and now I will go ahead and import the library from my package, and then I will define the full path of my profile. Again, I have also uploaded
Basem Gaber: my credentials file, which holds the JSON object, into the Databricks file system.
Basem Gaber: Okay, so next we will be instantiating our client object to use in the next steps. Okay, so I will be listing the tenants and just seeing the default tenant.
Basem Gaber: And then listing the schemas inside this tenant gives the online store schema, and listing the tables retrieves for us all the tables inside our schema.
Basem Gaber: And then, again, I will be loading a specific table into a pandas DataFrame and just showing the first five rows, and then again
Basem Gaber: I'll be plotting just the same pie chart that we had in VS Code. Okay, so next we will be doing some additional use cases. Here, for example, we will be using load_as_spark.
Basem Gaber: Of course, here I don't need to install Spark myself or configure it, since Databricks already has Spark running on it. So I will just be loading the same table we were using before, which is monthly revenue scale, into a Spark DataFrame, and then I'll be showing my DataFrame.
Basem Gaber: So let us just give it a couple of seconds.
Basem Gaber: Okay, so here it is. I can see that I'm seeing just the exact same data that I was seeing earlier; the only difference is that that was a pandas DataFrame and
Basem Gaber: this is a Spark DataFrame. Okay, so next, let us see what else we could do instead of simply loading tables. Let us see if we could do some joins between tables.
Basem Gaber: So now I'm going to be loading another table from the schema; it's called sales order detail. I'll be loading it into a pandas DataFrame
Basem Gaber: and showing just the first five rows. Okay, so here it is. Next I will be loading another, different table into a pandas DataFrame as well; the other table is called product.
Basem Gaber: And I'm aware that there is a possible join between those two tables using a predicate on the common column, which is the product ID.
Basem Gaber: So I will be using the merge function of the pandas DataFrame to perform a join between those two tables. It will be a left join, and after the join has been computed I will be showing the first five rows of the output.
Basem Gaber: So let us give it a couple of seconds. And here it is: I can see in the header that the columns I'm seeing are the columns from
Basem Gaber: both of my tables, sales order detail and product. Okay, so now we will be repeating the join using Spark DataFrames instead.
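The pandas join described in this part of the demo can be sketched with `DataFrame.merge`. The column names (`product_id`, `product_name`, and so on) and the sample rows are assumptions made for illustration; in the demo the two frames come from the shared sales order detail and product tables.

```python
def left_join_on_product(order_details, products, key="product_id"):
    """Left join: keep every order-detail row and add product columns where the key matches."""
    return order_details.merge(products, on=key, how="left")


def demo():
    """Run the join on a tiny made-up sample."""
    import pandas as pd  # deferred so the sketch loads where pandas is absent

    order_details = pd.DataFrame(
        {"order_id": [1, 2, 3], "product_id": [10, 11, 99], "quantity": [2, 1, 5]}
    )
    products = pd.DataFrame(
        {"product_id": [10, 11], "product_name": ["Bike", "Helmet"]}
    )
    return left_join_on_product(order_details, products)


# Usage: print(demo().head())
```

With `how="left"`, the order row whose `product_id` has no match in the product table is kept, and its product columns are filled with missing values, which is usually what you want when the detail table is the one being analyzed.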
Basem Gaber: So, first, I will be loading sales order detail into a Spark DataFrame and showing it. So here it is. Next, I will be loading the other table, which is the product table,
Basem Gaber: into another Spark DataFrame. So here it is. And then, finally, we will be executing the actual join using the join function that Spark offers, and again we'll be doing a left join on the product ID.
Basem Gaber: So let me execute this.
Basem Gaber: Okay, so here we can see the result. And, of course, at this point the data resides inside the memory of our notebook, so this allows for any other possible use cases.
Basem Gaber: We could do any types of group-bys, filters, or aggregations, and you could use the data for data science or for machine learning.
Basem Gaber: Okay, so, for example, one thing I will be doing here is selecting the product name column from my join output.
Basem Gaber: Then I'll be grouping by the product name, counting over the groups, and showing the count per product, as if we're seeing which product has given me the greatest count of sales.
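The Spark version of the join and the per-product count can be sketched as a single function over two Spark DataFrames. The column names `product_id` and `product_name` are assumptions, and the final descending sort is an addition that makes the product with the greatest count appear first. The function only uses standard DataFrame methods (`join`, `select`, `groupBy`, `count`, `orderBy`), so it needs no import of its own.

```python
def count_sales_per_product(order_details, products):
    """Left-join on product_id, then count rows per product_name, largest first."""
    joined = order_details.join(products, on="product_id", how="left")
    return (
        joined.select("product_name")
        .groupBy("product_name")
        .count()
        .orderBy("count", ascending=False)
    )


# Usage in a notebook with Spark and the Delta Sharing connector available
# (the URLs are placeholders in profile#tenant.schema.table form):
# import delta_sharing
# details = delta_sharing.load_as_spark(details_url)
# products = delta_sharing.load_as_spark(products_url)
# count_sales_per_product(details, products).show()
```

Spark evaluates lazily, so nothing is read from the shared tables until an action such as `show()` is called on the result.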
Basem Gaber: Okay, so I guess that was it. Let me now hand back to Joe.
Joe Miller: If anyone has any questions from the session, or would like our presenters to elaborate more, we're more than happy to.
Joe Miller: One thing that I was curious about, Basem: you talked about prerequisites to getting Delta Sharing stood up, like, if you're in a cloud instance, making sure that you contact support.
Joe Miller: What other steps do you need to take to make sure that you're set up to start activating this particular feature set?
Basem Gaber: Can you please repeat the first part of the question? I believe I missed it.
Joe Miller: Oh, I apologize. We talked a little bit about the fact that in the cloud environment, you have to contact support to activate it.
Joe Miller: What other prerequisites should our users bring to the table to make sure that they're set up to use this feature? An example would be that JSON file that you showed for authentication, like the API key: where would users go and find that once they're set up?
Basem Gaber: Yeah, so...
Ahmad Gaber: You can get this API key from your profile page in Incorta, and then you can set up a profile file with your credentials.
Ahmad Gaber: I also want to mention that this is only available in the cloud, for releases from
Ahmad Gaber: 2022 and above, so you will be able to use Delta Sharing with such a cluster. And, of course, you must have a data client that understands the Delta Sharing protocol, like Databricks or pandas.
Joe Miller: Great.
Joe Miller: Thank you.
Joe Miller: Okay, I will give it about 30 more seconds to see if there are any questions coming through chat
Joe Miller: or Q&A. And actually, while we are doing that, I'm going to go ahead and share my screen, just so I can talk through a few of our end slides here.
Joe Miller: Looks like I've lost my slides; hold on.
Joe Miller: Okay.
Joe Miller: So, um.
Joe Miller: One of the things I'd like to mention is, if you found this session useful or you'd like to learn a little bit more about the session,
Joe Miller: we talked a little bit before this call about getting some of this information into a knowledge base on the community. So
Joe Miller: some of the scripts that were provided here today, I will make sure that we democratize and push out into our knowledge base environment, so stay tuned for that.
Joe Miller: In addition to that, we will also be posting this recording on incorta.com, available for you to play back.
Joe Miller: But in the interim, if you are interested in diving a little bit more into this particular feature set,
Joe Miller: please go ahead and join our community at community.incorta.com. There you can go ahead and ask questions; we have people like myself, engineers, and product management perusing the forums.
Joe Miller: We have a place to submit ideas, if you have used the product enough and find that there are some areas that you'd like to see bolstered for your day-to-day use.
Joe Miller: We also have, as I mentioned, knowledge base articles, which we'll be posting after the event in the next few days here.
Joe Miller: The other thing that I'll mention is that we also have an events page if you're interested in attending future webinars, whether they be more technical ones like this one
Joe Miller: or some of our virtual hands-on labs that are more training oriented. Whatever they may be, you can find the entire event listing there as well.
Joe Miller: With that, I do want to thank everyone here. I'm going to do a quick check of the chat and Q&A to see if there's anything
Joe Miller: coming in. Nothing else, so I guess I will close the session there. I want to thank everybody for joining this quick action on insights session today.
Joe Miller: Hopefully you found this insightful. Um, I did get one question here in the eleventh hour, so I'm going to throw it back to you, Ahmad or Basem. It's in the chat window: can the client be on-prem?
Ahmad Gaber: Yes. As you saw, in the demo Basem spun up an Incorta cloud cluster and used his own notebook to read the data
Ahmad Gaber: from Incorta, using these notebooks.
Joe Miller: Another question just came in: is it available to be used with S3?
Ahmad Gaber: Yes, Delta Sharing mainly depends on object storage, so any cloud provider that supports object storage, like S3,
Ahmad Gaber: or ADLS on Azure, or GCS on GCP, will work.
Joe Miller: Another question came in. Great, we're getting all the questions now.
Joe Miller: On-prem to cloud is facilitated by the data agent now, so should we have the agent in place?
Ahmad Gaber: No.
Ahmad Gaber: Mainly, the data agent is for taking data from outside Incorta, as a data source, to be loaded into Incorta. But here it's a different direction: you need to share specific tables with other users or other systems, so there is no need for the agent in this case.
Joe Miller: Let's give it 30 more seconds, since there are some good questions coming in.
Joe Miller: Okay.
Joe Miller: Thank you everyone for joining me here today, I appreciate your time and learning a little bit about this capability.
Joe Miller: Again, the session recording will be on the website over the next week, and we'll get some knowledge base articles out to share some of the scripts that were shown here today. With that, I would like to close the session and thank you all for joining today. Thank you.
Hosted by:
Basem Gaber
Software Engineer
Ahmad Gaber
Engineering Manager
Joe Miller
Senior Director, Community and Customer Enablement