Incorta for Data Scientists
Incorta supports data scientists throughout the machine learning (ML) life cycle. This session walks you through these tools and demonstrates how Incorta can be used.
Watch now to learn:
- How Incorta supports the ML cycle
- How to perform exploratory data analysis with Incorta
- Data preparation, including missing data and outlier detection and handling
- Separate ML model creation and inference
- ML model creation using Spark ML, Incorta ML, and Pandas based API
- Business dashboards that can bring the ML prediction together with your business data
Transcript:
Joe Miller: Well, thank you everyone for joining today. My name is Joe Miller, I'm the Senior Director of Community and Customer Education here at Incorta, and I'll be your host today. I want to welcome you to the first Action on Insights webinar.
Joe Miller: This is our monthly webinar series where we share best practices, tips, and tricks to help our customers unlock the full potential of the Incorta Direct Data Platform.
Joe Miller: Today we're going to learn a little bit about Incorta for data scientists. Now, just a little housekeeping before we begin.
Joe Miller: If you have any questions throughout the presentation, please go ahead and type them into the Q&A box at any time; we'll do our best to address them throughout the session via text.
Joe Miller: But if there are questions that are outstanding, we'll try to follow up with those at the end of the session.
Joe Miller: Now I'd like to introduce our speaker today, Dylan Wan. He is the Senior Solution Architect for data science and machine learning solutions here at Incorta.
Joe Miller: Dylan has worked as a solution architect on several Incorta implementation projects, including Broadcom, Apple, Facebook, and Comcast.
Joe Miller: Currently, his focus is on creating data science and machine learning solutions and business blueprints based on Incorta.
Joe Miller: Before joining Incorta, he worked on BI applications and Oracle EBS in various roles, including development, management, architecture, and product management. So, needless to say, we have brought the best person to speak to you about data science and Incorta today. Without further ado, I want to hand it over to Dylan to launch us into our session.
Dylan Wan: Thanks, Joe. Thanks for joining today's Action on Insights. Today I'm going to talk about Incorta for data scientists.
Dylan Wan: But I wanted to mention first that Incorta treats machine learning and data science differently for different personas.
Dylan Wan: Last year, in Q4, we introduced a feature called one-click forecasting, which is oriented toward the business analyst. The idea is that, as a business analyst with time series data, you are now able to perform prediction and forecasting without building or training a machine learning model yourself. It is basically a feature embedded in Incorta, and you can produce the forecast just by clicking on the screen.
Dylan Wan: That's not what I'm going to focus on today. I'm going to talk from the data scientist and data engineering perspective. We will use the Incorta dashboard and the Incorta notebook, and I will also show you how we can enable an external notebook through the Data API.
Dylan Wan: Okay, before I jump into the technical detail, I want to sync up on the terminology used. Many of you may already know machine learning. In this presentation, machine learning means the data science work involved in using machine learning algorithms to produce knowledge and patterns, or rules, discovered from data.
Dylan Wan: One example of a problem solved by machine learning is email spam detection. By applying a machine learning algorithm to historical emails that were previously labeled manually as spam or not spam, we gain the knowledge to classify future emails.
Dylan Wan: Another class of problem is predicting a numeric value, where we can use historical data to make the prediction. One example is house price: based on previously sold houses, we can discover the rules. This type of problem is what people in machine learning call a regression problem, and I will use this term during the presentation.
Dylan Wan: The knowledge and patterns that are discovered are called a machine learning model, and a machine learning model can be applied to input data to produce either a prediction or a score.
Dylan Wan: When data comes into a machine learning model, it can be thought of as a table. The column that is the target of the prediction can be either numeric or categorical. The columns that are useful for making the prediction are called features, and I will use this term throughout. The term observation is used to refer to each row of the data.
Dylan Wan: Okay, in this session I'm going to go over how Incorta supports the ML life cycle. I will briefly touch on it from the process perspective, and I will perform a demo of how to do exploratory data analysis within Incorta. I will also walk through how we can do data preparation in Incorta, picking up some of the things we can do, such as handling missing data and outliers. Finally, I will walk through a demo of model building in Incorta using Incorta ML and Spark ML.
Dylan Wan: This is the machine learning life cycle, and I'm going to go through it. We typically start with understanding the business and identifying the problem to be solved. One example: we want to understand why customers walk away, a problem people call the customer churn problem. To understand the business, you need to understand the definition of churn; maybe the customer doesn't want to renew the contract, or the customer stops buying if they are a repeat customer.
Dylan Wan: Next, we go to understanding the data, whether the data we have can be used to discover the rules. To understand the data, Incorta can be used for exploratory data analysis through the Incorta dashboard and the Incorta notebook. Then comes data preparation: many machine learning algorithms require the data to be in a certain format, so the data may need to be transformed first, and Incorta can be used for handling this data preparation.
Dylan Wan: Modeling and evaluation is an iterative process. I'm going to show you that we can use the Incorta ML library, the notebook, and also third-party integration to perform this model building, including training and evaluation. Then, once the model is identified, it needs to be deployed into the production system.
Dylan Wan: Those of you who are already familiar with Incorta know that we support a development life cycle: you model the schema in a development environment and then deploy it into the production environment. The same thing happens with ML as well, so we can migrate the content from the development environment to the production environment.
Dylan Wan: If the data is coming from third parties, you can use a data lake connector to connect to the data and bring it into Incorta. The scoring, or inference, problem is similar to the problem of refreshing the data. We can think of ML as one of the ways to enrich the data: the data comes in from the source system, we refresh it with the latest data, and then make it available for reporting. ML, from the Incorta process perspective, is one more source that provides additional predictions, which produce additional insight. By performing this ML inference within Incorta, we can also have an integrated dashboard.
Dylan Wan: For the exploratory data analysis with Incorta, I am using a dataset from Kaggle, the website that hosts data science competitions; they provide the dataset, and I will use that data. Now I'm going to show you a demo.
Dylan Wan: Okay, here I will first use the Incorta dashboard. Here is the dataset with a dashboard already created, but I want to show you how to do this. After we load the data into Incorta, you can create an Incorta dashboard.
Dylan Wan: For a dataset like house prices, what I did is start by bringing in a listing table. This table is very simple, nothing but bringing all the content into Incorta.
Dylan Wan: Compared to other ways of previewing data, using Incorta has some benefits. For example, it is very easy to sort the data. Let's say I want to show the data by sales price: if I just click on this, I can sort the data by sales price, from low to high or from high to low.
Dylan Wan: It is also very easy for me to go into detail, say, to look at the data for a given neighborhood. The Incorta dashboard is an interactive environment, so we can perform analysis to understand the data by sorting and by slicing and dicing. In this case I drill into a particular neighborhood and we can see the descriptive statistics about that neighborhood. This is also very simple: using Incorta KPI insights we can define statistical aggregations like average, median, max, and min.
Dylan Wan: We know this neighborhood has 1,441 houses, and the range of the year built looks pretty new, from 1992 to 2010.
Dylan Wan: The dashboard can help us perform data analysis as well, in terms of understanding the relationships in the data. This chart basically shows the relationship between price and year, so we can see that house prices look like they keep increasing over time, although some of the houses in the data are very old.
Dylan Wan: Here we can also see how the building type affects the sales price; it looks like the single-family home and the townhouse have a higher price than the others. And the scatter chart helps us see the relationship between two numeric columns, in this case sales price and lot area, and we can see a trend between them: the larger the lot area, the higher the price.
Dylan Wan: This chart also helps us identify outliers. In this case, I can see that there are some outliers: houses with an extremely large lot area. This matches our common sense, and this type of exploratory data analysis helps us evaluate the features. This is from the Incorta dashboard.
Dylan Wan: Now, moving on to doing the analysis from the Incorta notebook.
Dylan Wan: Okay, the Incorta notebook allows us to read the data from Incorta into Spark. Once the data is in Spark, the Spark API is available to us in this environment, so we can do things like previewing the data, very similar to what we did in the dashboard, and we can show the schema structure.
Dylan Wan: This also allows us to see statistics. Earlier we were building the Incorta dashboard insights one by one; by using the notebook we can generate the descriptive statistics for all the columns with just one line of code. In the resulting table we can see, for example, for lot area, what the central tendency is and what the variance is.
Dylan Wan: This is more of a machine learning and statistics topic than the focus of this webinar, so I am not going to go through the definitions, but I want to mention that this type of information can be stored in Incorta as a separate table, and that's what I did. I basically used the Spark summary statistics function and put the result into an Incorta summary table, and that summary table can be used in a downstream process for outlier detection.
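A minimal sketch of the kind of notebook code this step describes: read the table into a Spark DataFrame, generate descriptive statistics for every column, and persist them as a summary table for downstream outlier detection. The read()/save() helpers available in Incorta materialized views and the HousePrice.Listings schema/table names are assumptions for illustration.

```python
# Read the Incorta table into a Spark DataFrame (read()/save() assumed to be the
# helpers exposed in an Incorta notebook / materialized view).
df = read("HousePrice.Listings")

df.printSchema()                 # inspect the schema structure

summary_df = df.describe()       # count, mean, stddev, min, max for every column

save(summary_df)                 # persist as a summary table for later outlier detection
```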
Dylan Wan: One thing I want to mention is that in this Incorta notebook environment, you are allowed to do visualization as well.
Dylan Wan: So here I showed you a table, and we can actually use a bar chart as well. In this case, what I did is bring in the house style and count the rows by ID. By doing so we have a visualization within the notebook which helps us understand the data. We can see the distribution of the houses in this area: there are more one-story houses than two-story, and a few other types. And we can do the scatter chart I showed earlier in a notebook as well, so this can help us quickly understand the data.
Dylan Wan: In the case of the scatter chart, what I show here is the relationship between the sales price and the garage area, and we can see there is a trend: the larger the garage area, the higher the price.
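One way such a scatter chart could be sketched from a Python notebook cell, assuming the GarageArea and SalePrice column names from the Kaggle house price dataset and a read() helper for loading the table; the speaker's actual visualization cells are not shown in the transcript.

```python
import matplotlib.pyplot as plt

df = read("HousePrice.Listings")                         # assumed Incorta helper
pdf = df.select("GarageArea", "SalePrice").toPandas()    # small projection only

plt.scatter(pdf["GarageArea"], pdf["SalePrice"], s=5)
plt.xlabel("Garage area")
plt.ylabel("Sale price")
plt.title("Sale price vs. garage area")
plt.show()
```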
Dylan Wan: Okay, so this was performing interactive analysis within the notebook. Next I'm going to show you that we can also use the Incorta notebook with other languages, not just Python. Here I wrote the code in Scala, and we can use Scala to perform some analysis. I wrote a really reusable piece of code here; basically, it helps anyone who wants to draw a histogram to do so within the notebook. I will come back to this later.
Dylan Wan: In addition to the histogram, this is another reusable program; it produces a table. Let me first quickly show you. You can change this to any dataset; I am using the house price dataset, but you can change it to anything. Basically, the code calls Spark functions to calculate the correlations, and I will show you why correlation is important. We produce data like this: for each pair of features, we produce a correlation that shows how the two features are related to each other.
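A minimal sketch of how such a pairwise correlation table could be produced in a PySpark notebook or materialized view. The read()/save() helpers, the availability of a SparkSession named spark, and the schema/table name are assumptions for illustration; the speaker's actual code is not shown in the transcript.

```python
from itertools import combinations

df = read("HousePrice.Listings")     # assumed Incorta helper

# Keep only numeric columns for correlation
numeric_cols = [f.name for f in df.schema.fields
                if f.dataType.typeName() in ("integer", "long", "double", "float")]

rows = []
for col_a, col_b in combinations(numeric_cols, 2):
    corr = df.stat.corr(col_a, col_b)            # Pearson correlation via the Spark API
    rows.append((col_a, col_b, float(corr)))

corr_df = spark.createDataFrame(rows, ["feature_a", "feature_b", "correlation"])
save(corr_df)                                    # persist for the Incorta heat map insight
```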
Dylan Wan: Then, once we produce a table like this, we can go back to Incorta to see the relationships. This is the heat map in Incorta, using the data we produced from the materialized view.
Dylan Wan: So why is the heat map important? In machine learning we have many features which can be useful for prediction, but we would like to know whether any of them are redundant. Sometimes features, even though they have different names, are actually almost the same. In this case, I can see there is a feature called garage cars, and you see this box in the dark purple: the garage area and garage cars have a very high correlation. What this means is that very likely we don't need both features; we may just use one of them. If we have multiple features with the same meaning, that may impact the model's performance. By using a heat map like this, we can see the correlation among features.
Dylan Wan: We can also see how these features relate to the label, the target. Here, using the same data and doing the visualization within Incorta, you can see that sales price is very highly correlated with itself, but you can also see that the living area and overall quality are actually good features for predicting the sales price.
Dylan Wan: Earlier I mentioned the histogram. This is another thing I want to show you: by using the Incorta notebook, we can also draw a histogram, which helps us understand the distribution and also identify outliers. In this case, the code is already written; we just need to change the name of the table here. All of this is generic, and the results can be used within Incorta.
Dylan Wan: OK, so I'm going to go back to the presentation.
Dylan Wan: Okay, next I'm going to walk through how we can use Incorta to do deduplication and handle missing data. This is the code example.
Dylan Wan: So how can we identify duplicates? Duplicates can be identified in the Incorta notebook by comparing the distinct row count with the row count of the table; if they are different, it basically means there are duplicates. Then, Spark provides an API that allows us to use distinct to drop the duplicates. In addition to distinct, there is also an API that allows us to perform deduplication by identifying the unique key. In this case, customer code and reporting date are the key, and given this unique combination, we can drop the duplicates by just adding one line of code. You can see the duplicates are dropped; this is the row count before the drop.
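A minimal sketch of the duplicate checks described above, using standard Spark DataFrame calls. The table name and the key columns customer_code and reporting_date are assumptions for illustration.

```python
df = read("Sales.CustomerSnapshots")      # assumed Incorta helper and table name

total_rows    = df.count()
distinct_rows = df.distinct().count()
print(f"duplicates present: {total_rows != distinct_rows}")

# Drop rows that are identical across every column
deduped_all_cols = df.dropDuplicates()

# Or deduplicate by a business key: keep one row per (customer_code, reporting_date)
deduped_by_key = df.dropDuplicates(["customer_code", "reporting_date"])

save(deduped_by_key)
```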
Dylan Wan: Next, about missing data. Missing data is also something we can handle in the Incorta notebook and the materialized view. This is another useful function with Spark: calling it will loop through all the columns and show which columns have nulls. Nulls mean there is missing data, and then we can fill the missing data by using a Spark function, identifying which columns have nulls to handle and what value should be used to fill them. This example is for numerical data.
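A minimal sketch of the missing-data handling described above: count the nulls per column, then fill selected numeric columns with a default value. The table and column names, and the read()/save() helpers, are assumptions for illustration.

```python
from pyspark.sql import functions as F

df = read("HousePrice.Listings")

# Show how many nulls each column contains
null_counts = df.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns
])
null_counts.show()

# Fill selected numeric columns; 0 (or a precomputed median) is a common choice
filled = df.fillna({"GarageArea": 0, "LotFrontage": 0})
save(filled)
```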
Dylan Wan: Outliers: people have different definitions of an outlier, but one definition is shown here. I won't go through it, because this is more of a statistics topic.
Dylan Wan: Now I want to show you how we can accomplish this in Incorta by doing something like this. Earlier I mentioned that we created a summary table in Incorta using the statistics generated from the Spark API. When we store that data in Incorta, we can actually use it to filter outliers. We join the sales price table with the summary table, and the summary table has metrics like the percentiles. In this particular definition of an outlier, a sales price beyond the interquartile range by more than 1.5 times the IQR counts as an outlier, and we can very easily filter out the outlier data by using a filter like this.
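A minimal sketch of the IQR-based outlier filter described above, computed directly with Spark's approxQuantile rather than the pre-built summary table; the table and column names are assumptions for illustration.

```python
from pyspark.sql import functions as F

df = read("HousePrice.Listings")

# 25th and 75th percentiles of the sale price (exact when relativeError=0.0)
q1, q3 = df.approxQuantile("SalePrice", [0.25, 0.75], 0.0)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only rows whose sale price falls inside the 1.5 * IQR fences
clean = df.filter((F.col("SalePrice") >= low) & (F.col("SalePrice") <= high))
save(clean)
```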
Dylan Wan: Now, I think I have another 15 minutes, and we're going to go on to machine learning model development with Incorta.
Dylan Wan: Okay, I'm going to go through a problem which I got from a customer. What they wanted to do is use the data to find out the rule for assigning sales reps in their business application, and they wanted to verify whether the process is correct by getting the rule out from the data, so they can compare it against the way they defined their existing business rule.
Dylan Wan: When we come to the topic of model creation and evaluation, we have many choices of framework. Incorta supports Spark, and the byproduct is that we get the support of Spark's machine learning library, Spark ML. So you can definitely use Spark to produce the machine learning model, but Incorta also supports PySpark, which means the Python environment is available as well, so some people will use frameworks like scikit-learn, TensorFlow, or PyTorch, which are the popular Python-based frameworks for machine learning.
Dylan Wan: When we think about this, we need to think about the machine's capacity. If we run it on the Spark cluster, the machine learning pipeline will run on the Spark cluster; but if we are using a Python-based framework, we need to ensure that the machine running Python has enough capacity. Here I'm talking about either the loader service or the analytics service, because when we do the training we run on the analytics service, and when we run the refresh job we run on the loader service. Can we do toPandas? This is something we need to be careful with, because once we do toPandas, the data is copied into the Python side, so we need to ensure there is enough capacity.
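A minimal sketch of guarding a toPandas() conversion so only a bounded projection is copied to the Python side, in line with the capacity caveat above. The row cap and column list are illustrative assumptions.

```python
MAX_ROWS = 100_000

df = read("HousePrice.Listings")                          # assumed Incorta helper
subset = df.select("SalePrice", "GrLivArea", "GarageArea")  # project only what pandas needs

if subset.count() <= MAX_ROWS:
    pdf = subset.toPandas()                    # full copy lands in local memory
else:
    pdf = subset.limit(MAX_ROWS).toPandas()    # cap the copy for exploration
```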
Dylan Wan: Okay, now I am going to walk through Incorta ML. Incorta ML is an API we created to address the machine learning model building problem. It is all built on top of Spark; the Incorta ML library sits on the Spark library, and what we are doing is trying to simplify the whole thing. The left-hand side is what I'm going to go over, and on the right-hand side we're going to get into more detail.
Dylan Wan: So what we are doing here is that we always first load the data into Incorta. After loading the data into Incorta, we prepare the data, and then we read the data that is already loaded in Incorta. This is the flow, and now I'm going to go back.
Dylan Wan: What we do is clean the data; you've seen the deduplication, the missing data handling, and the outlier removal. So we have clean data as a starting point, and then we split the clean data into training data and testing data, also made available as materialized views.
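A minimal sketch of the train/test split step, assuming a cleaned table named HousePrice.Listings_Clean and the read()/save() helpers; the 80/20 ratio and seed are illustrative choices.

```python
clean = read("HousePrice.Listings_Clean")

# Random split into training and testing sets
train_df, test_df = clean.randomSplit([0.8, 0.2], seed=42)

# In practice each split would be the output of its own materialized view;
# here we simply save the training portion.
save(train_df)
```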
Dylan Wan: For training the model, what we do is shown in these small boxes with the Incorta ML library calls. Prepare features: basically, when the data comes in as, for example, a string, some of the machine learning algorithms require the data to be numeric, so we go through an encoding process, and sometimes a standardization process as well. That happens in the feature preparation step. What the Incorta ML library does is prepare the model using the training data and save the model behind the scenes, so you don't need to save the model yourself. When you are in the testing and inference phases, the fitted, trained model will be used. That's what I wanted to show you.
Dylan Wan: So here we see the prepare features call: we provide the training data to prepare features, and this produces the indexer model; the next time, we use the model already created on the testing dataset. Building the model then follows the diagram I showed earlier.
Dylan Wan: We build the model with the Incorta ML library and then evaluate it. Let me quickly show you the comparison. In this view we have about 30 lines of code to do everything, and we also wrote the same logic in Spark ML. If you look at the whole thing in Spark ML, we have around 109 lines. This comparison demonstrates how Incorta ML can help.
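For context, a condensed sketch of what a plain Spark ML version of such a flow typically looks like (string indexing, feature assembly, model fitting, and evaluation). The column names, the random forest choice, and the read() helper are assumptions for illustration; this is not the exact code compared in the webinar.

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator

clean = read("HousePrice.Listings_Clean")
train_df, test_df = clean.randomSplit([0.8, 0.2], seed=42)

# Encode a categorical column, assemble features, and fit a regression model
indexer = StringIndexer(inputCol="Neighborhood", outputCol="NeighborhoodIdx",
                        handleInvalid="keep")
assembler = VectorAssembler(
    inputCols=["NeighborhoodIdx", "GrLivArea", "GarageArea", "OverallQual"],
    outputCol="features")
model = RandomForestRegressor(featuresCol="features", labelCol="SalePrice")

pipeline = Pipeline(stages=[indexer, assembler, model])
fitted = pipeline.fit(train_df)

# Evaluate on the held-out split
predictions = fitted.transform(test_df)
rmse = RegressionEvaluator(labelCol="SalePrice", predictionCol="prediction",
                           metricName="rmse").evaluate(predictions)
print(f"RMSE on the test split: {rmse:.0f}")
```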
Dylan Wan: Okay.
Dylan Wan: Next I'm going to touch on another topic: in addition to using the Incorta notebook environment for model creation, we can also develop the ML model in a Jupyter notebook, which can be done outside of Incorta.
Dylan Wan: Here's how it works: you read the data from Incorta using your own notebook, train and validate the model inside your own notebook environment, and even manage the notebook in your own environment. The point is that after the model is saved into the model directory, it can be used in Incorta in an MV. Here we can use Incorta ML or Spark ML, and the model can be reused. This is integrated into the regular Incorta pipeline, so an incremental refresh will have the latest data, and we're not going to train the model again unless you want continuous training; otherwise, you just get the data, perform the inference or prediction, and the prediction will be available like any other table, which you can use anywhere, including in an Incorta dashboard.
Dylan Wan: Here are the APIs we provide, and I have included a link to the Incorta documentation. Let me quickly go over them. We have the read APIs here, which allow you to access the data in Incorta. The first one is very similar to the one currently available in the Incorta notebook: you pass the schema name and table name and get the data as a Spark DataFrame. But there are additional ones: we can read the data directly into a pandas DataFrame, and we can also use a SQL statement to prepare the data frame. We can even write the SQL statement against a business view to start the ML-related job in a notebook. And there is also an API that allows us to write the data to disk so it can be consumed by Incorta.
Dylan Wan: This is the Jupyter environment, and I'm going to show you what you can then do. First we read the data. You need to deploy the Incorta Data API into your notebook environment, and then there is the authentication phase, similar to logging in: we provide the tenant and user to log in and access the Incorta Spark cluster. Here, instead of using password-based authentication, we need to generate a key as a super user, and a super user can grant permission to other users; we have documentation about how to do this. This is how you call the Incorta API: by calling it you get a Spark session and Spark context, and that can be used for getting the data. This is an example of how to call the read API by using the schema and table name to get the data; the rest of the Spark DataFrame operations are similar to what we can do in the Incorta notebook, such as printing the schema and counting rows. We can also convert to pandas and do further pandas processing, which I'm not going to show. To see the details of the Incorta Data API, the notebook covers the same things I described in the presentation, and you can also see the Incorta documentation.
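A hedged sketch of how such an external notebook session might look end to end. The package name incorta and the connect/read/sql/save calls below are illustrative assumptions, not the documented API; refer to the Incorta Data API documentation for the exact module and signatures.

```python
import incorta  # hypothetical client package for the Incorta Data API

# Key-based authentication instead of a password; values are placeholders
ctx = incorta.connect(
    url="https://my-incorta-host/incorta",
    tenant="demo",
    user="data_scientist",
    api_key="<generated-key>")

train_df = ctx.read("HousePrice", "Listings_Train")           # Spark DataFrame by schema/table
sales_df = ctx.sql("SELECT * FROM SalesBusinessView.Orders")  # query a business view

pdf = train_df.limit(50_000).toPandas()    # continue locally with pandas / scikit-learn

# Write results back so Incorta can consume them like any other table
ctx.save(train_df, "HousePrice", "Predictions")
```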
Dylan Wan: Here is another example where we read the data from a business view; this is a sales business view. We prepare the data in the business view and then get the data. As I mentioned, this is actually one of the values Incorta can provide to data preparation: you can prepare the data in a business view, and just by doing this, all the detail is hidden, all the joins are already handled, and the data model is handled as part of the schema development. This is where the data engineer meets the data scientist. Many people already working in Incorta may play the role of a schema developer or data engineer, preparing the data for regular reporting purposes. That knowledge and skill can be leveraged to support the machine learning pipeline as well, because machine learning requires good data, data which you may already have in Incorta; by leveraging your existing investment, the data pipeline supports not just reporting but also machine learning. Either way, this is just a regular API, and then we can call the save API; this data will be, for example, the output predictions, and you can save the prediction data back to Incorta.
Dylan Wan: OK, this is the underlying architecture of how we support the notebook. This diagram is just the regular Jupyter notebook architecture: we access the data from the browser, and the Jupyter notebook server runs on a server. What Incorta does is integrate with the Incorta Spark cluster, so at the point when we establish the connection, we get a Spark context, and this Spark context is basically the Jupyter server connecting to Spark by running the Spark driver in the notebook instance. The rest is the existing Spark and Incorta infrastructure: the Incorta loader service and analytics service connect to Spark. This is your own environment; by installing a Jupyter notebook you can get access.
Dylan Wan: Let me quickly show a little bit of code we have shared in the Incorta Community about how to save a model and load the model. Understanding this is important because it enables us to separate the environments, so we can produce a model in one environment and load the model in another. Here is the Community article; you can go into the detail, and you can actually download the data and the schema and deploy it to your Incorta environment for an experiment.
Dylan Wan: Okay, now, in the remaining time, I'm going to go over third-party integration. Our view is that it is not necessary to do everything within Incorta. We know there are many third-party ML platforms rising, so what we're thinking about, from the Incorta perspective, is how Incorta can help. One of the ways is to think of Incorta as the data source for machine learning. You develop inside Incorta, so Incorta can still do the data ingestion and data preparation, and you've seen the integration with Spark, using that environment to prepare your data. Once the data is available, it can be made available to any machine learning platform.
Dylan Wan: The machine learning platform can access that data in various ways. One way is using the SQL interface. You can also share the data and let the platform consume it in any format that is supported by our data lake connectors. That is basically the integration we have in mind.
Dylan Wan: I also have a YouTube channel; if you are interested in learning more, you can subscribe to it. Okay, we're now going to go into the Q&A.
Joe Miller: Thanks, Dylan. I did get one question that came in through the chat: is there the capability to create an endpoint on the model, so that transactions can be sent to that model and a related prediction retrieved?
Dylan Wan: Incorta does not provide this type of service directly, in the sense of exposing an endpoint. On the other hand, we can consider Incorta as a consumer: if an ML platform provides such an endpoint, we can actually call that endpoint from our regular data integration, from a materialized view in a batch process, and consume the prediction. Hopefully this answers the question.
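A hedged sketch of what "Incorta as a consumer" could look like: a materialized view that calls an external model endpoint in batch and stores the returned predictions. The endpoint URL, request/response shape, SparkSession name, and read()/save() helpers are assumptions for illustration.

```python
import requests
from pyspark.sql import Row

df = read("Sales.NewTransactions")                               # assumed Incorta helper
records = [row.asDict() for row in df.limit(10_000).collect()]   # keep the batch bounded

# Placeholder endpoint and payload shape; adapt to the ML platform's actual contract
resp = requests.post("https://ml-platform.example.com/v1/predict",
                     json={"instances": records}, timeout=60)
resp.raise_for_status()
scores = resp.json()["predictions"]          # assumed response field

scored = spark.createDataFrame(
    [Row(**rec, prediction=p) for rec, p in zip(records, scores)])
save(scored)                                  # available to dashboards like any other table
```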
Joe Miller: The only other question that came in was about the recording of this session. This session has been recorded and will be sent out following the session, as well as posted back to the Community; you can find the Community at community.incorta.com.
Joe Miller: With that, unless there are any other questions that come into the chat: I do see that the administrator has added the Community link into the chat, so anyone can take advantage of that.
Joe Miller: We want to thank everybody for joining today. Dylan, thank you for taking the time to share your expertise and knowledge of the platform, especially around data science. Thank you everybody, have a happy Wednesday, and we'll see you at the next Action on Insights.
Hosted by:
Dylan Wan
Senior Solutions Architect
Joe Miller
Senior Director, Community and Customer Enablement