Leveraging Incorta Notebooks for Machine Learning

October Action on Insights

Did you know Incorta has an add-on that creates an interactive environment where you can explore, manipulate, transform, analyze and create predictions?

Watch this webinar to learn:

  • What are Incorta Notebooks?
  • How to leverage predictive analytics in operational reporting
  • How scripting in Notebooks works, in a short demonstration

Webinar Transcript:

Joe Miller: Hi everybody, thank you for joining this session. We’re going to give it about 60 more seconds to give everybody the opportunity to join, so hang tight.
Joe Miller: Okay, I see more people joining, so we’re going to go ahead and launch into it, and I’ll kick us off with a few introductions.
Joe Miller: First of all, I want to introduce myself. My name is Joe Miller, I’m the Senior Director of Community and Customer Education here at Incorta, and I’ll be your host today.
Joe Miller: I want to welcome you to today’s session, October’s Action on Insights. This webinar is one of the many webinars that we run in a monthly series where we share best practices, tips, and tricks to help our customers unlock the full potential of the Incorta Direct Data Platform.
Joe Miller: Today we’re going to learn a little bit about leveraging notebooks in Incorta for machine learning.
Joe Miller: Just a little housekeeping before we begin, though. I want to make sure everyone knows that by default, upon entry to this room, you will be muted. If you have questions or comments during the presentation, go ahead and use the chat or Q&A functionality.
Joe Miller: At the end of the session we’ll circle back and try to answer some of those questions. So I would like to introduce today’s speaker.
Joe Miller: Amit is the Director of Solution Engineering here at Incorta. He’s a part of the application architecture team responsible for developing the analytic application blueprints in the Incorta platform. Amit, I want to thank you for taking the time to share your expertise and knowledge on Incorta Notebooks today. I’m going to go ahead and hand it over to you so you can kick us off.
Amit Kothari: Sure, thanks Joe.
Amit Kothari: Let me go to the agenda. Hi everyone, thanks for joining the webinar. In today’s webinar we’re going to talk about our notebook offering, which is part of our platform.
Amit Kothari: Then I’ll show you how we can use notebooks to do predictive analytics and machine learning.
Amit Kothari: Then we’ll give you a small demo on how we can use notebooks for doing all that, and there will be a Q&A session after that, so if you have any questions you can ask me.
Amit Kothari: Let’s quickly go to our next slide. I just wanted to give you an overview of our Incorta unified data platform. Whether you are already a user of Incorta or you’re new to it, I wanted to highlight that we are an end-to-end data platform. If you look at the slide from the left-hand side, these are all the sources: the applications, databases, big data. We have hundreds of connectors, so the platform can connect to those sources and ingest data.
Amit Kothari: Once the data comes into our platform, with our Direct Data Mapping we can just join all of it, and all of this data gets loaded into memory, and then you can do fast analytics on top of that.
Amit Kothari: The big USP of our platform is that there are no data transformations, so you don’t have to do ETL and all that, and you can do fast reporting on the data.
Amit Kothari: Now, the other thing is that once the data is ingested, we store all of it in Parquet format.
Amit Kothari: So we can start using a notebook on top of it and, using the data already ingested, do further processing of the data. Basically, we can do machine learning.
Amit Kothari: We support multiple languages for that. I’ll talk more about that, but at a high level you can further process that data using your various ML algorithms, and then, similar to any Incorta table, we can save it.
Amit Kothari: We can load it, and then that prediction data can be joined to other data sets and also surfaced in the dashboards.
Amit Kothari: Let’s quickly go to a brief overview of what a notebook is. If you’re not familiar with it, a notebook is basically an interactive interface. You might have heard of Jupyter notebooks or Zeppelin notebooks.
Amit Kothari: It is a very nice interface for interacting with data, and a lot of data scientists and developers use it to write small chunks of code. You can easily debug the code and visualize the data, so it offers a lot of benefits as a developer.
Amit Kothari: So basically it’s an interactive interface, and we support four languages right now. Predominantly our Incorta users use Python and SQL, but we also support R and Scala.
Amit Kothari: The nice part is that we can import any Python modules. We can import the common modules like NumPy, pandas, and all that, and then we can use the functions of those modules to work on the data.
Amit Kothari: We can write small paragraphs of code so that we can explore the results, test a hypothesis if we have some ideas, or easily debug the code. That’s one of the biggest advantages: you can write print statements, anything, and all of that can be done within the context of a notebook.
Amit Kothari: It will be clearer when I show the demo. On your right-hand side you see a small screenshot of the notebook. You can interactively visualize the data: I can do graphs on it, or we can use the plotting libraries of Python to create charts and all that.
Amit Kothari: And the big advantage is that if you are in a notebook, you don’t have to load any table or anything. You can just work on some sample data and immediately view the results of your logic.
Amit Kothari: So you can start with a small data set, you can visualize the data, you can validate the data, and then, when you see that the model is good, you can save it, load it, and then work on the full data set.
Amit Kothari: Let’s go to the next one. This gives an example of a notebook. If you look at the left-hand side, we are importing a Python module, and then I am drawing a plot. All of this can be done in our notebooks: I can add legends, I can add a title. This is a simple example of the power of a notebook.
Amit Kothari: Now let’s talk about why we should use Incorta for doing ML. One of the big problems which data scientists currently face is the data preparation part of it. If you look at the literature, you will see that most of the time, more than 70 to 80% of it, goes into data prep, that is, getting the data into a form, a flattened data set, on which you can do ML.
Amit Kothari: So the time to deliver the ML model is long, because the biggest problem is data prep: data acquisition, the quality of the data, the security, all of that. Your time is spent on that.
Amit Kothari: That’s where the biggest strength of Incorta as a unified data platform comes in. Suppose you have SAP or E-Business Suite data: we already have the base tables loaded.
Amit Kothari: We can load the base tables and join them, and all of that has already been done as part of your operational reporting, so now I can use that already prepped data.
Amit Kothari: If you look at this slide from the top left, that is my data model. I have all these tables which I have joined together, and now I want to do some predictive analytics on the data.
Amit Kothari: One use case can be my cash collection, in the accounts receivable area. I want to optimize my cash flow, so I want to predict how late the payments will be.
Amit Kothari: To do that, I already have my cash collection data in the system, so I can start a notebook, read the data, and immediately start applying the ML algorithms on it.
Amit Kothari: So this data consolidation and data prep is the power of Incorta, and now we have given you this notebook interface to start doing ML on that.
Amit Kothari: Let me go to the next slide.
Amit Kothari: This slide highlights some of the advantages of using our platform. Since our users are already doing operational reporting on our platform, I can start doing predictive analytics in the context of that.
Amit Kothari: What I can do is use a notebook interface, do some ML, and join the prediction results to my operational data set. Then within the same dashboard I can see my actual data and my predictive data.
Amit Kothari: So it’s kind of contextual ML within this broader area of our analytic use cases. If you look at the various analytic use cases, like cash flow or procure-to-pay, those are standard business processes for which we already have blueprints, and now I can add this additional prediction on top of that.
Amit Kothari: Now, the other thing about ML is offline batch processing. What it means is that there are two parts of ML.
Amit Kothari: One is creating the ML model based on historical data. Once that model is created, I want to use it to score new data, and then, on a different schedule, I also want to update the model.
Amit Kothari: So if you think of it, there will be two programs. One is my modeling program, which can be run every week or so, and one is my real-time scoring program, which does the predictions on the new data set based on that model. Using Incorta, I can schedule the model building and all that.
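The two-program pattern described here (a modeling job on one schedule, a scoring job on another) can be sketched in plain Python. This is only an illustration under stated assumptions: the "model" is a per-customer average of days late rather than a real ML algorithm, the JSON file stands in for however a model is actually persisted, and every name (train_model, score, late_payment_model.json) is hypothetical, not part of Incorta.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def train_model(history):
    """Modeling job: learn the average days late per customer from history."""
    totals = defaultdict(lambda: [0, 0])  # customer -> [sum of days late, count]
    for row in history:
        totals[row["customer"]][0] += row["days_late"]
        totals[row["customer"]][1] += 1
    per_customer = {c: s / n for c, (s, n) in totals.items()}
    overall = sum(r["days_late"] for r in history) / len(history)
    return {"per_customer": per_customer, "overall": overall}


def save_model(model, path):
    """Persist the model so a separately scheduled job can pick it up."""
    Path(path).write_text(json.dumps(model))


def load_model(path):
    return json.loads(Path(path).read_text())


def score(model, invoices):
    """Scoring job: predict days late for new invoices with the saved model."""
    return [
        {
            "invoice": inv["invoice"],
            # Unknown customers fall back to the overall average.
            "predicted_days_late": model["per_customer"].get(
                inv["customer"], model["overall"]
            ),
        }
        for inv in invoices
    ]


# Hypothetical historical data for the modeling job (run, say, weekly).
history = [
    {"customer": "A", "days_late": 10},
    {"customer": "A", "days_late": 30},
    {"customer": "B", "days_late": 0},
]
model_path = Path(tempfile.gettempdir()) / "late_payment_model.json"
save_model(train_model(history), model_path)

# Scoring job (run more often) only needs the saved model and the new rows.
new_invoices = [{"invoice": 1, "customer": "A"}, {"invoice": 2, "customer": "C"}]
predictions = score(load_model(model_path), new_invoices)
```

Keeping training and scoring as separate functions that communicate only through the saved model is what lets them run on different schedules.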
Amit Kothari: The other big advantage is that the data scientists and the data analysts all use the same platform. It’s not that we have a separate ML platform where I have to get data out of Incorta and put it into some other platform. This is all happening within the same platform, so we reduce data duplication, storage, and all those additional server needs.
Amit Kothari: The other thing is that our platform can also leverage REST APIs. Suppose you have any external, publicly accessible APIs: within my notebook I can also connect to those.
Amit Kothari: So we can do, for example, model versioning and the life cycle of the ML process, the ML operations side. For ML operations we can use the APIs of some other tool and use that to fine-tune and monitor the machine learning process.
Amit Kothari: Let me go on to the next slide.
Amit Kothari: The other thing which we have done as part of our notebook offering, to make ML easier, is that we have created our own Python module called Incorta ML. It is basically a Python wrapper on various standard algorithms which all data scientists are very familiar with.
Amit Kothari: The advantage of using Incorta ML is that you have to write less code. For example, if you compare the left-hand side with the right-hand side, if you use Incorta ML then you just have to write two or three lines of code.
Amit Kothari: So the data scientist, or whoever is doing the ML modeling, can focus on the model building rather than on writing so much code. It’s basically an abstraction wrapper on top of the various algorithms, so it makes life very easy for our data scientists.
Amit Kothari: That’s one big thing. We are constantly adding to our Incorta ML library. If you look at it, the library offers the following: feature selection, feature prep, model building, and model evaluation. We can save the model, we can read the model, and then we can score the data. And these are some of the ML algorithms we support using Incorta ML: regressions, classifiers, and all of that.
Amit Kothari: This is an optional thing. If users want to, they can use Incorta ML, but they’re free to use any Python module.
Amit Kothari: The other advantage is one that some of our customers have used. They already had Jupyter notebooks where they were doing ML, so they could quickly copy their existing code into our platform and didn’t have to rewrite their ML code. That’s another big advantage, because we use standard Python and a standard notebook interface.
Amit Kothari: Let’s go quickly to the demo.
Amit Kothari: Let me log in to Incorta.
Amit Kothari: Okay.
Amit Kothari: As part of my demo I will show you two use cases: one on supervised learning and one on unsupervised learning. Let me first show you the unsupervised learning demo.
Amit Kothari: This is basically a k-means clustering demo. I have this data set, and I want to do some clustering on top of it so that I can figure out the various data clusters within it and do some predictions on that.
Amit Kothari: The first thing is this file, which I have imported and loaded. My source data has 381 rows. Let’s look at this data if I go to Explore Data.
Amit Kothari: Basically, this is a loan prediction demo. I have this data file which has the details of the various loans: the loan ID, gender, applicant income, co-applicant, the loan amount, and all that.
Amit Kothari: Now I want to do some clustering based on income and loan amount to figure out whom to give how much loan, looking at their credit history and all that. We can use the k-means clustering algorithm to do this.
Amit Kothari: The way you create a notebook is through a materialized view. You say New, Derived Table, Materialized View. That’s the first step. Let’s click on this and go to Edit in Notebook.
Amit Kothari: The first step is that I have to read my data set. This is a very simple example of reading data, because I just have one file.
Amit Kothari: But typically, as I said, you might have 50 tables, so you can also write a SQL query here in which you join all those tables, and then use that data to do the ML.
Amit Kothari: The first thing I’m doing is importing the various Python modules.
Amit Kothari: Using pip install, we can install all these Python modules on our Incorta server. I had already installed all of these: pandas, NumPy, and matplotlib.
Amit Kothari: So I import all of these, and then this is the read step. This is a special function. We have two special functions: one is read and one is save. Read is basically reading our Parquet files, and the argument is like schema.table.
Amit Kothari: It can be any table which you have already ingested into Incorta. This read of the data creates a data frame, df, and this is a Spark data frame.
Amit Kothari: That’s the most important thing: when we read data it creates a Spark data frame, and when we save data, we also have to save a Spark data frame.
Amit Kothari: This is important because pandas has its own data frame, so to use pandas we have to convert to a pandas data frame. That’s what I’m doing: I’m saying data = df.toPandas(). Now this data becomes a pandas data frame, and I can use the various functions available with pandas.
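The read, convert, process, convert back, save flow described above can be sketched as one small helper. It relies only on two PySpark calls, DataFrame.toPandas() and SparkSession.createDataFrame(). The helper name process_with_pandas and the transform argument are hypothetical, and the commented usage assumes a notebook that provides read, save, and a spark session, as the speaker describes. The schema and table names are placeholders.

```python
def process_with_pandas(spark_df, spark, transform):
    """Run pandas-based logic on a Spark DataFrame and return a Spark DataFrame.

    spark_df  : a Spark DataFrame (for example, what the notebook's read returns)
    spark     : the active SparkSession
    transform : any function that takes and returns a pandas DataFrame
    """
    pandas_df = spark_df.toPandas()        # Spark -> pandas
    result = transform(pandas_df)          # pandas-only libraries can be used here
    return spark.createDataFrame(result)   # pandas -> Spark, ready to be saved


# Possible use inside a notebook (assumed names, not verified against Incorta):
#   df = read("SchemaName.TableName")
#   out = process_with_pandas(df, spark, lambda pdf: pdf.dropna())
#   save(out)
```

Note that toPandas() collects the whole data set onto one machine, so this route suits data that fits in memory.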
Amit Kothari: I can examine the data. These are five rows, and these are the various columns. I want to start exploring.
Amit Kothari: The first thing data scientists do is visualize the data. They want to see the min, the max, how the data is spread.
Amit Kothari: Then they can start thinking about which algorithms to use and which features to create for the prediction. That feature engineering part will come later.
Amit Kothari: So first I show some sample data. This is one paragraph, and I can just run it.
Amit Kothari: As I told you, you can run just small chunks of code.
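As a rough illustration of that first look at the data (min, max, and spread), here is a sketch using only the Python standard library. The demo itself used pandas for this. The rows and the column names applicant_income and loan_amount are made up for the example.

```python
import statistics


def summarize(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    values = [r[column] for r in rows if r[column] is not None]  # skip missing
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical sample rows; one loan amount is missing on purpose.
loans = [
    {"applicant_income": 2500, "loan_amount": 100},
    {"applicant_income": 4000, "loan_amount": 120},
    {"applicant_income": 6000, "loan_amount": None},
    {"applicant_income": 3500, "loan_amount": 140},
]
income_summary = summarize(loans, "applicant_income")
loan_summary = summarize(loans, "loan_amount")
```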
Amit Kothari: While this is running, we can go to the next paragraph. In the next paragraph I want to visualize the data, so I took two variables: the loan amount and the applicant income.
Amit Kothari: Okay, see, the first paragraph ran. It imported all these libraries, it read the data, and then it displayed the five rows. Now I want to visualize the data, so I am taking two metrics, loan amount and applicant income, and I’m using a scatter plot to plot this data.
Amit Kothari: My goal with this plot is to create clusters, groups within the data set which are similar. The whole point of clustering is to group this data into clusters of similar data points, and the way you do that is with the k-means algorithm. So I have plotted this.
Amit Kothari: Now, to find the optimal number of clusters, because you have to decide how many clusters to create, there is the elbow method.
Amit Kothari: I just use k-means, and this paragraph plots the result against the number of clusters. The optimal number of clusters is where the curve stabilizes, so you can say three or four.
Amit Kothari: If you see this, it’s like an exponential graph, and it basically stabilizes at three or four, so three or four clusters should be enough. We don’t need 10 or 15 clusters. So once we have found that out, let’s start with three.
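The elbow method mentioned here can be sketched as follows: run k-means for several values of k, record the within-cluster sum of squared distances (often called inertia), and look for the k after which it stops dropping sharply. The demo used a library implementation of k-means. The version below is a small plain-Python stand-in with a deterministic start, written only so the example runs on its own, and the points are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Minimal k-means; returns (centroids, labels, inertia)."""
    # Deterministic start: k points spread evenly over the sorted data.
    ordered = sorted(points)
    centroids = [ordered[(i * len(ordered)) // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Made-up data with three visible groups.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 1), (20, 2), (21, 1), (21, 2),
]
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards, which is the "stabilizing" shape the speaker points to on the plot.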
Amit Kothari: Now I want to categorize the data using the optimal number of clusters. I use the same k-means and set the number of clusters to three.
Amit Kothari: The first step is that, for the three clusters, I want to find the central point of each cluster, around which all its data points are coalescing. I mark them in red, so now I have found the centers of those three clusters.
Amit Kothari: I can also find out how many data points are in each cluster: 43 of those data points are in the first cluster, 126 in the next, and so on. Now I want to visualize these clusters, so I can start plotting them and coloring them.
Amit Kothari: I’m saying give blue to cluster one, green to this one, and so on, and this shows you those three clusters. All of the data points you see in blue have similar properties of loan amount and income.
Amit Kothari: So based on certain rules, you can figure out how much loan amount can be given for a certain income, because this is how the clustering is happening.
Amit Kothari: This is actually used by the financial industry in more sophisticated models, but at a high level this is what is done: you cluster the various applicants by their credit score and all that, and then you figure out the min and max of the ranges for which they are eligible.
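Once the cluster centers are known, counting the members of each cluster and reading off the loan range per cluster is straightforward. The sketch below assumes the centers have already been found. The center values and applicant rows are invented, and the function names are hypothetical.

```python
from collections import defaultdict


def nearest(point, centers):
    """Index of the center closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
    )


def cluster_ranges(applicants, centers):
    """Group (income, loan) pairs by nearest center and report loan ranges."""
    groups = defaultdict(list)
    for income, loan in applicants:
        groups[nearest((income, loan), centers)].append((income, loan))
    return {
        c: {
            "count": len(members),
            "min_loan": min(loan for _, loan in members),
            "max_loan": max(loan for _, loan in members),
        }
        for c, members in sorted(groups.items())
    }


# Hypothetical centers (income, loan amount) and applicants.
centers = [(3000, 100), (6000, 150), (12000, 300)]
applicants = [
    (2800, 90), (3300, 110),
    (5900, 140), (6500, 160),
    (11000, 280), (13000, 320),
]
ranges = cluster_ranges(applicants, centers)
```

Because income is on a much larger scale than loan amount in this toy data, income dominates the distance. In practice, features are often scaled before clustering for that reason.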
Amit Kothari: So these are three clusters. I could put in four or five and see, but what I saw in the visual data was that these were the three main clusters.
Amit Kothari: So now I’ve done all this, and the last step is that I save this data frame. Because it’s a pandas data frame, I had to convert it to a Spark data frame to do that.
Amit Kothari: But you don’t have to use pandas. You can work on the Spark data frame as well, because Spark also comes with Spark ML. You can use Spark ML directly on top of the Spark data frame, and then you don’t have to do this conversion step.
Amit Kothari: That’s it. Once I save it, I can validate it. When I validate, it does the data discovery and finds the three columns in that data frame: loan amount, applicant income, and the cluster. The cluster is the one which I was predicting.
Amit Kothari: So these are my columns, and then I can load this. I already ran it, so now I can go to Explore Data and visualize it.
Amit Kothari: Now I can surface this in an Incorta dashboard, do visuals, and do standard reporting on top of it.
Amit Kothari: So this was one example, k-means clustering. Let’s now show you a more complex example of a real-world use case, using classification. This is my AR, accounts receivable, ML demo.
Amit Kothari: Let’s look at this.
Amit Kothari: Let me first show you the data prep part of it, how I got the data for that ML. If I go here, this is my details extract. This is my EBS blueprint.
Amit Kothari: If you look at it, this is my flattened data set for my ML.
Amit Kothari: Now let me show you the complexity of this data model. If I look at the query plan, see, there are so many tables here. I’m already using this report as part of my operational reporting, where I’ve already done the various joins and the modeling part of it, so now I just use this dashboard to create the initial data set.
Amit Kothari: These are basically the customer invoices, the invoices which I have sent to customers and for which I’m waiting for payment.
Amit Kothari: What I’ve done is create formula columns in this dashboard that put all these rows into aging buckets. This shows that this row is already 30 days late, and all of these were paid on time. I have created five aging buckets: 1 to 30, 31 to 60, and so on.
Amit Kothari: Now I want to create a model on this. The whole point is that whenever a new invoice is generated for a customer, I can use this model to score that invoice and figure out that this invoice will be, say, 90 days late, so I don’t have to wait for 90 days to contact the customer.
Amit Kothari: My model will tell me that we are predicting that this invoice will be 90 days late, so you can be more proactive in collecting the amount from them. This is a very important use case which the industry uses. So this is my data prep part of it.
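The aging buckets described above amount to a simple mapping from days past due to a label. In the demo this was done with dashboard formula columns. The Python sketch below only illustrates the logic. The speaker names the 1 to 30 and 31 to 60 buckets, so the remaining boundaries and all labels here are assumptions.

```python
from datetime import date


def aging_bucket(due_date, as_of):
    """Map an invoice's due date to an aging bucket as of a given day."""
    days_late = (as_of - due_date).days
    if days_late <= 0:
        return "Not late"
    if days_late <= 30:
        return "1-30"
    if days_late <= 60:
        return "31-60"
    if days_late <= 90:   # assumed boundary
        return "61-90"
    return "Over 90"      # assumed final bucket


# Example: an invoice due on 1 September, checked on later dates.
due = date(2020, 9, 1)
buckets = [aging_bucket(due, d) for d in (
    date(2020, 8, 25), date(2020, 10, 1), date(2020, 10, 2), date(2021, 1, 15),
)]
```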
Amit Kothari: So I have ingested this data. Now let me go to my AR prediction notebook.
Amit Kothari: I imported all these functions, and just like before I am reading this data set.
Amit Kothari: Now I am finding the count, and I want to visualize this data set too. These are my late buckets: this is 60 days late, this is not late. Now I want to create an ML model out of my actual data.
Amit Kothari: What you do first is the feature engineering. These are the various columns, and I’m just visualizing the data here. This is my sample data.
Amit Kothari: This is the main part, feature engineering. Feature engineering is just a fancy term for finding the attributes which are good predictors. I might have 50 columns or 50-something, but out of those maybe only a subset of five or six are good predictors for this ML model.
Amit Kothari: So I’m doing the feature engineering, where there are various summary features, like finding the number of invoices that were paid late. Ultimately we want to do this for every customer, so that I can better predict their behavior in paying their bills. These are the various features, such as the ratio of paid invoices that were late. I have 30 to 40 features here.
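Per-customer summary features of the kind mentioned (number of invoices paid late, ratio of paid invoices that were late) can be sketched like this. The field names and sample invoices are hypothetical, and the demo computed many more features than the four shown.

```python
from collections import defaultdict


def customer_features(invoices):
    """Aggregate paid invoices into per-customer payment-behavior features."""
    stats = defaultdict(lambda: {"paid": 0, "late": 0, "late_days": 0})
    for inv in invoices:
        s = stats[inv["customer"]]
        s["paid"] += 1
        if inv["days_late"] > 0:
            s["late"] += 1
            s["late_days"] += inv["days_late"]
    return {
        customer: {
            "paid_invoices": s["paid"],
            "late_invoices": s["late"],
            "late_ratio": s["late"] / s["paid"],
            # Average lateness over late invoices only; 0.0 if never late.
            "avg_days_late": s["late_days"] / s["late"] if s["late"] else 0.0,
        }
        for customer, s in stats.items()
    }


# Hypothetical paid invoices.
invoices = [
    {"customer": "A", "days_late": 0},
    {"customer": "A", "days_late": 10},
    {"customer": "A", "days_late": 20},
    {"customer": "B", "days_late": 0},
]
features = customer_features(invoices)
```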
Amit Kothari: So now, once those features are done, I’m just plotting some of these buckets.
Amit Kothari: Okay.
Amit Kothari: And these are all standard things which we have to do to make it into data which can be used for prediction, so you use this StringIndexer. These are kind of standard things.
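The "standard things" mentioned above include turning text labels into numeric indexes. The sketch below shows that idea in plain Python; it is not the notebook code from the webinar. It assumes the speaker means Spark ML’s `StringIndexer`, whose default ordering assigns index 0 to the most frequent label, and the helper name `fit_string_index` and the sample labels are hypothetical.

```python
from collections import Counter

def fit_string_index(values):
    """Map each distinct label to a numeric index, most frequent label first.

    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}

# Hypothetical late-bucket labels, for illustration only.
buckets = ["not late", "60 days late", "not late",
           "90 days late", "not late", "60 days late"]

mapping = fit_string_index(buckets)
indexed = [mapping[b] for b in buckets]
print(mapping)
print(indexed)
```

The numeric column can then be used as the label or as a categorical feature, since most ML algorithms do not accept raw strings.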
Amit Kothari: Okay, so now I will start creating the model. First I’ll train the model, so I’ll do a 70/30 split.
Amit Kothari: Say I have a hundred rows: I will take 70% of the data, 70 rows, and create the model.
Amit Kothari: Then on the remaining 30% I apply the model and see how good the prediction is, because I have the actual data. So now I can compare actual versus predicted.
Amit Kothari: That will tell me whether my model is good or whether I need to tweak the model.
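The 70/30 split described above can be sketched in plain Python as follows. The function name `train_test_split`, the seed, and the stand-in rows are assumptions for illustration. In a Spark notebook the usual call is `DataFrame.randomSplit([0.7, 0.3], seed=...)`, which produces approximately, not exactly, those proportions.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and held-out test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))                     # stand-in for 100 invoice rows
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is fitted on `train` only; `test` stays unseen until evaluation, so the comparison of actual versus predicted values is an honest measure of how the model handles new invoices.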
Amit Kothari: So okay, I’m using this Incorta ML. And this is where I evaluated my errors and all that.
Amit Kothari: So I’m using Incorta ML and predicting, and then I saved the converted data, which is the one that has the predictions.
Amit Kothari: So now, if you look at it, there are basically three columns here: the invoice number for which I want to do the prediction, the actual late bucket, and the predicted late bucket. We want these to be as close as possible.
Amit Kothari: I want most of the rows to be cases where actual and predicted are the same, so that I know my model is good. Then I can use this model to score new invoices. So this is my model, based on existing data.
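Checking how often the actual and predicted late buckets agree, as described above, amounts to computing accuracy over the three-column output. A minimal sketch in plain Python, with hypothetical invoice numbers and buckets:

```python
# Hypothetical scored rows: (invoice_number, actual_bucket, predicted_bucket).
scored = [
    ("INV-1", "not late", "not late"),
    ("INV-2", "60 days late", "60 days late"),
    ("INV-3", "90 days late", "60 days late"),
    ("INV-4", "not late", "not late"),
    ("INV-5", "60 days late", "not late"),
]

def accuracy(rows):
    """Fraction of rows where the predicted bucket equals the actual bucket."""
    matches = sum(1 for _, actual, predicted in rows if actual == predicted)
    return matches / len(rows)

def mismatches(rows):
    """Invoice numbers where the prediction missed, useful for inspection."""
    return [invoice for invoice, actual, predicted in rows if actual != predicted]

acc = accuracy(scored)
missed = mismatches(scored)
print(f"accuracy = {acc:.2%}, missed = {missed}")
```

Listing the mismatches alongside the accuracy figure helps show whether the errors cluster in one bucket, which is a hint about where the model needs tweaking.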
Amit Kothari: So I ran all of this, and this created 169K rows. Now I can go to my dashboard and surface this data.
Amit Kothari: So this is my actual data, and this one is the predicted class. Now look at this: this is the invoice number, and this will tell me my actual aging and predicted aging.
Amit Kothari: And then this one is basically the data which I got from that prediction, so I can also surface that. It says it took about eight seconds to read.
Amit Kothari: There were 30 features and the accuracy was 79.67, so roughly 80% accuracy. So we can consider it a good model which can be used to do this prediction for this use case.
Amit Kothari: This is a classification algorithm, so it is part of supervised learning. What I showed you earlier was unsupervised learning.
Amit Kothari: Okay, so as you can see, within Incorta you can very easily use notebooks to do your various ML on your existing data, create new use cases with your ML, and then surface the same data in your dashboard.
Amit Kothari: The other thing we are also doing, which is going to come pretty soon, is one-click ML, where you can do ML directly on the insight. This is basically democratizing ML for everybody.
Amit Kothari: So if you’re a business user and don’t want to deal with Python and notebooks, you can just do a one-click ML, a forecasting kind of thing, directly on the insight.
Amit Kothari: And then we also want to support external notebooks, like Jupyter notebooks, on Incorta data sets. So I don’t have to be in Incorta; I can be in an external Jupyter notebook outside of Incorta, connect to Incorta, and do all my analysis directly in the external notebook. That’s also something we’re going to release soon.
Amit Kothari: So I think that’s all I had.
Amit Kothari: Should we do the Q&A now, or...?
Joe Miller: The good news is that there’s one question that came in, so let’s cover that really quickly before we sign off here. What’s the advantage of using Incorta machine learning over the Spark machine learning library?
Amit Kothari: Oh, so basically the big advantage is just the number of steps you have to write when coding it.
Amit Kothari: Incorta ML is like a wrapper on top of it, so you can just call one function for prediction or for K-means; you just pass a parameter and you’re done. But if you’re using Spark ML, you’ll have to write like 15 or 20 lines of code.
Amit Kothari: So it’s just the complexity part of it.
Amit Kothari: If you already have the Spark ML code, you can definitely use it; there’s no problem with that.
Amit Kothari: But if you are starting new projects and you want to explore Incorta ML, you can do it, and then you can compare and see if the prediction from Incorta ML is better or worse than Spark ML and decide what to do.
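The wrapper idea in this answer, several manual steps hidden behind one function call, can be illustrated with a small k-means sketch in plain Python. This is not the Incorta ML or Spark ML API: the transcript does not show either library’s function names, so `assign`, `update`, and `kmeans_1d` are hypothetical names, and the data is made up. The algorithm is standard Lloyd-style k-means on one-dimensional values.

```python
def assign(points, centroids):
    """Step 1: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    new = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        new.append(sum(members) / len(members) if members else old)
    return new

def kmeans_1d(points, k, max_iter=100):
    """One-call wrapper: initialize, repeat both steps until stable, return results."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

# Hypothetical days-late values forming three obvious groups.
days_late = [0, 2, 1, 58, 61, 60, 92, 95, 90]
labels, centroids = kmeans_1d(days_late, k=3)
print(labels, centroids)
```

A caller who only needs clusters uses the last two lines; the initialization, assignment, and update steps stay available for anyone who wants finer control, which mirrors the trade-off described between a wrapper and the underlying library.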
Amit Kothari: Any other questions?
Joe Miller: That was it. Amit, could you pop over to the slide deck real quick and we’ll close out the session here.
Joe Miller: In the session chat, for anyone who’s with us here, I did paste in two more links for upcoming Action on Insights sessions. The one in November is going to be focused on dashboard design principles, and the one in December is going to be focused on row-level security and column-level security, so feel free to go ahead and join in there.
Joe Miller: In the meantime, between the sessions, if you want to ask more questions around machine learning and Incorta Notebooks, please go to our community at community.incorta.com. There we have our peer-to-peer discussions and Q&A forums. Amit, I believe you just published an article there about clustering recently, so you can self-serve on some of this content in between the sessions.
Joe Miller: With that, I want to thank everybody for joining us here today. If there are any more questions, go ahead and reach out to us at learn@incorta.com and we’ll make sure that we get those fielded. I did have one question that I owe someone an answer on after the session. Thank you, and have a great day.
Amit Kothari: Thank you.