In my earlier post, I proposed a data stack for a typical analytical use case along with the key criteria for choosing tech at each step in the data pipeline, such as minimal operational overhead, scalability, and pricing. And while I firmly believe that open source is not by itself a compelling enough reason to choose a technology for the data stack, it can be the most feasible solution in some situations: for example, in a highly regulated industry, or where local privacy and data security legislation prohibits the use of foreign SaaS vendors.

Similar to my original post, this one follows the steps in the data value chain. Organizations need data to make better decisions, whether human (analysis) or machine (algorithms and machine learning). Before data can be used for that, it needs to go through a sophisticated multi-step process, which typically follows these steps:

1. Specification: defining what to track (OSS in that area hasn't evolved yet).
2. Instrumentation: registering events in your code.
3. Collection: ingesting & processing events.
4. Integration: adding data from various sources.
5. Data warehousing: storing, processing, and serving data.
6. Transformation: preparing data for end users.
7. Quality assurance: bad data = bad decisions.
8. Data discovery: finding the right data asset for the problem.

Note that most open-source products listed below are in fact open-core.
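To make the first two steps above concrete, here is a minimal sketch of how specification and instrumentation fit together: events are defined up front (the specification), and application code emits them through a small helper (the instrumentation). All names here (`EVENT_SPECS`, `track`) are illustrative, not from any particular library.

```python
import json
import time

# Specification: the events we agree to track and their required fields.
# In practice this would live in a shared tracking plan, not in code.
EVENT_SPECS = {
    "page_view": {"user_id", "page"},
    "purchase": {"user_id", "order_id", "amount"},
}

def track(event_name, **properties):
    """Instrumentation: validate an event against its spec and
    serialize it as JSON for the collection step downstream."""
    spec = EVENT_SPECS.get(event_name)
    if spec is None:
        raise ValueError(f"unknown event: {event_name}")
    missing = spec - properties.keys()
    if missing:
        raise ValueError(f"{event_name} missing fields: {sorted(missing)}")
    return json.dumps({"event": event_name, "ts": time.time(), **properties})

# Usage: emit a validated event from application code.
payload = track("page_view", user_id="u42", page="/pricing")
print(payload)
```

Validating events against the spec at emit time is one way to catch schema drift early, rather than discovering it later during the quality-assurance step.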