Lessons Learned from Building and Testing a Data Ingest Workflow at Scale
Feed: AWS Government, Education, & Nonprofits Blog. Author: publicsector. 11 OCT 2017. Dean Kleissas, Research Engineer working on the IARPA MICrONS...
Practical Machine Learning with R and Python – Part 2
Feed: R-bloggers. Author: Tinniam V Ganesh. In this 2nd part of the series “Practical Machine Learning with R and Python – Part 2”, I continue where I left off in my first post Practical Machine...
Rethinking data marts in the cloud
Feed: All – O’Reilly Media. Author: Greg Rahn. Check out Greg Rahn’s session, “Rethinking data marts in the cloud: Common architectural patterns for analytics” at the Strata...
Migrating to MySQL 8.0 for WordPress – episode 3: query optimization
Feed: Planet MySQL. Author: Frederic Descamps. Now that MySQL 8.0.3 RC1 is installed and that we saw how to verify the workload, it’s time to see if we can optimize some of the queries. As explained...
Big SQL Best Practice and Guidelines – Data Ingestion with LOAD
Feed: Hadoop Dev. Author: Nailah Bissoon. In some cases, data is in a certain format which needs to be converted. If the data is coming from the warehouse in text format and must be changed to a...
Advanced ODS Graphics: A deeper dive into documents, dynamics, and data objects
Feed: SAS Blogs. Author: Warren F. Kuhfeld. My second blog described modifying dynamic variables in ODS Graphics. Little did I know the extent to which it would launch a series of blogs, papers,...
Big SQL Best Practices – Data Ingestion
Feed: Hadoop Dev. Author: Nailah Bissoon. There are various methods to ingest data into Big SQL. This blog gives an overview of each of these options and provides some best practices for data ingestion...
Reserved Words
Feed: Planet MySQL. Author: Peter Gulutzan. In the 1990s C.J.Date said: “The rule by which it is determined within the standard that one key word needs to be reserved while another need not be is not...
#AzureSQLDW: Hub and Spoke series Integration with Azure Analysis Services
Feed: Microsoft Azure Blog. Author: Arnaud Comet. This blog post is co-authored with Ellis Hiroki Butterfield, Program Manager for SQL DW and Josh Caplan, Program Manager for Azure Analysis Services....
Big SQL v5 Best Practices and Guidelines – Performance
Feed: Hadoop Dev. Author: Nailah Bissoon. This blog outlines some Best Practices to improve Big SQL performance in Big SQL v5. The tips are ordered according to when we believe they should be...
Extracting knowledge from knowledge graphs using Facebook Pytorch BigGraph.
Feed: Featured Blog Posts – Data Science Central. Author: Sergey Zelvenskiy. Machine learning gives us the ability to train a model, which can convert data rows into labels in such a way that similar...
Planet scale operational analytics and AI with Azure Cosmos DB
Feed: Microsoft Azure Blog. Author: Rimma Nehme. We’re excited to announce new Azure Cosmos DB capabilities at Microsoft Build 2019 that enable anyone to easily build intelligent globally distributed...
Partition Management in Hadoop
Feed: Hadoop – Cloudera Engineering Blog. Author: Shelby Khan. Guest blog post written by Adir Mashiach. In this post I’ll talk about the problem of Hive tables with a lot of small partitions and files...
Introducing the MemSQL Kubernetes Operator
Feed: MemSQL Blog. Author: Carl Sverre. Kubernetes has taken the world by storm, transforming how applications are developed, deployed, and maintained. For a time, managing stateful services with...
Small Files, Big Foils: Addressing the Associated Metadata and Application...
Feed: Hadoop – Cloudera Engineering Blog. Author: Shelby Khan. Small files are a common challenge in the Apache Hadoop world and when not handled with care, they can lead to a number of complications....
Kafka Replication: The case for MirrorMaker 2.0
Feed: CDH – Cloudera Engineering Blog. Author: Renu Tewari. Apache Kafka has become an essential component of enterprise data pipelines and is used for tracking clickstream event data, collecting logs,...
AWS Glue crawlers now support existing Data Catalog tables as sources
Feed: Recent Announcements. With this release, crawlers can now take existing tables as sources, detect changes to their schema and update the table definitions, and register new partitions as new data...
Microsoft 365 boosts usage analytics with Azure Cosmos DB – Part 2
Feed: Microsoft Azure Blog. Author: Parul Matah. This post is part of a 2-part series about how organizations are using Azure Cosmos DB to meet real world needs, and the difference it’s making for...
Test data quality at scale with Deequ
Feed: AWS Big Data Blog. You generally write unit tests for your code, but do you also test your data? Incorrect or malformed data can have a large impact on production systems. Examples of data...
Unsupervised learning and its role in the knowledge discovery process
Feed: Featured Blog Posts – Data Science Central. Author: Ariful Islam. Unlike supervised learning, unsupervised learning does not work with labeled data; it does not show the machine the correct...