Bigdata.andreamostosi.name is a subdomain of andreamostosi.name.
Description: This page is built merging the Hadoop Ecosystem Table (by Javi Roman and other contributors) and projects list collected on my blog. Result is an incomplete-but-useful list of big-data related...
HomePage size: 295.877 KB |
Page Load Time: 0.713655 Seconds |
Website IP Address: 172.67.129.134 |
The Big-Data Ecosystem Table https://bigdata.andreamostosi.name/ |
Date: Tue, 14 May 2024 22:14:35 GMT |
Content-Type: text/html |
Transfer-Encoding: chunked |
Connection: keep-alive |
x-amz-id-2: Dcpt7s1M16okr6rB1Utjao8e0xKcR+n6XXlJcnNXoRbXRTTdhZiKG33xUGVGdWvGhotTzua3+Mk= |
x-amz-request-id: NDEFNGZKTCAYH5JA |
Last-Modified: Sun, 05 Jun 2016 20:32:58 GMT |
CF-Cache-Status: DYNAMIC |
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=g30onTciA7qs2oUrrkkb5W%2BRXFDhGqPz%2B8dVg7UiIJJva%2F6VIds%2FFEAWj7fyJQ5N1hgU1joiQy9v3BJkF29eMEp73qsxGPkg1AxoJX%2Fji6C%2BSv5qlQJPH0ttTc5IZkYlWx9V5hQyxIA3bA5dV9BfvOrpEkOdidOEFQ%3D%3D"}],"group":"cf-nel","max_age":604800} |
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800} |
Server: cloudflare |
CF-RAY: 883e3df6ae06dc8f-LHR |
alt-svc: h3=":443"; ma=86400 |
<meta charset="utf-8"/> |
<meta content="chrome=1" http-equiv="X-UA-Compatible"/> |
<meta content="The Big-Data Ecosystem Table" property="og:title"/> |
<meta content="http://bigdata.andreamostosi.name/bigdata.jpg" property="og:image"/> |
<meta content="Useful Stuff - Andrea Mostosi Blog" property="og:site_name"/> |
<meta content="This page is built merging the Hadoop Ecosystem Table (by Javi Roman and other contributors) and projects list collected on my blog. Result is an incomplete-but-useful list of big-data related projects. If you like you can contribute to the original project or to my fork." property="og:description"/> |
<meta content="summary" name="twitter:card"/> |
<meta content="http://bigdata.andreamostosi.name" name="twitter:url"/> |
<meta content="The Big-Data Ecosystem Table" name="twitter:title"/> |
<meta content="@zenkay" name="twitter:creator"/> |
<meta content="This page is built merging the Hadoop Ecosystem Table (by Javi Roman and other contributors) and projects list collected on my blog. Result is an incomplete-but-useful list of big-data related projects. If you like you can contribute to the original project or to my fork." name="twitter:description"/> |
<meta content="http://bigdata.andreamostosi.name/bigdata.jpg" name="twitter:image"/> |
Incomplete-but-useful list of big-data related projects packed into a JSON dataset.
GitHub repository: https://github.com/zenkay/bigdata-ecosystem
Raw JSON data: http://bigdata.andreamostosi.name/data.json
Original page on my blog: http://blog.andreamostosi.name/big-data/
by Andrea Mostosi ( http://blog.andreamostosi.name )

Frameworks

Apache Hadoop
Framework for distributed processing. Integrates MapReduce (parallel processing), YARN (job scheduling) and HDFS (distributed file system).
1. Apache Hadoop

Distributed Programming

AddThis Hydra
Hydra is a distributed data processing and storage system originally developed at AddThis. It ingests streams of data (think log files) and builds trees that are aggregates, summaries, or transformations of the data. These trees can be used by humans to explore (tiny queries), as part of a machine learning pipeline (big queries), or to support live consoles on websites (lots of queries).
1. GitHub

Akela
Mozilla's utility library for Hadoop, HBase, Pig, etc.
1. Website

Amazon Lambda
A compute service that runs your code in response to events and automatically manages the compute resources for you.
1. Website

Amazon SPICE
Super-fast Parallel In-memory Calculation Engine.
1. Website

AMPcrowd
A RESTful web service that runs microtasks across multiple crowds.
1. Website

AMPLab G-OLA
A novel mini-batch execution model that generalizes OLA (online aggregation) to support general OLAP queries with arbitrarily nested aggregates using efficient delta maintenance techniques.
1. Website

AMPLab SIMR
Apache Spark was designed with Apache YARN in mind. However, until now it has been relatively hard to run Apache Spark on Hadoop MapReduce v1 clusters, i.e. clusters that do not have YARN installed. Typically, users would have to get permission to install Spark/Scala on some subset of the machines, a process that could be time-consuming. SIMR allows anyone with access to a Hadoop MapReduce v1 cluster to run Spark out of the box. A user can run Spark directly on top of Hadoop MapReduce v1 without any administrative rights, and without having Spark or Scala installed on any of the nodes.
1. SIMR on GitHub

Apache Crunch
A simple Java API for tasks like joining and data aggregation that are tedious to implement on plain MapReduce. The APIs are especially useful when processing data that does not fit naturally into a relational model, such as time series, serialized object formats like protocol buffers or Avro records, and HBase rows and columns. For Scala users, there is the Scrunch API, which is built on top of the Java APIs and includes a REPL (read-eval-print loop) for creating MapReduce pipelines.
1. Website

Apache DataFu
DataFu provides a collection of Hadoop MapReduce jobs and functions in higher-level languages based on it to perform data analysis. It provides functions for common statistics tasks (e.g. quantiles, sampling), PageRank, stream sessionization, and set and bag operations. DataFu also provides Hadoop jobs for incremental data processing in MapReduce. DataFu is a collection of Pig UDFs (including PageRank, sessionization, set operations, sampling, and much more) that were originally developed at LinkedIn.
1. DataFu Apache Incubator
2. LinkedIn DataFu

Apache Flink
High-performance runtime and automatic program optimization.
1. Website

Apache Gora
Framework for in-memory data model and persistence.
1. Apache Gora

Apache Hama
Apache Top-Level open source project, allowing you to do advanced analytics beyond MapReduce. Many data analysis techniques such as machine learning and graph algorithms require iterative computations; this is where the Bulk Synchronous Parallel model can be more effective than "plain" MapReduce.
1. Hama site

Apache Ignite
High-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time.
1. Website

Apache MapReduce
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Apache MapReduce was derived from the Google paper "MapReduce: Simplified Data Processing on Large Clusters". The current Apache MapReduce version is built on the Apache YARN framework. YARN stands for "Yet-Another-Resource-Negotiator". It is a new framework that facilitates writing arbitrary distributed processing frameworks and applications. YARN's execution model is more generic than the earlier MapReduce implementation. YARN can run applications that do not follow the MapReduce model, unlike the original Apache Hadoop MapReduce (also called MR1). Hadoop YARN is an attempt to take Apache Hadoop beyond MapReduce for data processing.
1. Apache MapReduce
2. Google MapReduce paper
3. Writing YARN applications

Apache Pig
Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin, for expressing these data flows. Pig Latin includes operators for many of the traditional data operations (join, sort, filter, etc.), as well as the ability for users to develop their own functions for reading, processing, and writing data. Pig runs on Hadoop. It makes use of both the Hadoop Distributed File System, HDFS, and Hadoop's processing system, MapReduce.
1. pig.apache.org/
2. Pig examples by Alan Gates

Apache S4
S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
1. Apache S4

Apache Spark
Data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark provides an easier-to-use alternative to Hadoop MapReduce and offers performance up to 10 times faster than previous-generation systems like Hadoop MapReduce for certain applications.
1. Apache Incubator Spark

Apache Spark Streaming
Framework for stream processing, part of Spark.
1. Apache Spark Streaming

Apache Storm
Storm is a complex event processor and distributed computation framework written predominantly in the Clojure programming language. It is a distributed real-time computation system for processing fast, large streams of data. Storm's architecture is based on the master-worker paradigm, so a Storm cluster mainly consists of a master and worker nodes, with coordination done by ZooKeeper.
1. Storm Project
2. Storm-on-YARN

Apache Tez
Tez is a proposal to develop a generic application that can be used to process complex data-processing task DAGs and that runs natively on Apache Hadoop YARN.
1. Apache Tez

Apache Twill
Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing developers to focus more on their business logic. Twill uses a simple thread-based model that Java programmers will find familiar. YARN can be viewed as a compute fabric of a cluster, which means YARN applications like Twill will run on any Hadoop 2 cluster.
1. Apache Twill Incubator

Arvados
Spins a web of microservices around unsuspecting sysadmins.
1. Website

Blaze
Gives Python users high-level access to efficient computation on inconveniently large data.
1. Website

Cascalog
Data processing and querying library.
1. Cascalog

Cheetah
High Performance, Custom Data Warehouse on Top of MapReduce.
1. Paper

Concurrent Cascading
Application framework for Java developers to easily develop robust Data Analytics and Data Management applications on Apache Hadoop.
1. Cascading

Damballa Parkour
Library for developing MapReduce programs using the Lisp-like language Clojure. Parkour aims to provide deep Clojure integration for Hadoop. Programs using Parkour are normal Clojure programs, using standard Clojure functions instead of new framework abstractions. Programs using Parkour are also full Hadoop programs, with complete access to absolutely everything possible in raw Java Hadoop MapReduce.
1. Parkour GitHub Project

Datasalt Pangool
A new MapReduce paradigm. A new...
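
The MapReduce programming model mentioned in several entries above (Apache MapReduce, Crunch, Parkour) can be illustrated with a minimal, single-process sketch. This is not Hadoop code and uses no Hadoop API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, chosen only for illustration, and the shuffle step that a real framework performs across the cluster is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an in-memory list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big table"]))
```

On a cluster, the map and reduce calls would run in parallel on different nodes; the sketch only shows how the work is split into independent per-record and per-key steps.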
Registry Domain ID: 134521849_DOMAIN_NAME-VRSN
Domain Name: ANDREAMOSTOSI.NAME
Registrar: Gandi SAS
Registrar IANA ID: 81
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
>>> Last update of whois database: 2024-05-17T21:32:20Z <<<