hxrs

Processing Flows of Information: From Data Stream to Complex Event Processing


Reading notes on "Processing Flows of Information: From Data Stream to Complex Event Processing"

 

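       I came across this paper by chance. It gives a fairly comprehensive introduction to, and survey of, today's data stream management systems and complex event processing systems, and it compares the characteristics of the representative products in each group.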

1.   Introduction

       An increasing number of distributed applications requires processing of continuously flowing data from geographically distributed sources with unpredictable rate to obtain timely responses to complex queries. After several years of research and development we can say that two models emerged and are today competing: the data stream processing model [Babcock et al. 2002] and the complex event processing model [Luckham 2001].

       DSMSs are specialized in dealing with transient data that is continuously updated. On the other hand, the complex event processing model sees flowing information items as notifications of events that happened in the external world, which have to be filtered and combined to deduce what is happening in terms of higher-level events.

2.   Background and motivation

         With the term Information Flow Processing (IFP) we refer to an application domain in which users need to collect information produced by multiple, distributed sources, to process it in a timely way, in order to extract new knowledge as soon as the relevant information is collected.

      As we mentioned, IFP has attracted the attention of researchers coming from different fields. The first contributions came from the database community in the form of active database systems, which were introduced to allow actions to automatically execute when given conditions arise. Data Stream Management Systems (DSMSs) pushed this idea further, to perform query processing in the presence of continuous data streams. In the same years that saw the development of DSMSs, researchers with different backgrounds identified the need of developing systems capable of processing not generic data but event notifications, coming from different sources, to identify interesting situations [Luckham 2001]. These systems are usually known as Complex Event Processing (CEP) Systems.

         Active Database Systems. A traditional DBMS is Human-Active, Database-Passive (HADP); active database systems overcome this limitation.

         Data Stream Management Systems. The active database systems mentioned above are still tied to static, stored data; DSMSs remove this restriction. Users do not have to explicitly ask for updated information; rather, the system actively notifies them according to installed queries. This form of interaction is also called Database-Active, Human-Passive (DAHP).
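To make the DAHP idea concrete, here is a minimal sketch of an "installed query" that the system evaluates on every arriving item and that pushes a result to the user. The class name `ContinuousQuery`, the count-based window and the threshold are my own assumptions for illustration; they do not come from the paper or from any particular DSMS.

```python
from collections import deque
from typing import Callable, Deque, List


class ContinuousQuery:
    """Hypothetical standing query: average of the last `size` readings.

    The result is pushed to `notify` whenever the window is full and the
    average exceeds `threshold`.
    """

    def __init__(self, size: int, threshold: float,
                 notify: Callable[[float], None]) -> None:
        self.window: Deque[float] = deque(maxlen=size)  # count-based sliding window
        self.threshold = threshold
        self.notify = notify

    def on_item(self, value: float) -> None:
        # Called by the system for every arriving item; the user never polls.
        self.window.append(value)
        if len(self.window) == self.window.maxlen:
            avg = sum(self.window) / len(self.window)
            if avg > self.threshold:
                self.notify(avg)


alerts: List[float] = []
query = ContinuousQuery(size=3, threshold=30.0, notify=alerts.append)
for reading in [20.0, 25.0, 30.0, 35.0, 40.0]:
    query.on_item(reading)
```

The user installs the query once and then only receives notifications. This is the opposite of the HADP style, where the user would have to re-run the query to see new data.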



         Complex Event Processing Systems. The DSMSs described above leave the semantics of the data being processed to the client applications. In CEP, by contrast, the information items have a precise meaning: they are notifications of events that happened in the external world and were observed by sources. The CEP engine is responsible for filtering and combining such notifications to deduce what is happening in terms of higher-level events (sometimes also called composite events or situations) to be notified to sinks, which act as event consumers.
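As a rough illustration of "filtering and combining notifications" into a composite event, the sketch below raises a `fire` event when a smoke notification is followed by a high temperature reading from the same area within a time bound. The event kinds, the 60-second bound and the 45-degree threshold are hypothetical values chosen for the example, not something the notes above specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    kind: str           # e.g. "smoke", "temperature" or the composite "fire"
    area: str
    time: int           # seconds; hypothetical timestamp set by the source
    value: float = 0.0


class FireDetector:
    """Composite event: smoke followed by high temperature in the same area."""

    def __init__(self, within: int = 60, hot: float = 45.0) -> None:
        self.within = within
        self.hot = hot
        self.last_smoke: Dict[str, int] = {}  # area -> time of latest smoke event

    def on_event(self, e: Event) -> Optional[Event]:
        if e.kind == "smoke":
            self.last_smoke[e.area] = e.time
            return None
        if e.kind == "temperature" and e.value > self.hot:
            smoke_time = self.last_smoke.get(e.area)
            # Ordering matters: the smoke must come first and be recent enough.
            if smoke_time is not None and 0 <= e.time - smoke_time <= self.within:
                return Event("fire", e.area, e.time)
        return None


detector = FireDetector()
stream = [
    Event("temperature", "A", 0, 50.0),   # hot, but no earlier smoke: ignored
    Event("smoke", "A", 10),
    Event("temperature", "B", 20, 60.0),  # different area: ignored
    Event("temperature", "A", 30, 48.0),  # smoke then heat in A within 60 s
]
composite = [c for c in map(detector.on_event, stream) if c is not None]
```

Only the last reading produces a composite event, because it is the only one that satisfies both the ordering and the area constraint. The sinks would receive the `fire` event rather than the raw readings.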



         DSMSs and CEP engines. The former mainly focus on flowing data and data transformations. CEP engines, either those developed as extensions of publish-subscribe middleware or those developed as totally new systems, focus on processing event notifications, with their ordering relationships, to capture complex event patterns, and on the communication aspects involved in event processing.

         An IFP system therefore needs to combine the strengths of DSMSs and CEP: effective data processing, including the ability to capture complex ordering relationships among data, as well as efficient event delivery, including the ability to process data in a strongly distributed fashion.

 

3.   A modelling framework for IFP systems

         Functional model of IFP:



In summary, an IFP engine operates as follows: each time a new item (including those periodically produced by the Clock) enters the engine through the Receiver, a detection-production cycle is performed. Such a cycle first (detection phase) evaluates all the rules currently present in the Rules store to find those whose condition part is true. At the end of this phase we have a set of rules that have to be executed. The Producer then takes this information and executes each triggered rule (production phase).
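The detection-production cycle can be sketched in a few lines. This is only a toy under simplifying assumptions: rule conditions look at the current item alone, there is no stored history, and the names `Rule` and `Engine` are hypothetical. A real engine evaluates conditions over previously received items as well.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Item], bool]   # evaluated in the detection phase
    action: Callable[[Item], Item]      # executed in the production phase


class Engine:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules              # stands in for the Rules store
        self.output: List[Item] = []    # items produced for the sinks

    def receive(self, item: Item) -> None:
        """Receiver entry point: one detection-production cycle per item."""
        # Detection phase: find every rule whose condition part is true.
        triggered = [r for r in self.rules if r.condition(item)]
        # Production phase: the Producer executes each triggered rule.
        for rule in triggered:
            self.output.append(rule.action(item))


engine = Engine([
    Rule("high_temp",
         lambda i: i.get("temp", 0) > 40,
         lambda i: {"alert": "high_temp", "temp": i["temp"]}),
    Rule("tick",
         lambda i: i.get("source") == "clock",
         lambda i: {"heartbeat": i["time"]}),
])
engine.receive({"temp": 35})                      # no rule triggered
engine.receive({"temp": 42})                      # triggers high_temp
engine.receive({"source": "clock", "time": 60})   # item produced by the Clock
```

Splitting each cycle into "collect the triggered rules" and "execute them" mirrors the two phases in the text. Items from the Clock go through exactly the same path as items from external sources.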

 

         Processing Model: selection policy; consumption policy (zero consumption policy, selected consumption policy).

         Deployment Model: centralized vs. distributed; clustered vs. networked.

         Clustered and networked engines focus on different aspects: the former on increasing the available processing power by sharing the workload among a set of well-connected machines, the latter on minimizing bandwidth usage by processing information as close as possible to the sources.

         Interaction Model: push/pull.

         Data Model: tuples, records. homogeneous information flows vs. heterogeneous flows.

         Time Model: stream-only, absolute, causal, interval.

         Rule Model: transforming rules and detecting rules.

         Language Type: Transforming languages and Detecting, or pattern-based languages.
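Going back to the processing model, the selection and consumption policies are easiest to see on a tiny pattern such as "an A item followed by a B item". The sketch below is my own illustration. The option names `"multiple"` and `"single"` follow my understanding of the paper's selection policies and are not taken from the notes above, and choosing the latest A for the single policy is an arbitrary assumption.

```python
from typing import List, Tuple


def run(stream: List[str], selection: str, consumption: str) -> List[Tuple[str, str]]:
    """Evaluate the pattern 'an A item followed by a B item' over `stream`.

    selection:   "multiple" pairs each B with every stored A;
                 "single" pairs it with the latest stored A only (assumption).
    consumption: "zero" keeps the A items that were used;
                 "selected" discards them so they cannot trigger again.
    """
    pending: List[str] = []               # A items still valid for future matches
    fired: List[Tuple[str, str]] = []
    for item in stream:
        if item.startswith("A"):
            pending.append(item)
        elif item.startswith("B") and pending:
            chosen = list(pending) if selection == "multiple" else [pending[-1]]
            fired.extend((a, item) for a in chosen)
            if consumption == "selected":
                pending = [a for a in pending if a not in chosen]
    return fired


stream = ["A1", "A2", "B1", "B2"]
multi_zero = run(stream, "multiple", "zero")
single_zero = run(stream, "single", "zero")
multi_selected = run(stream, "multiple", "selected")
single_selected = run(stream, "single", "selected")
```

With zero consumption the same A items keep matching every later B. With selected consumption each A is used at most once, so the number of rule firings drops.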
