The BigData Paris conference recently took place on the 6th and 7th of March 2017. I thought I would share my notes on some of the talks I attended.

Keynotes

Opening Keynote with Guillaume Poupard, ANSSI (Day 1 at 9:00AM)

  • ANSSI (Agence nationale de la sécurité des systèmes d’information) is a state agency that supports companies on cybersecurity
  • Not to be confused with the CNIL (Commission nationale de l’informatique et des libertés)
  • Emphasis on security when dealing with data, a sign that the market is maturing
  • Key risks:
    • Economic espionage (with data theft), not usually detectable
    • Sabotage activities
  • Security Principles
    • Governance (It is not only an IT problem)
    • Training people around simple security measures
    • IT needs to be designed to be resilient to cyber attacks and to have the capacity to detect such attacks
  • Actionable plan?
    • See the ANSSI guides
    • Work with third-parties for audits
    • Training
  • Quote: “Companies that will succeed are companies that are secure”

Big Data, Cloud and IoT, presented by IDC:

  • Spending: Traditional IT - down / Cloud: Up / IoT, robotics: x2 Up
  • Fastest growing IoT: Manufacturing + Retail
  • By 2018: Cloud will be the preferred mechanism for analytics
  • By 2020: 45% of IT infra + app will be cloud
  • Next thing: collaboration across companies
  • 40% of companies will be digital / 40% of projects will monetize big data
  • Uncertainty: cross-sharing of data between different companies + gap in skills
    • EU: provide free flow of data through rules and regulation
  • Quote: “Use data for competitive advantage”

Netflix: how “Stranger Things” can happen with Visual Analytics, by Jason Flittner, Netflix (Day 1 at 12:10PM)

  • Use case of in-house Big Data analytics with Tableau
  • Works on a 60 PB data warehouse on S3
  • Toolkit:
    • Storage: S3, Hadoop, Redshift, Teradata
    • Analytics: Hive, Spark, Presto
    • Viz: Tableau, Sting (an internal Netflix project in Python to access data), MicroStrategy
  • A couple of techniques to use Tableau with Big Data:
    • Hive + Tableau: ODBC connection (through the Thrift server) / use of materialized views / ODBC optimization
    • Spark + Tableau: less query reliability
    • Presto + Tableau
    • Use of Tableau Data Extract
  • Mention of additional offerings worth checking out:

Deep Learning and Big Data applied to your Business: the key factor to success, by Luming Wang, UBER (Day 2 at 9:30AM)

  • Big data challenges: data quality, privacy, common schema / data joining problem, scaling the data processing platform, domain knowledge
  • Machine learning for big data:
    • ML is a bigger hammer for getting value out of Big Data
    • Better prediction (regression, random forest, boosted trees with XGBoost)
    • Better clustering (unsupervised learning)
  • Deep learning: better results than ML for specific domains
  • Deep learning limitations:
    • Requires large volumes of data
    • Training takes time
    • Model serving is slower than with ML
  • Leadership, vision and future
    • Be a believer
    • Make proper investment
    • Understand limitations and risks
    • Ask questions
    • Do it now: customers won’t ask for it
  • People and roles
    • Skill sets: domain knowledge, understanding of ML best practices, ability to turn a solution into a product
    • Roles: researcher/applied scientist, data scientist, data engineer, ML software engineer
  • Products:

Use of data in companies and customer trust: study by Boston Consulting Group, presented by Elias Baltassis (Day 2 at 10:00AM)

  • Companies fail to properly communicate with their customers around the data being collected
  • Perceived data misuse leads to lower revenue
  • Need to implement cohesive data governance
  • Keys:
    • Consumer’s trust will be a competitive advantage
    • Some data are more sensitive than others
    • Distrust of companies is prevailing
    • Company actions (or lack of) are often responsible for this situation
    • Distrust and misuse of data are starting to have a significant financial impact
    • The only solution is frequent communication combined with holistic data governance

Innovation Trophy (Day 2 at 12:40PM)

  • Tellmeplus focuses on analytics for IoT (driven by the creator of Business Objects)
  • Log Island, an open-source project to process log data
  • Domlmen focuses on customer data acquisition for physical stores
  • DataDome protects the content of web sites from bots
  • Zelros: “Turn Enterprise Data into conversational assistants”

Round-Tables

Round-Table around Security

  • Fact: Fewer than 10% of companies have carried out any kind of impact analysis.
  • It is hard to budget for the losses inherent in a security breach, as opposed to budgeting investments to strengthen security
  • Interesting use case on the security breach at TV5, with Yves Bigot.

Chief Data Officer: new advantage in your strategy? (Day 2 at 10:50AM)

  • Characteristics:
    • Often has a team (data lab type)
    • Needs to propose solutions to improve performance
    • Can come from the start-up world, or from the consulting world (as already familiar with transformation projects)
  • Use Case #1: Havas - no need for a CDO (as each function within Havas is different in terms of data)
  • Use Case #2: Société Générale:
    • 1 CDO per business function, piloted by a lead CDO
    • Helps the business with the quality and security of the data, and with the development of the data culture
    • View: the CDO is a temporary function until the “data culture” becomes more widespread.
  • Note that one of the panel members came from Etalab, a public body aimed at opening public data. One of the projects they worked on is a web site that predicts which companies are likely to hire.

Use Cases

A couple of the sessions were dedicated to how companies are building analytics and big-data-related capabilities.

Trainline (Day 1 at 3:00PM)

  • Trainline is a web site that enables you to buy tickets everywhere in Europe.
  • Presentation of the “Data Initiative” after it became clear that too much time was spent on extracting the data, and not enough time on the prediction.
  • Automation of the Data Lab using Dataiku
  • Data analysis to support targeted marketing when generating online ads.

Under Armour (Day 1 at 4:00PM)

  • Under Armour, on top of selling sportswear, provides connected apps (around sport and well-being)
  • How to create value: Data + Machine Learning + Infrastructure
  • Different cases:
    • Find calories in restaurant menus (so as to track your calorie intake through an app)
    • Send coach messages at the right time

Auchan: Direct (Day 1 at 4:20PM)

  • Auchan: Direct delivers groceries.
  • Case for building a data lab
  • Goal: “Provide insights to collaborators so as to improve customer experience”
  • Virtuous circle: Learn -> Data viz -> Big Data (and repeat)

SeLoger.com (Day 1 at 4:40PM)

  • SeLoger.com facilitates buying and selling houses.
  • Use case with AlloMedia to analyse the content of voice calls
  • No interruption in data collection between web interactions and phone interactions

AirFrance-KLM (Day 1 at 5:00PM)

  • Objectives: discover, improve knowledge, and facilitate better interaction with customers

Feedback from Consulting Work

  • Process Offer: Accenture Connected Analytics Experience
    • Collaboration with multiple actors so as to prototype new uses of the data
    • Design thinking, analytics
    • Actors: business, data scientists, IT
    • Few clients “blend” the three actors
  • Software Offer: Accenture Insights Platform: offer between IaaS and PaaS
  • Advice: “Do not do POCs”
    • Need to create value, not assess technology
    • POCs without constraints do not usually get converted into real projects
  • Quote: “For analytics to really transform the processes, the organisation needs to change”

Lessons learnt from an IoT project, by PwC (Day 2 at 11:00AM)

  • Goal: to keep a continuous pulse on the customer
  • Challenges:
    • #1: manage complexity of an IoT workflow
    • #2: Scale up (from 500 devices to 100k devices)
  • Responses:
    • Set-up of a monitoring solution (see PwC control room)
      • Detect trend
      • Set-up of a Data Lab to be used by the control room
    • Training and support of the build and support teams by subject experts (in areas such as IoT, Big Data, and Cloud)
  • Profiles to address in such projects: end-user, support, manager
  • Impacts from audit and risk management

Exhibits

There were 400+ exhibits. Below are a couple of concepts I spotted (non-exhaustive):

  • DataValue: A marketing concept of Audit IMMA (Information Management Maturity Audit)
  • Data Altares: has a solution around payment recovery
  • SoftComputing: focus on Marketing data
  • Xebia: Kit to work on use cases around data analysis