The BigData Paris conference took place recently, on the 6th and 7th of March 2017. I thought I would share my notes on some of the talks I attended.
Keynotes
Opening Keynote with Guillaume Poupard, ANSSI (Day 1 at 9:00AM)
- The Agence nationale de la sécurité des systèmes d’information (ANSSI, the National Cybersecurity Agency of France) is a state agency that supports companies on cybersecurity
- Not to be confused with the CNIL (Commission nationale de l’informatique et des libertés)
- Emphasis on security when dealing with data, a sign that the market is maturing
- Key risks:
- Economic espionage (with data theft), usually not detectable
- Sabotage activities
- Security Principles
- Governance (it is not only an IT problem)
- Training people on simple security measures
- IT needs to be designed to be resilient to cyber attacks and to be able to detect such attacks
- Actionable plan?
- See the ANSSI guides
- Work with third parties for audits
- Training
- Quote: “Companies that will succeed are companies that are secure”
Big Data, Cloud and IoT, presented by IDC
- Spending: traditional IT down / cloud up / IoT and robotics up 2x
- Fastest-growing IoT sectors: manufacturing and retail
- By 2018: cloud will be the preferred mechanism for analytics
- By 2020: 45% of IT infrastructure and applications will be in the cloud
- Next thing: collaboration across companies
- 40% of companies will be digital / 40% of projects will monetize big data
- Uncertainties: cross-sharing of data between different companies, and the skills gap
- EU: ensure the free flow of data through rules and regulations
- Quote: “Use data for competitive advantage”
Netflix: how “Stranger Things” can happen with Visual Analytics, by Jason Flittner, Netflix (Day 1 at 12:10PM)
- Use case of in-house Big Data analytics with Tableau
- Works on a 60 PB data warehouse on S3
- Toolkit:
- Storage: S3, Hadoop, Redshift, Teradata
- Analytics: Hive, Spark, Presto
- Viz: Tableau, Sting (an internal Netflix project, written in Python, to access data), MicroStrategy
- A couple of techniques for using Tableau with Big Data:
- Hive + Tableau: ODBC connection (through the Thrift server), use of materialized views, ODBC optimization
- Spark + Tableau: less reliable queries
- Presto + Tableau
- Use of Tableau Data Extracts
- Mentioned additional offerings worth checking out:
Deep Learning and Big Data applied to your Business: the key factor to success, by Luming Wang, Uber (Day 2 at 9:30AM)
- Big data challenges: data quality, privacy, common schema / data joining problems, scaling the data processing platform, domain knowledge
- Machine learning for big data:
- ML is a bigger hammer for getting value out of Big Data
- Better prediction (regression, random forests, gradient-boosted trees such as XGBoost)
- Better clustering (unsupervised learning)
- Deep learning: better results than classic ML in specific domains
- Deep learning limitations:
- Requires large data volumes
- Training takes time
- Model serving is slower than with classic ML
- Leadership, vision and future
- Be a believer
- Make proper investment
- Understand limitation and risks
- Ask questions
- Do it now: customers won’t ask for it
- People and roles
- Skill sets: domain knowledge, understanding of ML best practices, turning solutions into products
- Roles: researcher/applied scientist, data scientist, data engineer, ML software engineer
- Products:
Use of data in companies and customer trust: study by the Boston Consulting Group, presented by Elias Baltassis (Day 2 at 10:00AM)
- Companies fail to properly communicate with their customers around the data being collected
- Perceived data misuse leads to lower revenue
- Companies need to implement cohesive data governance
- Key points:
- Consumer trust will be a competitive advantage
- Some data are more sensitive than others
- Distrust of companies prevails
- Company actions (or lack of) are often responsible for this situation
- Distrust and misuse of data are starting to have a significant financial impact
- Only frequent communication and holistic data governance can solve this
Innovation Trophy (Day 2 at 12:40PM)
- Tellmeplus focuses on analytics for IoT (led by the creator of Business Objects)
- Log Island, an open-source project to process log data
- Domlmen focuses on customer data acquisition for physical stores
- DataDome protects the content of websites from bots
- zelros: “Turn Enterprise Data into conversational assistants”
Round-Tables
Round-Table on Security
- Fact: fewer than 10% of companies have carried out any kind of impact analysis.
- It is hard to budget for the losses caused by a security breach, as opposed to budgeting investments to strengthen security
- Interesting use case from Yves Bigot about the security breach at TV5.
Chief Data Officer: new advantage in your strategy? (Day 2 at 10:50AM)
- Characteristics:
- Often has a team (data-lab type)
- Needs to propose solutions to improve performance
- Can come from the start-up world, or from the consulting world (already familiar with transformation projects)
- Use case #1: Havas, no need for a CDO (as each function within Havas is different in terms of data)
- Use case #2: Société Générale:
- One CDO per business function, coordinated by a lead CDO
- Helps the business with data quality and security, and with developing a data culture
- View: the CDO is a temporary function until the “data culture” becomes more widespread.
- Note that one of the panel members came from Etalab, a public body aimed at opening up public data. One of the projects they worked on is a website that predicts which companies are likely to hire.
Use Cases
A couple of the sessions were dedicated to how companies are building analytics and big-data capabilities.
Trainline (Day 1 at 3:00PM)
- Trainline is a website that lets you buy train tickets across Europe.
- Presentation of the “Data Initiative”, launched after it became clear that too much time was spent extracting the data and not enough on prediction.
- Automation of the Data Lab using Dataiku
- Data analysis to drive targeted marketing through the generation of online ads.
Under Armour (Day 1 at 4:00PM)
- Under Armour, on top of providing sportswear, provides connected apps (around sport and well-being)
- How to create value: Data + Machine Learning + Infrastructure
- Use cases:
- Finding calories in restaurant menus (to track your calorie intake through an app)
- Sending coaching messages at the right time
Auchan: Direct (Day 1 at 4:20PM)
- Auchan: Direct delivers groceries.
- Case for building a data lab
- Goal: “Provide insights to collaborators in order to improve customer experience”
- Virtuous circle: Learn -> Data viz -> Big Data (and repeat)
SeLoger.com (Day 1 at 4:40PM)
- SeLoger.com helps people buy or sell houses.
- Use case with AlloMedia to analyse the content of voice calls
- No interruption in data collection between web interactions and phone interactions
AirFrance-KLM (Day 1 at 5:00PM)
- Objectives: discover, improve knowledge, and facilitate better interactions with customers
Feedback from Consulting Work
Trends and lessons learnt from Accenture (Day 1 at 11:00AM)
- Process Offer: Accenture Connected Analytics Experience
- Collaboration with multiple actors in order to prototype new uses of data
- Design thinking, analytics
- Actors: business, data scientist, IT
- Few clients “blend” the three actors
- Software offer: Accenture Insights Platform, positioned between IaaS and PaaS
- Advice: “Do not do POC”
- Need to create value, not assess technology
- POCs without constraints are rarely turned into real projects
- Quote: “As analytics really transforms the processes, the organisation needs to change”
Lessons learnt from an IoT project, by PwC (Day 2 at 11:00AM)
- Goal: to keep a continuous pulse on the customer
- Challenges:
- #1: manage complexity of an IoT workflow
- #2: Scale up (from 500 devices to 100k devices)
- Responses:
- Set-up of a monitoring solution (see PwC control room)
- Detect trends
- Set-up of a Data Lab to be used by the control room
- Training and support of the build and support teams by subject experts (such as IoT, Big Data, and Cloud)
- Profiles to address in such projects: end-user, support, manager
- Impacts from audit and risk management
Exhibits
There were 400+ exhibits. Below are a couple of concepts I spotted (non-exhaustive):
- DataValue: a marketing concept for an IMMA (Information Management Maturity Audit)
- Data Altares: offers a payment recovery solution
- SoftComputing: focuses on marketing data
- Xebia: a kit for working on data analysis use cases